SGML/XML document editing and publication on GNU/Linux systems
SGML/XML editing mini-tutorial

Martin S. Wheeler
StarTEXT document engineering
Glastonbury, England - UK
mwheeler@startext.co.uk

v0.3, 2005-01-22 (msw)
Copyright 2004, 2005 Martin S. Wheeler

These mini-tutorial notes are intended to give trainee document editors a bare-bones overview of the facilities open to them when they create their own personal SGML editing workstation, using the GNU/Linux operating system.

Keywords: SGML, XML, DTD, editing, publishing, Linux

SGML Editing

Editing Technique

[NOTE: this document assumes that the reader has a rudimentary grasp of what is meant by SGML markup; the types of documents it applies to; and why it is being used.]

Editing, then publishing, documents using SGML/XML is a three-stage process.

The first stage - text creation - is itself a two-part activity.

First, the basic text has to be created and written down. This can be done any way the author wishes -- using whatever tool(s) come to hand, from cat to OpenOffice.org. The only proviso is that the text be composed in ASCII plain text.

Second, this text has to have markup applied to it ('tagging'). The rules governing how markup is applied are provided in standard SGML DTDs, thus providing the opportunity to employ DTD-aware text processing and validation methods. Again, this is done in ASCII format.

The two activities may be combined into one simultaneous application of both. This will depend on the author's degree of knowledge of markup technique and familiarity with the relevant DTD and its allowable entities; plus the availability of suitable DTD-aware text processors. SGML-savvy authors may even wish to write their own conformant DTDs to specific in-house requirements.

The final result is an ASCII text file, containing marked-up ('tagged') text. It is conventional to use the filename extension to indicate the type of markup employed; e.g. .sgml, .xml.

The second stage - text transformation - is also a two-part activity.

First, the marked-up text must be passed through a processor to produce the desired final form of processed text (pdf, html, xhtml, dvi, ps, tex, etc.).

Simultaneously, presentational formatting is applied to the document via a pre-defined style sheet, to determine the final visual appearance of the document within the chosen document type. (E.g. 'house style' for a pdf sales document or catalogue, or html web page.)
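
As a rough illustration of the first stage described above, the following Python sketch takes a small sample of tagged text and checks two things: that it is plain ASCII, and that its tags are properly nested. The sample text and the function names are hypothetical, chosen only for illustration. Note that the check covers well-formedness only; it is much weaker than validation against a DTD, which needs a DTD-aware tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny DocBook-style article, written as plain ASCII.
TAGGED_TEXT = """<article>
  <title>Sample note</title>
  <para>This paragraph has been <emphasis>tagged</emphasis>.</para>
</article>"""


def is_ascii(text):
    """Return True if every character in text is 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def is_well_formed(text):
    """Return True if text parses as well-formed XML.

    This checks tag nesting only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def element_names(text):
    """List the element names used in text, in document order."""
    return [elem.tag for elem in ET.fromstring(text).iter()]


if __name__ == "__main__":
    print(is_ascii(TAGGED_TEXT), is_well_formed(TAGGED_TEXT))
    print(element_names(TAGGED_TEXT))
```

Listing the element names in this way is a quick means of comparing the tags actually used against those the chosen DTD allows, before handing the file to a real validator.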

Style sheets may be chosen from a library of pre-written elements; or written from scratch by the document editor/publisher -- the latter demanding appropriate knowledge of style sheet syntax. Note that there exist two different ways of applying these transformations, each with its own type of style sheet (.dsl for DSSSL transformations; .xsl for XSL transformations).

The result is a formatted file, whose filename extension is again used to denote the formatting used; e.g. .pdf, .ps, .xhtml.

The third stage - document publication - demands only that a means be available to view or otherwise present ('publish') the document in its final form; e.g. a web browser for an html document; a graphical viewer for a pdf document, etc. Some means of transferring screen output to printed hard copy is also desirable.

Of course, an overall fundamental requirement is the availability of a working GNU/Linux system with all necessary software tools installed, and to which the editor/publisher has full user access. (This is easily achieved by, for example, running a copy of Ubuntu Linux [q.v.] or similar from a freely available CD, with no need to install any software to the hard disk.)

Considerations concerning the storage and retrieval of documents in marked-up text form, or full final presentation form -- whether in the guise of a full document library or a database of discrete text elements -- are beyond the scope of the present document.

SGML software tools

The following are some of the most commonly encountered SGML publishing software tools.

Text production tools

a) accessible from the command line
   - cat
   - ed / red
   - [edit] / editor
   - [pico] / nano
   - mcedit

b) accessible from under X
   - nedit
   - vi / vim / gvim
   - emacs / xemacs

c) text-processing apps
   - Abiword
   - LyX
   - OpenOffice.org

d) specialist markup apps
   - quanta
   - conglomerate
   - amaya

e) commercial wysiwyg editors
   - HoTMetaL
   - XMetaL

Transformation tools

a) DSSSL tools (use .dsl stylesheets)
   - jade
   - openjade

b) XSL tools (use .xsl stylesheets)
   - xsltproc
   - saxon
   - xalan

c) 'N'-conversion tools
   - xmlto

Publication tools

- gs
- mozilla-firefox
- OpenOffice.org
- xdvi
- xpdf

Commonly-used DTDs

- HTML
- DocBook
- TEI
- MathML
- CALS
- EAD
- ISO 12083

Exercises

Produce DocBook XML marked-up text, then process it, using your own preferred toolchain, from XML to:

- rtf
- tex
- pdf
- dvi
- ps
- html
- xhtml
- man
- info
- text

Produce DocBook SGML marked-up text, then process it, using your own preferred toolchain, from SGML to:

- xml
- rtf
- tex
- pdf
- dvi
- ps
- html
- man
- info

Links

- http://nwalsh.com/
- http://docbook.org/
- http://xml.org/
- http://xsl.org/
- http://xml.coverpages.org/

The latest version of this text may be found at: http://startext.demon.co.uk/SGMLdocs/

In the same directory will be found sample markup texts for students to experiment with.

Further reading

- DocBook-OpenJade-SGML-XML-HOWTO.pdf
- DocBook-Demystification-HOWTO.pdf
- DocBook-Install.pdf
- DocBook_dbtexmath.pdf
- LDP-Author-Guide.pdf

Glossary

Add your own terms to the following glossary list:

- DocBook
- DSSSL
- DTD
- psgml
- SGML
- TEI
- XML
- XSL

Index

Producing an index for this document is left as an exercise for trainees. Write it yourself. Then mark it up. Enjoy!