SGML/XML document editing and publication on GNU/Linux systems
SGML/XML editing mini-tutorial
document editing and publication on GNU/Linux systems
Martin S. Wheeler
StarTEXT document engineering
Glastonbury, England - UK
mwheeler@startext.co.uk
v0.3
2005-01-22
msw
2004, 2005
Martin S. Wheeler
These mini-tutorial notes are intended to give trainee document editors a bare-bones overview of the facilities open to them when they create their own personal SGML editing workstation, using the GNU/Linux operating system.
SGML
XML
DTD
editing
publishing
Linux
SGML Editing
Editing Technique
[NOTE: this document assumes that the reader has a rudimentary grasp of what
is meant by SGML markup; the types of documents it applies to; and why it is
being used.]
Editing and then publishing documents using SGML/XML is a three-stage process.
The first stage - text creation - is itself a two-part activity.
First, the basic text has to be created and written down.
This can be done any way the author wishes -- using whatever tool(s) come to
hand, from cat to OpenOffice.org. The only proviso is that the text be saved as plain ASCII text.
Second, this text has to have markup applied to it ('tagging').
The rules governing how markup is applied are defined in standard SGML DTDs,
which makes it possible to employ DTD-aware text processing and
validation methods. Again, this is done in ASCII format.
The two activities may be combined, with the text being written and tagged at the same time.
Whether this is practical will depend on the author's knowledge of markup technique and
familiarity with the relevant DTD and its allowable elements and entities, plus the
availability of suitable DTD-aware text processors.
SGML-savvy authors may even wish to write their own conformant DTDs to meet specific in-house requirements.
The final result is an ASCII text file, containing marked-up ('tagged') text.
It is conventional to use the filename extension to indicate the type of markup employed; e.g. .sgml, .xml.
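As a minimal sketch of what the end product of this first stage might look like, the following Python fragment writes a small DocBook-style XML file as plain ASCII and checks only that it is well-formed. It does not validate the text against a DTD; that would need a validating parser. The file name sample.xml and the sample content are illustrative assumptions, not part of the course materials.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative DocBook-style content; the element names follow DocBook
# usage, but the text itself is made up for this sketch.
SAMPLE = """<?xml version="1.0"?>
<article>
  <title>Sample article</title>
  <para>Plain ASCII text with <emphasis>markup</emphasis> applied.</para>
</article>
"""


def write_and_check(path, text=SAMPLE):
    """Write marked-up text as ASCII and return the root element name.

    Raises xml.etree.ElementTree.ParseError if the markup is not
    well-formed. This is a well-formedness check only, not DTD validation.
    """
    with open(path, "w", encoding="ascii") as handle:
        handle.write(text)
    return ET.parse(path).getroot().tag


if __name__ == "__main__":
    # Hypothetical file name; written to a temporary directory.
    target = os.path.join(tempfile.mkdtemp(), "sample.xml")
    print(write_and_check(target))
```

A text with a missing or mismatched end tag makes write_and_check raise ParseError, which is the quickest signal that the tagging needs another look before moving on to transformation.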
The second stage - text transformation - is also a two-part activity.
First, the marked-up text must be passed through a processor to produce the desired final form of processed text (pdf, html, xhtml, dvi, ps, tex, etc.).
Simultaneously, presentational formatting is applied to the document via a pre-defined stylesheet, which determines the final visual appearance of the document within the chosen document type (e.g. 'house style' for a pdf sales document or catalogue, or an html web page).
Stylesheets may be chosen from a library of pre-written elements, or written from scratch by the document editor/publisher -- the latter demanding appropriate knowledge of stylesheet syntax.
Note that there exist two different ways of applying these transformations, each with its own type of stylesheet (.dsl for DSSSL transformations; .xsl for XSL transformations).
The result is a formatted file, whose filename extension is again used to denote the formatting used; e.g. .pdf, .ps, .xhtml.
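The choice between the two transformation routes can be sketched in Python as follows. The fragment only assembles a command line for openjade (DSSSL, .dsl) or xsltproc (XSL, .xsl), chosen by stylesheet extension; it does not run anything. The function name transform_command, the file names, and the choice of the rtf backend for openjade are assumptions made for illustration, and a real DocBook toolchain will usually need further options (catalogs, declarations) not shown here.

```python
import os


def transform_command(source, stylesheet, output):
    """Return an argv list for a DSSSL or XSL transformation.

    The tool is picked from the stylesheet extension:
    .dsl -> openjade, .xsl -> xsltproc. Nothing is executed here.
    """
    ext = os.path.splitext(stylesheet)[1].lower()
    if ext == ".dsl":
        # openjade: -t backend (rtf assumed here), -o output file,
        # -d DSSSL stylesheet, then the source document.
        return ["openjade", "-t", "rtf", "-o", output, "-d", stylesheet, source]
    if ext == ".xsl":
        # xsltproc: -o output file, then stylesheet, then source document.
        return ["xsltproc", "-o", output, stylesheet, source]
    raise ValueError("unrecognised stylesheet type: " + stylesheet)


if __name__ == "__main__":
    # All file names below are hypothetical.
    print(" ".join(transform_command("sample.xml", "house.xsl", "sample.html")))
    print(" ".join(transform_command("sample.sgml", "house.dsl", "sample.rtf")))
```

On a system where the tools are installed, the returned list could be handed to subprocess.run; keeping command construction separate from execution makes the sketch easy to check without any SGML software present.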
The third stage - document publication - demands only that a means be available to view or
otherwise present ('publish') the document in its final form; e.g. a web browser
for an html document; a graphical viewer for a pdf document, etc.
Some means of transferring screen output to printed hard-copy is also desirable.
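One way to picture the publication stage is as a lookup from filename extension to viewing tool. The Python sketch below assumes the viewers named under 'Publication tools' in this document; the command name firefox is an assumption standing in for the mozilla-firefox package, and viewer_command is a hypothetical helper, not part of any existing tool.

```python
import os

# Viewer commands are assumptions based on the tools listed in this
# document; "firefox" stands in for the mozilla-firefox package.
VIEWERS = {
    ".html": "firefox",
    ".xhtml": "firefox",
    ".pdf": "xpdf",
    ".ps": "gs",
    ".dvi": "xdvi",
}


def viewer_command(path):
    """Return an argv list that would open path in a suitable viewer.

    The viewer is chosen from the filename extension alone; nothing is
    executed. Raises ValueError for an extension with no known viewer.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in VIEWERS:
        raise ValueError("no viewer known for " + path)
    return [VIEWERS[ext], path]


if __name__ == "__main__":
    # Hypothetical file names.
    for name in ("sample.html", "sample.pdf", "sample.dvi"):
        print(" ".join(viewer_command(name)))
```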
Of course, an overall fundamental requirement is for the availability of a
working GNU/Linux system with all necessary software tools installed, and to
which the editor/publisher has full user access. (This is easily achieved by, for example, running a copy of Ubuntu Linux [q.v.] or similar from a freely available CD, with no need to install any software to the hard disk.)
Considerations concerning the storage and retrieval of documents in marked-up
text form, or full final presentation form -- whether in the guise of a full
document library or database of discrete text elements -- are beyond the scope
of the present document.
SGML software tools
The following are some of the most commonly encountered SGML publishing software tools.
Text production tools
a) accessible from the command line
cat
ed / red
[edit] / editor
[pico] / nano
mcedit
b) accessible from under X
nedit
vi / vim / gvim
emacs / xemacs
c) text-processing apps
Abiword
LyX
OpenOffice.org
d) specialist markup apps
quanta
conglomerate
amaya
e) commercial wysiwyg editors
HoTMetaL
XMetaL
Transformation tools
a) DSSSL tools (use .dsl stylesheets)
jade
openjade
b) XSL tools (use .xsl stylesheets)
xsltproc
saxon
xalan
c) 'N'-conversion tools
xmlto
Publication tools
gs
mozilla-firefox
OpenOffice.org
xdvi
xpdf
Commonly-used DTDs
HTML
DocBook
TEI
MathML
CALS
EAD
ISO 12083
Exercises
Produce DocBook XML marked-up text, then process it using your own preferred toolchain:
XML to:
- rtf
- tex
- pdf
- dvi
- ps
- html
- xhtml
- man
- info
- text
Produce DocBook SGML marked-up text, then process it using your own preferred toolchain:
SGML to:
- xml
- rtf
- tex
- pdf
- dvi
- ps
- html
- man
- info
Links
http://nwalsh.com/
http://docbook.org/
http://xml.org/
http://xsl.org/
http://xml.coverpages.org/
The latest version of this text may be found at:
http://startext.demon.co.uk/SGMLdocs/
In the same directory will be found sample markup texts for students to experiment with.
Further reading
DocBook-OpenJade-SGML-XML-HOWTO.pdf
DocBook-Demystification-HOWTO.pdf
DocBook-Install.pdf
DocBook_dbtexmath.pdf
LDP-Author-Guide.pdf
Glossary
Add your own terms to the following glossary list:
DocBook
DSSSL
DTD
psgml
SGML
TEI
XML
XSL
Index
Producing an index for this document is left as an exercise for trainees. Write it yourself. Then mark it up. Enjoy!