While Jabber.org welcomes documents in any format (the content is what matters!), we prefer documents to be formatted using DocBook XML. DocBook allows conversion to many formats (PDF, HTML, RTF, ASCII text, etc.) and frees the writer from trying to maintain a consistent formatting style. DocBook XML is easy to learn and can be written with any plain text editor (some WYSIWYG editors exist, but I've found it easier just to use vim!). This document steps through the basics of starting a DocBook document from scratch and processing it to other formats.
As can be seen, this guide is very brief. For more information about DocBook, read the online version of DocBook: The Definitive Guide (a.k.a. TDG) authored by Norman Walsh and published by O'Reilly & Associates.
Windows users, I highly recommend getting a better editor than Notepad. In Notepad and Wordpad, it is extremely difficult to find line numbers, which are extremely useful when processing documents. NoteTab Light is a nice freeware editor that suits the needs of any DocBook author well.
If you're familiar with HTML, you're already well on your way to understanding DocBook XML. DocBook is a markup language much like HTML, just with different elements (tags). I highly recommend reviewing the "Making an XML Document" section in Norman Walsh's book if you are not familiar with XML documents.
One critical difference between HTML (not XHTML) and XML is that "empty" elements must be closed internally (e.g. <xref/>).
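As a small illustration of that difference, the sketch below contrasts an element that has content with an empty one. It uses DocBook's xref element, which generates its own link text from the element it points to; the id values and paragraph text are illustrative assumptions only.

```xml
<section id="section1">
  <title>Section 1</title>
  <!-- para has content, so it needs separate opening and closing tags -->
  <para>This paragraph has content.</para>
</section>
<section id="section2">
  <title>Section 2</title>
  <!-- xref has no content, so it is closed inside the tag itself with "/>" -->
  <para>See <xref linkend="section1"/> for details.</para>
</section>
```

In HTML an empty element such as br may be left unclosed; an XML parser would reject the document if xref were written without the trailing slash.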
Ready to jump into some of the fundamental elements? Load up your favourite text editor and join the fun!
Before any DocBook XML can be started, the document type must be stated:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
      "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
The first line is standard for all XML documents; it specifies what version of XML the document is and what type of encoding the text is written in. The second and third lines specify that the document type is going to be an article and that we're using DocBook version 4.1.2. At the time of this writing, this is the most current version available.
The entire document must be wrapped within one main element. DocBook has several different main elements, depending on the size of the document. The most commonly used document type for Jabber documents is article. After the document type is stated, the actual DocBook elements can be started.
First some metadata must be specified. This usually includes information about the document title, author(s), contributors, and publication date. All of this information is enclosed within the articleinfo element.
artheader was used in versions of DocBook prior to 4.0 instead of articleinfo.
The metadata for this document:
    <articleinfo>
      <title>Jabber.org's Quick Guide to DocBook XML</title>
      <author>
        <firstname>Eliot</firstname><surname>Landrum</surname>
        <affiliation>
          <orgname>Jabber.org</orgname>
          <address><email>eliot@landrum.cx</email></address>
        </affiliation>
      </author>
      <pubdate>2001/12/14</pubdate>
    </articleinfo>
Now the good stuff can begin! All text must be enclosed within sections. Each section is marked with, originally enough, the section element. Every section must have a title, marked with the title element. Sections can also have subsections, simply by enclosing sections inside other sections. All text within a section must be enclosed by para elements. Here's a simple section with an enclosed section:
    <section>
      <title>Section 1</title>
      <para>This is section 1 with a nice little paragraph.</para>
      <section>
        <title>Section 1.1</title>
        <para>Here we have a subsection and the text of it.</para>
      </section>
    </section>
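Putting the pieces so far together, a complete minimal document might look like the sketch below. The title, author name, and paragraph text are placeholders, and the DOCTYPE assumes DocBook XML 4.1.2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<!-- article is the main element that wraps the whole document -->
<article>
  <!-- metadata comes first (placeholder values) -->
  <articleinfo>
    <title>My First Article</title>
    <author>
      <firstname>Jane</firstname><surname>Doe</surname>
    </author>
    <pubdate>2001/12/14</pubdate>
  </articleinfo>
  <!-- content: every section has a title, and text lives in para elements -->
  <section>
    <title>Introduction</title>
    <para>Some introductory text.</para>
  </section>
</article>
```

Saving a skeleton like this as a template makes it easy to start each new document from a known-good structure.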
You can link to various parts of the document by using the link element. First, you must specify an ID for the element which you would like to link to. Nearly all the elements in DocBook may have the ID attribute. Each ID must be unique. The following is an example of using the ID attribute and the link element:
    <section id="section1">
      <title>Section 1</title>
      <para>This is section 1 with a nice little paragraph.</para>
    </section>
    <section id="section2">
      <title>Section 2</title>
      <para>This is section 2. <link linkend="section1">Section 1</link> provides more in depth information.</para>
    </section>
To link to a URL, use the ulink element. Simply provide the URL in the url attribute:
<ulink url="http://www.jabber.org">Jabber.org</ulink>
A bulleted list:
    <itemizedlist>
      <listitem>
        <para>Item 1</para>
      </listitem>
      <listitem>
        <para>Item 2</para>
      </listitem>
    </itemizedlist>
A variable list (useful for any definition lists):
    <variablelist>
      <title>List of Variables</title>
      <varlistentry>
        <term>Jabber</term>
        <term>ICQ</term>
        <term>AIM</term>
        <listitem>
          <para>Instant messaging systems.</para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>DocBook XML</term>
        <listitem>
          <para>A markup language for creating structured documents.</para>
        </listitem>
      </varlistentry>
    </variablelist>
To output to other formats, the XML must be processed. No matter the platform, the processing step is very similar. There are two basic ways that this can be done; both use the XSL stylesheets provided by Norman Walsh. I suggest downloading the current ZIP package, uncompressing it somewhere convenient, and following along as we turn XML into HTML.
The simplest way is to add this processing instruction at the top of the document, right after the DOCTYPE has been declared:
<?xml-stylesheet type="text/xsl" href="docbook/html/docbook.xsl"?>
In the example, docbook/html/docbook.xsl is local to where the document is located on your hard drive. (docbook/html/docbook.xsl is in the downloadable package from Norman's web site.) After adding this processing directive, load the XML in an XSL-capable browser and the browser will do the transformation. Unfortunately, few web browsers are capable of this task. Supposedly, versions of Microsoft Internet Explorer greater than 5 can do it, but I was unable to get it to render a DocBook XML document.
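For reference, the top of a document with the processing instruction in place might look like the sketch below. The title and paragraph are placeholders, the DOCTYPE assumes DocBook XML 4.1.2, and the href assumes the stylesheet package was unpacked into a docbook directory next to the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<!-- href is resolved relative to this file; adjust it to where the stylesheets live -->
<?xml-stylesheet type="text/xsl" href="docbook/html/docbook.xsl"?>
<article>
  <title>Placeholder Title</title>
  <para>Placeholder text.</para>
</article>
```

The processing instruction sits before the main element, so it applies to the whole document and does not interfere with the DocBook content itself.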
Using a browser to view the XML is excellent for quick proof-reading, but to output the data to formats available to a wider audience (HTML, RTF, PDF, etc.), an XSLT processor must be used. Many exist for many different platforms and languages.
I highly recommend SAXON, written in Java by Michael Kay, for processing XML files in Windows. I was able to quickly get Instant SAXON 5.5.1 running on my Windows ME system with very little fuss. Just download it, uncompress it, and run saxon -o output.html input.xml docbook/html/docbook.xsl at the command prompt. If you include the stylesheet directive shown above in the section called “Web Browser Transformation”, you can tell saxon to use that information with the -a flag.
SAXON's author recommends installing Sun's Java engine instead of Microsoft's for speed reasons. After using SAXON a few times, I have to agree.
As it turns out, SAXON works really well on Macintosh too. I found a very helpful (albeit a bit old) email from Bruce Rosenstock on getting SAXON running on a Macintosh. Since the command line is somewhat frozen in the application that JBindery creates, I suggest using the -a flag as described before.
I saved the best for last! There are lots of great XSLT engines for GNU/Linux, as a quick search on Freshmeat.net will testify. I installed xsltproc from Debian because it was the first match on my search for XSLT. It installed quickly and is quite fast. For xsltproc, the syntax is simple (and quite similar to any other XSLT processor):
    xsltproc docbook/html/docbook.xsl infile.xml > outfile.html
I often use tidy from W3.org to clean the HTML. More precisely, tidy with the -im flags seems to work best.
Outputting a PDF or PS document is nearly as easy as outputting HTML. You may have to go through and make sure that your examples aren't destroyed in the process though. You'll need FOP from the Apache XML Project first. Instead of using the docbook/html/docbook.xsl stylesheet, you need to use the docbook/fo/docbook.xsl. This converts the XML to a formatting object file. This can then be converted to formats such as PDF. See Norman Walsh's instructions on this matter.
I have found the following sites invaluable in my DocBook XML work (many of these have been referenced in the above text):
W3.org - XML, XSL references and tidy.