• Software independent considerations
    • ➟ General remarks
Why XML based publishing?
  • Extensibility

    • Define your grammar

    • XML core extensions (linking,...)

  • Interoperability

    • Cross-platform software support

  • Open standard, no vendor lock-in

  • Tons of (processing) frameworks / APIs

Quote from "How and Why Are Companies Using XML?":

It's Not about You! It is about publishers.

  • they think it's their content

  • they want

    • to use it, re-use it, slice it, and dice it

    • to own it and control it

    • to have access to it and be able to move it

XML for publishing ...

  • saves time and money

  • is platform independent

  • avoids vendor lock-in

  • can be validated for QA

  • allows for creating different target formats

  • Refrain from fancy catalogs

  • Stick to simple layouts

    • Technical documentation

    • Law publications

Single source publishing
Separating structure, content and format
Content

Words, images, audio / video

Structure

Chapters / sections, tables, lists

Presentation

Physical formatting (boldface, text size/color, ...)
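
The separation above can be illustrated with a minimal sketch: one source holding only content and structure is rendered into two presentations. The document model and the function names `to_html` and `to_latex` are hypothetical and exist only for this illustration; escaping of special characters is deliberately left out.

```python
# Content and structure only: the source carries no formatting information.
DOCUMENT = {
    "title": "Intro",
    "paragraphs": ["Some content", "More content"],
}


def to_html(doc):
    """Render the document as an HTML fragment (one possible presentation)."""
    lines = ["<section>", f"  <h1>{doc['title']}</h1>"]
    lines += [f"  <p>{text}</p>" for text in doc["paragraphs"]]
    lines.append("</section>")
    return "\n".join(lines)


def to_latex(doc):
    """Render the same document as LaTeX (another presentation)."""
    lines = [f"\\section{{{doc['title']}}}"]
    lines += [f"{text}\\par" for text in doc["paragraphs"]]
    return "\n".join(lines)


html_output = to_html(DOCUMENT)
latex_output = to_latex(DOCUMENT)
print(html_output)
print(latex_output)
```

Changing the presentation means changing a renderer; the source stays untouched.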

Content
Hierarchical structure
Hierarchical structure, XML source
Presentation
Presentation
Structure Presentation
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Test</title>
    </head>
    <body>
      <section>
        <h1>Intro</h1>
        <p>Some content</p>
      </section>
    </body>
</html>
Structure / content Presentation (PDF)
\documentclass[12pt]{article}

\begin{document}
  A nice LaTeX formula:

  \begin{displaymath}
    e^x = \sum_{i=0}^{\infty}{x^i \over i!}
  \end{displaymath}

\end{document}
Pros
  • Separation of editing / formatting concerns

  • Focus on content rather than formatting

  • Oblivious to format evolution (e.g. Epub)

  • Well suited for SCM, diff-ing

Cons
  • No true WYSIWYG

  • Fixed formatting rules, no flexibility

  • Less layout control, especially in print

Sample technical document
  • Well structured documents

  • Focus on content rather than style

  • Clearly defined semantics

  • Automated generation supporting multiple output channels

Pros
  • Excellent typography

  • Large community

  • Mature engine

  • Excellent platform support

  • Multiple problem domain support

  • Extensible vocabulary

Cons
  • Focus on print

  • Bad office authoring tool support

    • Steep learning curve

    • Inverse editing

    • Cryptic error messages

  • Bloated vocabulary

XMLMind Editor
  • Strictly validating, near WYSIWYG, DocBook / DITA / MathML / XHTML editor.

  • Plugin architecture

  • Cross-platform Java based.

OxygenXML Editor
  • Full-fledged XML IDE.

  • Strictly validating, near WYSIWYG, DocBook / DITA / MathML / XHTML ... editor.

  • Eclipse based

  • Software independent considerations
    • ➟ Common building blocks
HTML
<p><b>Very</b> tiny</p>
Docbook
<para><emphasis role="bold">Very</emphasis> tiny.</para>
LaTeX
\textbf{Very} tiny.
Rendering Very tiny
  • Software independent considerations
    • ➟ Common building blocks
      • ➟ Block level elements
HTML
<p>A paragraph</p>
Docbook
<para>A paragraph</para>
LaTeX
A paragraph\par
Rendering A paragraph
HTML
<ul>
 <li>One</li>
 <li>Two</li>
</ul>
Docbook
<itemizedlist>
 <listitem>
  <para>One</para>
 </listitem>

 <listitem>
  <para>Two</para>
 </listitem>
</itemizedlist>
LaTeX
\begin{itemize}
 \item One
 \item Two
\end{itemize}
Rendering
  • One

  • Two

HTML
<table>
  <tr>
    <td>a1</td>
    <td>a2</td>
  </tr>
  <tr>
    <td>b1</td>
    <td>b2</td>
  </tr>
</table>
Docbook
<informaltable>
  <tr>
    <td>a1</td>
    <td>a2</td>
  </tr>
  <tr>
    <td>b1</td>
    <td>b2</td>
  </tr>
</informaltable>
LaTeX
\begin{tabular}{ll}
   a1 & a2 \\
   b1 & b2 \\
\end{tabular}
Rendering
a1 a2
b1 b2
HTML
<img src=
   'smoke.png'/>
Docbook
<mediaobject>
 <imageobject>
   <imagedata fileref
     ="smoke.png"/>
 </imageobject>
</mediaobject>
LaTeX
\includegraphics
  {smoke.png}
Rendering
Images
HTML / Docbook
<m:math>
  <m:mrow>
    <m:munderover>
      <m:mo>∫</m:mo>
          ...
    <m:msqrt>
      <m:mi>π</m:mi>
    </m:msqrt>
  </m:mrow>
</m:math>
LaTeX
\begin{equation}
  \int\limits_{-\infty}^{+\infty}
    e^{-x^2} dx = \sqrt{\pi}
\end{equation}
Rendering
$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi}$
HTML
<h1 id="start"
>First section</h1>
<p>A remark.</p>

<h2>A subsection</h2>
<p>See <a href="#start"
>remark</a>.</p>
Docbook
<section xml:id="start">
 <title>First
   section</title>
 <para>A remark.</para>
 <section>
   <title>A subsection
     </title>
   <para>See
 <link linkend="start"
   >remark</link>.</para>
 </section>
</section>
LaTeX
\section{First section}
\label{start}
A remark.

\subsection{A subsection}
See remark at page
\pageref{start}.
Rendering

First section



A remark



See remark at page 1.

HTML | LaTeX | Docbook
<h1>, <section> (recursive) | \chapter | <part>
<h2> | \section | <book>
<h3> | \subsection | <chapter>
<h4> | \subsubsection | <sect1>, <section> (recursive)
<h5> | \paragraph | <sect2>
<h6> | \subparagraph | <sect3>
HTML
  <body>
   ...
    <object name="foo" type="text/html" data="table.html"/>
   ...
  </body>
Docbook
  <part xml:id="sd1">
    <title>Software development 1</title>
    <xi:include href="Sd1/gettingStarted.xml" xpointer="element(/1)"/>
    <xi:include href="Sd1/languageFundamentals.xml" xpointer="element(/1)"/>
...
LaTeX
\documentclass{article}
\input{mydefs.tex}
\begin{document}
...
\include{math}
...
\end{document}
  • Focus on technical documentation

  • Excellent authoring user interface

  • Semantic markup language

    • XML based

Authoring and publishing
<section version="5.1"
  xmlns="http://docbook.org/ns/docbook"
  ...>

  <title>A Title</title>

  <para>A paragraph</para>
</section>

Software specific support:

Software centric schema
Document targets
  • Target format generators

    • XSL style sheets targeting HTML and FO

    • CSS and JavaScript for generated HTML

  • PDF

  • Epub(3)

  • Slides

  • ...

Editing / office
Editing / programming

emacs, vi, notepad, XML IDE, ...

XSLT processors

Saxon 6.5.5, Xalan, ...

FO (PDF) processors
Docbook 5.x

Based on RelaxNG grammar

Docbook 4.x (old / outdated)

Based on DTD grammar

  • Software independent considerations
    • Docbook
      • ➟ Target formats
        • ➟ HTML
  • HTML 5 based

  • Client side full text search index by virtue of JavaScript (Apache Lucene)

  • JavaScript based navigation

  • Third-party tool integration, e.g. MathJax

Eclipse help
  • Software independent considerations
    • Docbook
      • ➟ Target formats
        • ➟ Printed output
View Docbook HTML

Some text.

<para>Some text</para>
<p style='color:red'
  >Some text.</p>

Caution: No style / formatting related parameters in Docbook.

This is by design and on purpose.

Reference: See Paragraph elements.

View Docbook HTML

  • Bee

  • Ant

<itemizedlist>
  <listitem>
    <para>Bee</para>
  </listitem>
  <listitem>
    <para>Ant</para>
  </listitem>
</itemizedlist>
<ul>
  <li>
    <p>Bee</p>
  </li>
  <li>
    <p>Ant</p>
  </li>
</ul>
View Docbook HTML

  1. Bee

  2. Ant

<orderedlist>
  <listitem>
    <para>Bee</para>
  </listitem>

  <listitem>
    <para>Ant</para>
  </listitem>
</orderedlist>
<ol>
  <li>
    <p>Bee</p>
  </li>
  <li>
    <p>Ant</p>
  </li>
</ol>
View Docbook HTML

Bee

Insect

Mouse

Mammal

<glosslist>
  <glossentry>
    <glossterm>Bee</glossterm>
    <glossdef>
      <para>Insect</para>
    </glossdef>
  </glossentry>
  <glossentry>
    <glossterm>Mouse</glossterm>
    <glossdef>
      <para>Mammal</para>
    </glossdef>
  </glossentry>
</glosslist>
<dl>
  <dt>Bee</dt>
  <dd>Insect</dd>
  <dt>Mouse</dt>
  <dd>Mammal</dd>
</dl>
View Docbook HTML

  1. Coffee

  2. Tea

    • black

    • green

<orderedlist>
  <listitem>
    <para>Coffee</para>
  </listitem>
  <listitem>
    <para>Tea</para>
    <itemizedlist>
      <listitem>
        <para>black</para>
      </listitem>
      <listitem>
        <para>green</para>
      </listitem>
    </itemizedlist>
  </listitem>
</orderedlist>
<ol>
  <li>
    <p>Coffee</p>
  </li>
  <li>
    <p>Tea</p>
    <ul>
      <li>black</li>
      <li>green</li>
    </ul>
  </li>
</ol>

See List elements.

View Docbook HTML

A table
<informaltable border="1">
  <tr>
    <th>Col 1</th>
    <th>Col 2</th>
  </tr>
  <tr>
    <td>A1</td>
    <td>A2</td>
  </tr>
  <tr>
    <td colspan="2">B</td>
  </tr>
</informaltable>
<table border="1">
  <tr>
    <th>Col 1</th>
    <th>Col 2</th>
  </tr>
  <tr>
    <td>A1</td>
    <td>A2</td>
  </tr>
  <tr>
    <td colspan="2">B</td>
  </tr>
</table>
View Docbook HTML
$E = mc^2$
<informalequation>
  <m:math display="block">
    <m:mrow>
      <m:mi>E</m:mi>
      <m:mo>=</m:mo>
      <m:mrow>
        <m:mi>m</m:mi>
        <m:msup>
          <m:mi>c</m:mi>
          <m:mn>2</m:mn>
        </m:msup>
      </m:mrow>
    </m:mrow>
  </m:math>
</informalequation>
<math display="block">
  <mrow>
    <mi>E</mi>
    <mo>=</mo>
    <mrow>
      <mi>m</mi>
      <msup>
        <mi>c</mi>
        <mn>2</mn>
      </msup>
    </mrow>
  </mrow>
</math>
Docbook HTML
<informalequation>
  <mathphrase>
  $ |x| = \left\{
   \begin{array}{rl}
    -x &amp; \mbox{if $x&lt;0$} \\
     x &amp; \mbox{otherwise}
   \end{array}\right.$
  </mathphrase>
</informalequation>
<span class="mathphrase">
  $ |x| = \left\{
   \begin{array}{rl}
    -x &amp; \mbox{if $x&lt;0$} \\
     x &amp; \mbox{otherwise}
   \end{array}\right.$
</span>
$ |x| = \left\{ \begin{array}{rl} -x &\mbox{if $x<0$} \\ x &\mbox{otherwise} \end{array} \right.$

See Formal elements.

Mountain spring

Figure
<figure >
  <title>Mountain spring</title>
  <mediaobject>
    <imageobject>
      <imagedata fileref=
        "Ref/DbookIntro/mountain.jpg"/>
    </imageobject>
  </mediaobject>
</figure>

Image map + calloutlist

Seat

❸❷

Valves

<mediaobject>
  <imageobjectco>
    <areaspec ...>
      <area coords="83,16,340,187"
          xml:id="a1" linkends="c1"/>
      ...
    </areaspec>
    <imageobject>
      <imagedata fileref="recumbent.png.svg"/>
    </imageobject>
    <calloutlist>
      <callout arearefs="a1" xml:id="c1">
        <para>Seat</para>
      </callout>
      <callout arearefs="a1 a2" xml:id="c2">
        <para>Valves</para>
      </callout>
    </calloutlist>
  </imageobjectco>
</mediaobject>

Video courtesy of Big Buck Bunny.

<videoobject>
  <videodata
     fileref="buckBunny.mp4"
      format="video/mp4">
    <multimediaparam
          name="controls"
         value="controls"/>
  </videodata>
</videoobject>
  • Software independent considerations
    • Docbook
      • ➟ Selected elements
        • ➟ Admonition elements
View Docbook

Caution

Beware of overheating!

<caution>
  <para>Beware of overheating!</para>
</caution>

See Admonition elements: important, note, tip, warning.

  • Software independent considerations
    • Docbook
      • ➟ Selected elements
        • ➟ Sectioning elements
<chapter version="5.1"
  xmlns="http://docbook.org/ns/docbook">
  <title>Top</title>
  <section>
    <title>Level 1</title>
    <section>
      <title>Level 2</title>
      <section>
        <title>Level 3</title>
        <para>Hello!</para>
      </section>
    </section>
  </section>
</chapter>
<html>
  ...
  <body>
    <h1>Top</h1>
    <h2>Level 1</h2>
    <h3>Level 2</h3>
    <h4>Level 3</h4>
    <p>Hello!</p></body>
</html>
<chapter version="5.1"
  xmlns="http://docbook.org/ns/docbook">
  <title>Top</title>
  <sect1>
    <title>Level 1</title>
    <sect2>
      <title>Level 2</title>
      <sect3>
        <title>Level 3</title>
        <para>Hello!</para>
      </sect3>
    </sect2>
  </sect1>
</chapter>
<html>
  ...
  <body>
    <h1>Top</h1>
    <h2>Level 1</h2>
    <h3>Level 2</h3>
    <h4>Level 3</h4>
    <p>Hello!</p></body>
</html>
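
The mapping from nesting depth to heading level shown above can be sketched as follows. This is only an illustration using Python's standard library, not how the DocBook stylesheets implement it; the function name `headings` is hypothetical, and capping the level at `h6` is an assumption.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"  # DocBook 5 namespace

SOURCE = """
<chapter version="5.1" xmlns="http://docbook.org/ns/docbook">
  <title>Top</title>
  <section>
    <title>Level 1</title>
    <section>
      <title>Level 2</title>
      <para>Hello!</para>
    </section>
  </section>
</chapter>
"""


def headings(element, level=1):
    """Yield (tag, text) pairs: the nesting depth decides the heading level."""
    title = element.find(f"{{{DB}}}title")
    if title is not None:
        yield f"h{min(level, 6)}", title.text
    for child in element:
        if child.tag == f"{{{DB}}}section":
            yield from headings(child, level + 1)


result = list(headings(ET.fromstring(SOURCE.strip())))
print(result)
```

With recursive `<section>` elements the level is derived from the position in the tree, whereas `<sect1>`, `<sect2>`, ... encode it in the element name.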

See <chapter>, <section>, <sect1>, <sect2>, <sect3>, <sect4>, <sect5>, <simplesect>, <refentry>.

Internal document links

Referential integrity by ID / IDREF constraints:

<chapter xml:id="intro">
...
<chapter> ...
See <xref linkend="intro"/> ...
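
A validating parser or editor normally enforces this constraint. As a rough sketch of what the check amounts to, the following collects all identifiers and reports every `linkend` without a matching target. The function name `dangling_links` and the sample document are hypothetical; both `xml:id` and a plain `id` attribute are accepted as identifiers.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SOURCE = """
<book>
  <chapter xml:id="intro"><title>Introduction</title></chapter>
  <chapter>
    <para>See <xref linkend="intro"/> and <xref linkend="missing"/>.</para>
  </chapter>
</book>
"""


def dangling_links(root):
    """Return linkend values that do not match any identifier in the document."""
    ids = set()
    for element in root.iter():
        for name in (XML_ID, "id"):
            if name in element.attrib:
                ids.add(element.attrib[name])
    targets = [e.get("linkend") for e in root.iter() if e.get("linkend")]
    return [target for target in targets if target not in ids]


dangling = dangling_links(ET.fromstring(SOURCE.strip()))
print(dangling)
```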
External links

These are usual hypertext links:

<para>See
<link xlink:href="http://tdg.docbook.org">Docbook</link>
.</para>
Internal document links
  • Software independent considerations
    • Docbook
      • ➟ Selected elements
        • ➟ Top level elements
  • Root element is purpose dependent

  • Schema based options in Docbook 5.x (RelaxNG) requiring an <info> child in 5.1.

  • No limitation in Docbook 4.x (DTD).

 on top of RelaxNG
Using Display
        #Anchors

The page's URI is based on the xml:id value introduction.

Stable: https://.../introduction.html#firstSection

Unstable: https://.../introduction.html#d03213

Requirement

Important elements (<chapter>, <section>, <table>...) must provide an xml:id value.

Implementation choices
  • Modify underlying RelaxNG schema.

    Result: Restricted schema (Inheritance relationship)

  • Add Schematron integrity rule on top of schema.
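
The requirement itself can be sketched as a plain check. This is a hypothetical stand-in written in Python, not a Schematron rule and not a schema modification; the set of "important" elements and the function name `missing_ids` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Elements that must carry an xml:id (assumed selection).
REQUIRED = {f"{{{DB}}}{name}" for name in ("chapter", "section", "table")}

SOURCE = """
<book version="5.1" xmlns="http://docbook.org/ns/docbook">
  <chapter xml:id="introduction">
    <title>Introduction</title>
    <section xml:id="firstSection"><title>First</title></section>
    <section><title>Second</title></section>
  </chapter>
</book>
"""


def missing_ids(root):
    """Return the titles of important elements that lack an xml:id."""
    problems = []
    for element in root.iter():
        if element.tag in REQUIRED and XML_ID not in element.attrib:
            title = element.find(f"{{{DB}}}title")
            problems.append(title.text if title is not None else element.tag)
    return problems


violations = missing_ids(ET.fromstring(SOURCE.strip()))
print(violations)
```

A Schematron rule would express the same assertion declaratively, so that editors can report it during validation.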

HTML customization overview
<book ...>
  <title>XML for Newbies</title>
  <chapter xml:id="intro">
    <title>Introduction</title>
    <para>...</para>
  </chapter>
  <chapter xml:id="work">
    <title>Working with objects</title>
    <para>...</para>
  </chapter>
</book>

Navigation structure. 

  • Index.html

  • Per chapter:

    • ch01.html

    • ch02.html

Synthetically generated filenames.

<book ...>
  <title>XML for Newbies</title>
  <chapter xml:id="intro">
    <title>Introduction</title>
    <para>...</para>
  </chapter>
  <chapter xml:id="work">
    <title>Working with objects</title>
    <para>...</para>
  </chapter>
</book>

Navigation structure. 

  • Index.html

  • Per chapter:

    • intro.html

    • work.html

Providing link stability:

Parameter: use.id.as.filename
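
The effect of this parameter on the generated filenames can be mimicked in a few lines. The sketch below only imitates the naming shown in the two navigation structures above; it is not the stylesheets' implementation, and the function name `chunk_filenames` is hypothetical. Namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SOURCE = """
<book>
  <title>XML for Newbies</title>
  <chapter xml:id="intro"><title>Introduction</title></chapter>
  <chapter xml:id="work"><title>Working with objects</title></chapter>
</book>
"""


def chunk_filenames(book, use_id_as_filename):
    """Name one HTML file per chapter, either by position or by xml:id."""
    names = []
    for number, chapter in enumerate(book.findall("chapter"), start=1):
        chapter_id = chapter.get(XML_ID)
        if use_id_as_filename and chapter_id:
            names.append(f"{chapter_id}.html")
        else:
            names.append(f"ch{number:02d}.html")
    return names


book = ET.fromstring(SOURCE.strip())
synthetic_names = chunk_filenames(book, use_id_as_filename=False)
stable_names = chunk_filenames(book, use_id_as_filename=True)
print(synthetic_names)
print(stable_names)
```

Position-based names change when a chapter is inserted or moved; names derived from identifiers do not.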
Customization parameter ulink.target
callout.unicode / callout.graphics
Tweaking Docbook transformation parameter.
Hooking into XSL
  • Adding Javascript

    • Touch gestures

    • Dynamic elements

  • Embedded objects

  • Headers and footers

    • Company logo

    • Navigation icons

  • Front page

  <xsl:template match="d:videodata">
    <video controls="controls" preload="auto">
      <xsl:attribute name="title">
        <xsl:value-of select="normalize-space(../../../d:title)"/>
      </xsl:attribute>

      <xsl:variable name="imageFilename">
        <xsl:call-template name="mediaobject.filename">
          <xsl:with-param name="object" select=".."/>
        </xsl:call-template>
      </xsl:variable>

      <source src="{$imageFilename}" type='video/mp4' />
      <source src="{$imageFilename}.ogv"/>
    </video>
  </xsl:template>
  • Software independent considerations
    • Docbook
      • ➟ Customizing
        • ➟ CSS
Customize by
div.example > p.title,
div.figure > p.title,
div.table > p.title,
div.procedure > p.title,
div.equation > p.title {
    color: #394986;
    font-weight: bold;
}
Tweaking Docbook's default CSS.
  • Software independent considerations
    • Docbook
      • ➟ Styling the editor application
  • CSS

  • Plugins e.g. representing tables.

  • Folding mode by CSS.

  • Software independent considerations
    • ➟ Modular documents
Motivating modular documents
  • Multiple author editing conflicts

  • User interface limits

  • No document component reuse

Document decomposition
<book version="5.1"
  xmlns="http://docbook.org/ns/docbook">
  <chapter version="5.1" xml:id="start">
    <title>Start</title>
    <para>See <xref linkend="intro" />.</para>
  </chapter>
  <chapter xml:id="intro" >
    <title>Introduction</title>
    <para>Basic stuff.</para>
  </chapter>
</book>

An internal link.

Internal link target.

master.xml

<book version="5.1" 
  xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"> 
  <xi:include href="start.xml" 
     xpointer="element(/1)"/> 

  <xi:include href="intro.xml" 
     xpointer="element(/1)"/> 
</book>

start.xml

<chapter version="5.1" xml:id="start"
xmlns="http://docbook.org/ns/docbook">
  <title>Start</title>
  <para>See
     <xref linkend="intro"/>.</para>
</chapter>

intro.xml

<chapter version="5.1" xml:id="intro"
xmlns="http://docbook.org/ns/docbook">
<title>Introduction</title>
  <para>Basic stuff.</para>
</chapter>
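
Assembling the master document can be tried out with Python's standard `xml.etree.ElementInclude` module. The sketch below keeps the three files in a dictionary instead of on disk. The loader name `load` is hypothetical, the chapter files carry xml:id values so that the link target exists, and the xpointer attribute is left out on the assumption that the standard-library implementation does not evaluate it.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DB = "{http://docbook.org/ns/docbook}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# In-memory stand-ins for the three files.
FILES = {
    "master.xml": """<book version="5.1"
        xmlns="http://docbook.org/ns/docbook"
        xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="start.xml"/>
      <xi:include href="intro.xml"/>
    </book>""",
    "start.xml": """<chapter version="5.1" xml:id="start"
        xmlns="http://docbook.org/ns/docbook">
      <title>Start</title>
      <para>See <xref linkend="intro"/>.</para>
    </chapter>""",
    "intro.xml": """<chapter version="5.1" xml:id="intro"
        xmlns="http://docbook.org/ns/docbook">
      <title>Introduction</title>
      <para>Basic stuff.</para>
    </chapter>""",
}


def load(href, parse, encoding=None):
    """Loader for ElementInclude: return the parsed root element of a file."""
    return ET.fromstring(FILES[href])


book = load("master.xml", "xml")
ElementInclude.include(book, loader=load)  # replaces each xi:include in place

ids = {e.get(XML_ID) for e in book.iter() if e.get(XML_ID)}
links = [e.get("linkend") for e in book.iter(DB + "xref")]
print(sorted(ids), links)
```

Only after inclusion do the link in start.xml and its target in intro.xml live in the same tree, which is why cross-file links can only be checked on the assembled document.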
Internal links and modular documents
  • Software independent considerations
    • ➟ RelaxNG Schema
  1. REgular LAnguage for XML Next Generation (RelaxNG)

  2. Schematron

  3. XML Schema (XSD)

  4. Document Type Definition (DTD)

Schema Doc instance
<element name="aBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="person">
      <element name="fullName">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
<aBook>
  <person>
    <fullName>Jim Bone</fullName>
    <email>bone@mycity.com</email>
  </person>
</aBook>
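
To make the content model concrete, here is a hand-written check that mirrors the grammar above: an aBook holds zero or more person elements, each with a fullName followed by an email. It is a partial stand-in for illustration only, not a RelaxNG validator (stray text between elements is not checked), and the function name `matches_grammar` is hypothetical.

```python
import xml.etree.ElementTree as ET


def matches_grammar(root):
    """Check: aBook contains zero or more person(fullName, email)."""
    if root.tag != "aBook":
        return False
    for person in root:
        if person.tag != "person":
            return False
        # Each person needs exactly fullName followed by email.
        if [child.tag for child in person] != ["fullName", "email"]:
            return False
        # <text/> allows character data only, so no nested elements.
        if any(len(child) for child in person):
            return False
    return True


VALID = (
    "<aBook><person><fullName>Jim Bone</fullName>"
    "<email>bone@mycity.com</email></person></aBook>"
)
INVALID = "<aBook><person><email>bone@mycity.com</email></person></aBook>"

valid_result = matches_grammar(ET.fromstring(VALID))
invalid_result = matches_grammar(ET.fromstring(INVALID))
print(valid_result, invalid_result)
```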
Inventing a <book> grammar
  • Software independent considerations
    • ➟ Transforming documents
      • ➟ Target format HTML

Problem regarding Figure 1013, “Single source publishing”:

<book version="5.1" ...>
  ...
  <chapter>
    <title>Introduction</title>
    <para>First section.</para>
  </chapter> ...
</book>
<html>
  <head>...</head>
  <body>
     <h1>Introduction</h1>
     <p>First section.</p> ...
  </body>
</html>
<xsl:template match="/book">
  <html>
    <head> ... </head>
    <body>
      <h1>
        <xsl:value-of select="title"/>
      </h1>
    </body>
  </html>
</xsl:template>
<xsl:template match="title">
  <h1>
    <xsl:value-of select="."/>
  </h1>
</xsl:template>
<title>Some content</title>

gets converted to:

<h1>Some content</h1>
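
The idea behind such template rules, one rule per source element that emits a target element, can be imitated without an XSLT processor. The following is a simplified sketch in Python, not XSLT semantics; the rule table `RULES` and the function name `transform` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: source element name -> HTML element name.
RULES = {"title": "h1", "para": "p"}


def transform(source):
    """Apply the matching rule to each child of a chapter, in document order."""
    body = ET.Element("body")
    for child in source:
        target = RULES.get(child.tag)
        if target is not None:
            ET.SubElement(body, target).text = child.text
    return body


chapter = ET.fromstring(
    "<chapter><title>Introduction</title><para>First section.</para></chapter>"
)
html_body = ET.tostring(transform(chapter), encoding="unicode")
print(html_body)
```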
  1. Formatting <book> instances
  2. Providing red background indicating foreign phrases
  3. Splitting your document into chunks
  • Software independent considerations
    • ➟ Transforming documents
      • ➟ Target format print
  1. Creating a desired FO target example
  2. Transforming <book> instances to PDF