Why XML based publishing?

XML features

  • Extensibility

    • Define your grammar

    • XML core extensions (linking,...)

  • Interoperability

    • Cross-platform software support

  • Open standard, no vendor lock-in

  • Tons of (processing) frameworks / APIs

Editors, compositors, designers ...

Quote from "How and Why Are Companies Using XML?":

It's Not about You! It is about publishers.

  • they think it's their content

  • they want

    • to use it, re-use it, slice it, and dice it

    • to own it and control it

    • to have access to it and be able to move it

Promises in publishing

XML for publishing ...

  • saves time and money

  • is platform independent

  • avoids vendor lock-in

  • can be validated for QA

  • allows for creating different target formats

Publishing reality

  • Refrain from fancy catalogs

  • Stick to simple layouts

    • Technical documentation

    • Law publications

Single source publishing

[Figure: Single source publishing]

Separating structure, content and format

Separating concerns

Content

Words, images, audio / video

Structure

Chapters / sections, tables, lists

Presentation

Physical formatting (boldface, text size/color, ...)

Content

Hierarchical structure

Hierarchical structure, XML source

Presentation

Example 1: HTML 5, pure structure

Structure Presentation
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Test</title>
    </head>
    <body>
      <section>
        <h1>Intro</h1>
        <p>Some content</p>
      </section>
    </body>
</html>

Example 2: TeX / LaTeX

Structure / content Presentation (PDF)
\documentclass[12pt]{article}

\begin{document}
  A nice LaTeX formula:

  \begin{displaymath}
    e^x = \sum_{i=0}^{\infty}{x^i \over i!}
  \end{displaymath}

\end{document}

Separating structure and presentation(s)

Pros

  • Separation of editing / formatting concerns

  • Focus on content rather than formatting

  • Oblivious to format evolution (e.g. Epub)

  • Well suited for SCM, diff-ing

Cons

  • No true WYSIWYG

  • Fixed formatting rules, no flexibility

  • Less layout control, especially in print

Sample technical document

Observations

  • Well structured documents

  • Focus on content rather than style

  • Clearly defined semantics

  • Automated generation supporting multiple output channels

Pros and cons of TeX / LaTeX

Pros

  • Excellent typography

  • Large community

  • Mature engine

  • Excellent platform support

  • Multiple problem domain support

  • Extensible vocabulary

Cons

  • Focus on print

  • Bad office authoring tool support

    • Steep learning curve

    • Inverse editing

    • Cryptic error messages

  • Bloated vocabulary

Tools of the trade

XMLMind Editor
  • Strictly validating, near WYSIWYG, DocBook / DITA / MathML / XHTML editor.

  • Plugin architecture

  • Cross-platform Java based.

OxygenXML Editor
  • Full-fledged XML IDE.

  • Strictly validating, near WYSIWYG, DocBook / DITA / MathML / XHTML ... editor.

  • Eclipse based

Inline formatting

HTML
<p><b>Very</b> tiny.</p>
Docbook
<para><emphasis role="bold">Very</emphasis> tiny.</para>
LaTeX
\textbf{Very} tiny.
Rendering Very tiny

Paragraphs

HTML
<p>A paragraph</p>
Docbook
<para>A paragraph</para>
LaTeX
A paragraph\par
Rendering A paragraph

Lists

HTML
<ul>
 <li>One</li>
 <li>Two</li>
</ul>
Docbook
<itemizedlist>
 <listitem>
  <para>One</para>
 </listitem>

 <listitem>
  <para>Two</para>
 </listitem>
</itemizedlist>
LaTeX
\begin{itemize}
 \item One
 \item Two
\end{itemize}
Rendering
  • One

  • Two

Tables

HTML
<table>
  <tr>
    <td>a1</td>
    <td>a2</td>
  </tr>
  <tr>
    <td>b1</td>
    <td>b2</td>
  </tr>
</table>
Docbook
<informaltable>
  <tr>
    <td>a1</td>
    <td>a2</td>
  </tr>
  <tr>
    <td>b1</td>
    <td>b2</td>
  </tr>
</informaltable>
LaTeX
\begin{tabular}{ll}
   a1 & a2 \\
   b1 & b2 \\
\end{tabular}
Rendering
a1 a2
b1 b2

Images

HTML
<img src=
   'smoke.png'/>
Docbook
<mediaobject>
 <imageobject>
   <imagedata fileref
     ="smoke.png"/>
 </imageobject>
</mediaobject>
LaTeX
\includegraphics
  {smoke.png}
Rendering
Images

Mathematical formulas

HTML / Docbook
<m:math>
  <m:mrow>
    <m:munderover>
      <m:mo>∫</m:mo>
          ...
    <m:msqrt>
      <m:mi>π</m:mi>
    </m:msqrt>
  </m:mrow>
</m:math>
LaTeX
\begin{equation}
  \int\limits_{-\infty}^{+\infty}
    e^{-x^2} dx = \sqrt{\pi}
\end{equation}
Rendering
$\int\limits_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi}$

Cross references

HTML
<h1 id="start"
>First section</h1>
<p>A remark.</p>

<h2>A subsection</h2>
<p>See <a href="#start"
>remark</a>.</p>
Docbook
<section xml:id="start">
 <title>First
   section</title>
 <para>A remark.</para>
 <section>
   <title>A subsection
     </title>
   <para>See
 <link linkend="start"
   >remark</link>.</para>
 </section>
</section>
LaTeX
\section{First section}
\label{start}
A remark.

\subsection{A subsection}
See remark at page
\pageref{start}.
Rendering

First section

A remark.

See remark at page 1.

Document sectioning

HTML   LaTeX            Docbook
<h1>   \chapter         <book>
<h2>   \section         <part>
<h3>   \subsection      <chapter>
<h4>   \subsubsection   <sect1>
<h5>   \paragraph       <sect2>
<h6>   \subparagraph    <sect3>

In HTML, <section> can be nested recursively instead. In Docbook, recursive <section> elements can replace <sect1>, <sect2>, <sect3>.

Modular document components

HTML
  <body>
   ...
    <object name="foo" type="text/html" data="table.html"/>
   ...
  </body>
Docbook
  <part xml:id="sd1">
    <title>Software development 1</title>
    <xi:include href="Sd1/gettingStarted.xml" xpointer="element(/1)"/>
    <xi:include href="Sd1/languageFundamentals.xml" xpointer="element(/1)"/>
...
LaTeX
\documentclass{article}
\input{mydefs.tex}
\begin{document}
...
\include{math}
...
\end{document}

What is Docbook?

  • Focus on technical documentation

  • Excellent authoring user interface

  • Semantic markup language

    • XML based

Authoring and publishing

Document representation

<section version="5.1"
  xmlns="http://docbook.org/ns/docbook"
  ...>

  <title>A Title</title>

  <para>A paragraph</para>
</section>

Software centric schema

Software specific support:

Document targets

Docbook components

  • Target format generators

    • XSL style sheets targeting HTML and FO

    • CSS and JavaScript for generated HTML

Target format overview

Tooling / Software

Editing / office
Editing / programming

emacs, vi, notepad, XML IDE, ...

XSLT processors

Saxon 6.5.5, Xalan, ...

FO (PDF) processors

Different schema languages

Docbook 5.x

Based on RelaxNG grammar

Docbook 4.x (old / outdated)

Based on DTD grammar

Plain HTML

Web help

  • HTML 5 based

  • Client-side full-text search via a JavaScript index (Apache Lucene)

  • JavaScript based navigation

  • Third-party tool integration, e.g. MathJax

Eclipse help

Printed output

Paragraph

View Docbook HTML

Some text.

<para>Some text.</para>
<p style='color:red'
  >Some text.</p>

Caution: No style / formatting related parameters in Docbook.

This is by design and on purpose.

Reference: See Paragraph elements.

Itemized list

View Docbook HTML

  • Bee

  • Ant

<itemizedlist>
  <listitem>
    <para>Bee</para>
  </listitem>
  <listitem>
    <para>Ant</para>
  </listitem>
</itemizedlist>
<ul>
  <li>
    <p>Bee</p>
  </li>
  <li>
    <p>Ant</p>
  </li>
</ul>

Ordered list

View Docbook HTML

  1. Bee

  2. Ant

<orderedlist>
  <listitem>
    <para>Bee</para>
  </listitem>

  <listitem>
    <para>Ant</para>
  </listitem>
</orderedlist>
<ol>
  <li>
    <p>Bee</p>
  </li>
  <li>
    <p>Ant</p>
  </li>
</ol>

Glossary list

View Docbook HTML

Bee

Insect

Mouse

Mammal

<glosslist>
  <glossentry>
    <glossterm>Bee</glossterm>
    <glossdef>
      <para>Insect</para>
    </glossdef>
  </glossentry>
  <glossentry>
    <glossterm>Mouse</glossterm>
    <glossdef>
      <para>Mammal</para>
    </glossdef>
  </glossentry>
</glosslist>
<dl>
  <dt>Bee</dt>
  <dd>Insect</dd>
  <dt>Mouse</dt>
  <dd>Mammal</dd>
</dl>

Nested lists

View Docbook HTML

  1. Coffee

  2. Tea

    • black

    • green

<orderedlist>
  <listitem>
    <para>Coffee</para>
  </listitem>
  <listitem>
    <para>Tea</para>
    <itemizedlist>
      <listitem>
        <para>black</para>
      </listitem>
      <listitem>
        <para>green</para>
      </listitem>
    </itemizedlist>
  </listitem>
</orderedlist>
<ol>
  <li>
    <p>Coffee</p>
  </li>
  <li>
    <p>Tea</p>
    <ul>
      <li>black</li>
      <li>green</li>
    </ul>
  </li>
</ol>

Reference

See List elements.

A table

View Docbook HTML

<informaltable border="1">
  <tr>
    <th>Col 1</th>
    <th>Col 2</th>
  </tr>
  <tr>
    <td>A1</td>
    <td>A2</td>
  </tr>
  <tr>
    <td colspan="2">B</td>
  </tr>
</informaltable>
<table border="1">
  <tr>
    <th>Col 1</th>
    <th>Col 2</th>
  </tr>
  <tr>
    <td>A1</td>
    <td>A2</td>
  </tr>
  <tr>
    <td colspan="2">B</td>
  </tr>
</table>

A MathML equation

View Docbook HTML
$E = mc^2$
<informalequation>
  <m:math display="block">
    <m:mrow>
      <m:mi>E</m:mi>
      <m:mo>=</m:mo>
      <m:mrow>
        <m:mi>m</m:mi>
        <m:msup>
          <m:mi>c</m:mi>
          <m:mn>2</m:mn>
        </m:msup>
      </m:mrow>
    </m:mrow>
  </m:math>
</informalequation>
<math display="block">
  <mrow>
    <mi>E</mi>
    <mo>=</mo>
    <mrow>
      <mi>m</mi>
      <msup>
        <mi>c</mi>
        <mn>2</mn>
      </msup>
    </mrow>
  </mrow>
</math>

A TeX equation

Docbook HTML
<informalequation>
  <mathphrase>
  $ |x| = \left\{
   \begin{array}{rl}
    -x &amp; \mbox{if $x&lt;0$} \\
     x &amp; \mbox{otherwise}
   \end{array}\right.$
  </mathphrase>
</informalequation>
<span class="mathphrase">
  $ |x| = \left\{
   \begin{array}{rl}
    -x &amp; \mbox{if $x&lt;0$} \\
     x &amp; \mbox{otherwise}
   \end{array}\right.$
</span>
$ |x| = \left\{ \begin{array}{rl} -x &\mbox{if $x<0$} \\ x &\mbox{otherwise} \end{array} \right.$

Reference

See Formal elements.

Figure

Mountain spring

<figure >
  <title>Mountain spring</title>
  <mediaobject>
    <imageobject>
      <imagedata fileref=
        "Ref/DbookIntro/mountain.jpg"/>
    </imageobject>
  </mediaobject>
</figure>

Image map + calloutlist

Image map with callouts: Seat; Valves (❷ ❸)

<mediaobject>
  <imageobjectco>
    <areaspec ...>
      <area coords="83,16,340,187"
          xml:id="a1" linkends="c1"/>
      ...
    </areaspec>
    <imageobject>
      <imagedata fileref="recumbent.png.svg"/>
    </imageobject>
    <calloutlist>
      <callout arearefs="a1" xml:id="c1">
        <para>Seat</para>
      </callout>
      <callout arearefs="a2 a3" xml:id="c2">
        <para>Valves</para>
      </callout>
    </calloutlist>
  </imageobjectco>
</mediaobject>

Video

Video courtesy of Big Buck Bunny.

<videoobject>
  <videodata
     fileref="buckBunny.mp4"
      format="video/mp4">
    <multimediaparam
          name="controls"
         value="controls"/>
  </videodata>
</videoobject>

A warning

View Docbook

Caution

Beware of overheating!

<caution>
  <para>Beware of overheating!</para>
</caution>

Reference

See Admonition elements: important, note, tip, warning.

Recursive sections

<chapter version="5.1"
  xmlns="http://docbook.org/ns/docbook">
  <title>Top</title>
  <section>
    <title>Level 1</title>
    <section>
      <title>Level 2</title>
      <section>
        <title>Level 3</title>
        <para>Hello!</para>
      </section>
    </section>
  </section>
</chapter>
<html>
  ...
  <body>
    <h1>Top</h1>
    <h2>Level 1</h2>
    <h3>Level 2</h3>
    <h4>Level 3</h4>
    <p>Hello!</p>
  </body>
</html>

Non-recursive sections

<chapter version="5.1"
  xmlns="http://docbook.org/ns/docbook">
  <title>Top</title>
  <sect1>
    <title>Level 1</title>
    <sect2>
      <title>Level 2</title>
      <sect3>
        <title>Level 3</title>
        <para>Hello!</para>
      </sect3>
    </sect2>
  </sect1>
</chapter>
<html>
  ...
  <body>
    <h1>Top</h1>
    <h2>Level 1</h2>
    <h3>Level 2</h3>
    <h4>Level 3</h4>
    <p>Hello!</p>
  </body>
</html>

See <chapter>, <section>, <sect1>, <sect2>, <sect3>, <sect4>, <sect5>, <simplesect>, <refentry>.

Two different link flavours

Internal document links

Referential integrity by ID / IDREF constraints:

<chapter xml:id="intro">
...
<chapter> ...
See <xref linkend="intro"/> ...

External links

These are usual hypertext links:

<para>See
<link xlink:href="http://tdg.docbook.org">Docbook</link>.</para>

Followup exercise

No. 1: Internal document links

Choosing a top level element

  • Root element is purpose dependent

  • Schema based options in Docbook 5.x (RelaxNG) requiring an <info> child in 5.1.

  • No limitation in Docbook 4.x (DTD).

Allowed 5.1 top level elements

Schematron on top of RelaxNG

Using Display #Anchors

The page's URI is based on the xml:id value introduction.

Stable: https://.../introduction.html#firstSection

Unstable: https://.../introduction.html#d03213

Considerations: author-based permalinks

Requirement

Important elements (<chapter>, <section>, <table>...) must provide an xml:id value.

Implementation choices
  • Modify underlying RelaxNG schema.

    Result: Restricted schema (Inheritance relationship)

  • Add Schematron integrity rule on top of schema.
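
The second choice could look roughly like the following ISO Schematron sketch. The prefix db and the selection of just three elements are assumptions made for illustration; a real rule set would list whichever elements a project considers important.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt">
  <!-- Bind the (assumed) prefix "db" to the Docbook 5.x namespace -->
  <sch:ns prefix="db" uri="http://docbook.org/ns/docbook"/>

  <sch:pattern>
    <!-- Illustrative selection of "important" elements -->
    <sch:rule context="db:chapter | db:section | db:table">
      <!-- The assertion fails whenever the xml:id attribute is absent -->
      <sch:assert test="@xml:id">
        A <sch:name/> element must provide an xml:id value.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Unlike a modified RelaxNG grammar, such a rule leaves the underlying schema untouched and is checked as an additional validation step.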

HTML customization overview

Target specific configuration

Link stability

<book ...>
  <title>XML for Newbies</title>
  <chapter xml:id="intro">
    <title>Introduction</title>
    <para>...</para>
  </chapter>
  <chapter xml:id="work">
    <title>Working with objects</title>
    <para>...</para>
  </chapter>
</book>

Navigation structure. 

  • index.html

  • Per chapter:

    • ch01.html

    • ch02.html

Synthetically generated filenames.

use.id.as.filename = 1

<book ...>
  <title>XML for Newbies</title>
  <chapter xml:id="intro">
    <title>Introduction</title>
    <para>...</para>
  </chapter>
  <chapter xml:id="work">
    <title>Working with objects</title>
    <para>...</para>
  </chapter>
</book>

Navigation structure. 

  • index.html

  • Per chapter:

    • intro.html

    • work.html

Providing link stability:

Parameter: use.id.as.filename
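
A minimal sketch of how the parameter might be set in an XSL customization layer. The import URI is an assumption (the conventional address of the namespace-aware chunking stylesheet); a local installation or an XML catalog entry may use a different location.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed location of the stock chunking stylesheet;
       adjust to the local Docbook XSL installation or catalog -->
  <xsl:import
    href="http://docbook.sourceforge.net/release/xsl-ns/current/html/chunk.xsl"/>

  <!-- Name chunk files after xml:id values (intro.html, work.html)
       rather than by position (ch01.html, ch02.html) -->
  <xsl:param name="use.id.as.filename" select="1"/>
</xsl:stylesheet>
```

Running an XSLT 1.0 processor such as Saxon 6.5.5 or Xalan with this file in place of the stock stylesheet should then produce the id-based file names listed above.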

Customization parameter ulink.target

callout.unicode / callout.graphics
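
As a hedged illustration, the three parameters named here might be overridden as shown below. The import URI and the chosen values are assumptions for the sake of the example, not recommended settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed location of the stock chunking stylesheet -->
  <xsl:import
    href="http://docbook.sourceforge.net/release/xsl-ns/current/html/chunk.xsl"/>

  <!-- Open external links in a new browser window or tab -->
  <xsl:param name="ulink.target">_blank</xsl:param>

  <!-- Render callout marks as Unicode characters, not as image files -->
  <xsl:param name="callout.graphics" select="0"/>
  <xsl:param name="callout.unicode" select="1"/>
</xsl:stylesheet>
```

With callout graphics switched off and Unicode switched on, the generated HTML no longer depends on a directory of callout images.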

Followup exercise

No. 2: Tweaking Docbook transformation parameter.

Hooking into XSL

Categories

  • Adding JavaScript

    • Touch gestures

    • Dynamic elements

  • Embedded objects

  • Headers and footers

    • Company logo

    • Navigation icons

  • Front page

Example: videos

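  <!-- Render a Docbook <videodata> element as an HTML 5 <video> -->
  <xsl:template match="d:videodata">
    <video controls="controls" preload="auto">
      <!-- Tooltip: title of the element enclosing the mediaobject -->
      <xsl:attribute name="title">
        <xsl:value-of select="normalize-space(../../../d:title)"/>
      </xsl:attribute>

      <!-- Resolve the media file name of the parent <videoobject> -->
      <xsl:variable name="imageFilename">
        <xsl:call-template name="mediaobject.filename">
          <xsl:with-param name="object" select=".."/>
        </xsl:call-template>
      </xsl:variable>

      <!-- Offer the MP4 file first, then an Ogg variant as fallback -->
      <source src="{$imageFilename}" type='video/mp4' />
      <source src="{$imageFilename}.ogv"/>
    </video>
  </xsl:template>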

Customize by CSS

Example CSS modifications

div.example > p.title,
div.figure > p.title,
div.table > p.title,
div.procedure > p.title,
div.equation > p.title {
    color: #394986;
    font-weight: bold;
}

Followup exercise

No. 3: Tweaking Docbook's default CSS.

Styling the editor

  • CSS

  • Plugins e.g. representing tables.

  • Folding mode by CSS.

Motivating modular documents

Monolithic document problems

  • Multiple author editing conflicts

  • User interface limits

  • No document component reuse

Document decomposition

A monolithic document

<book version="5.1"
  xmlns="http://docbook.org/ns/docbook">
  <chapter version="5.1" xml:id="start">
    <title>Start</title>
    <para>See <xref linkend="intro" />.</para>
  </chapter>
  <chapter xml:id="intro" >
    <title>Introduction</title>
    <para>Basic stuff.</para>
  </chapter>
</book>

An internal link.

Internal link target.

Decomposing documents

master.xml

<book version="5.1" 
  xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"> 
  <xi:include href="start.xml" 
     xpointer="element(/1)"/> 

  <xi:include href="intro.xml" 
     xpointer="element(/1)"/> 
</book>

start.xml

<chapter version="5.1" 
xmlns="http://docbook.org/ns/docbook">
  <title>Start</title>
  <para>See
     <xref linkend="intro"/>.</para>
</chapter>

intro.xml

<chapter version="5.1" xml:id="intro"
xmlns="http://docbook.org/ns/docbook">
  <title>Introduction</title>
  <para>Basic stuff.</para>
</chapter>

Followup exercise

No. 4: Internal links and modular documents

XML grammar defining languages

  1. REgular LAnguage for XML Next Generation (RelaxNG)

  2. Schematron

  3. XML Schema (XSD)

  4. Document Type Definition (DTD)

Address list schema

Schema

<element name="aBook"
  xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="person">
      <element name="fullName">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

Doc instance

<aBook>
  <person>
    <fullName>Jim Bone</fullName>
    <email>bone@mycity.com</email>
  </person>
</aBook>

Followup exercise

No. 5: Inventing a <book> grammar

Format conversion problem

Problem regarding Figure 931, “Single source publishing”:

<book version="5.1" ...>
  ...
  <chapter>
    <title>Introduction</title>
    <para>First section.</para>
  </chapter> ...
</book>
<html>
  <head>...</head>
  <body>
     <h1>Introduction</h1>
     <p>First section.</p> ...
  </body>
</html>

XSL template rules

<xsl:template match="/book">
  <html>
    <head> ... </head>
    <body>
      <h1>
        <xsl:value-of select="title"/>
      </h1>
    </body>
  </html>
</xsl:template>

Example: Formatting <title> elements

<xsl:template match="title">
  <h1>
    <xsl:value-of select="."/>
  </h1>
</xsl:template>
<title>Some content</title>

gets converted to:

<h1>Some content</h1>
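
The two rules above can be combined into one small stylesheet. The following is a sketch under stated assumptions, not the stock Docbook XSL: because Docbook 5.x elements live in the namespace http://docbook.org/ns/docbook, the patterns carry an assumed prefix d, and the literal page title "Book" as well as the chapter and para rules are added purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://docbook.org/ns/docbook"
    exclude-result-prefixes="d">

  <xsl:output method="html" indent="yes"/>

  <!-- Root rule: emit the HTML skeleton, then delegate to the chapters -->
  <xsl:template match="/d:book">
    <html>
      <head><title>Book</title></head>
      <body>
        <xsl:apply-templates select="d:chapter"/>
      </body>
    </html>
  </xsl:template>

  <!-- A chapter contributes its title and paragraphs in document order -->
  <xsl:template match="d:chapter">
    <xsl:apply-templates select="d:title | d:para"/>
  </xsl:template>

  <!-- Each title becomes a heading -->
  <xsl:template match="d:title">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>

  <!-- Each paragraph becomes an HTML paragraph -->
  <xsl:template match="d:para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a namespaced <book> holding the "Introduction" chapter shown earlier, these rules should yield an <h1> for the chapter title followed by a <p> for the paragraph.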

Followup exercises

Basic FO introduction

Followup exercises