• The Document Object Model (DOM)
  • Functional programming basics in Java.

  • Dependency management using Maven.

Streaming / event-based versus in-memory tree
SAX: XML to events
SAX architecture
Language-independent specification
  • SAX is an event-based model lacking context: the application has to write its own code for assembling content.

  • No XPath support.

  • No subtree movement within or between documents.

  • In a word: no in-memory document representation.

    Consequence: no tree navigation.
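To make the "content assembly" drawback concrete, here is a minimal sketch using the JDK's built-in SAX parser (javax.xml.parsers.SAXParserFactory) rather than any DOM library. The class name ItemTextCollector and the catalog snippet are illustrative assumptions. Note how the handler must track its own context (a flag and a StringBuilder), because characters() carries no information about the enclosing element and may deliver one element's text in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: collects the text content of every <item> element via SAX. */
final class ItemTextCollector extends DefaultHandler {
    private final List<String> items = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean insideItem = false; // context SAX does not keep for us

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            insideItem = true;
            buffer.setLength(0); // start assembling a fresh text value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideItem) {
            buffer.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            items.add(buffer.toString());
            insideItem = false;
        }
    }

    static List<String> collect(String xml) throws Exception {
        ItemTextCollector handler = new ItemTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><item orderNo=\"3218\">Swinging headset</item>"
                   + "<item orderNo=\"9921\">200W Stereo Amplifier</item></catalog>";
        System.out.println(collect(xml)); // [Swinging headset, 200W Stereo Amplifier]
    }
}
```

With a DOM, the same information is simply read from the already assembled tree.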

  • DOM objects and operations are defined using the CORBA 2.2 Interface Definition Language (IDL).

  • Each programming language needs its own binding, e.g. a set of interfaces. Examples:

    • A set of Java interfaces.

    • A set of C++ pure virtual classes.

interface Node {
  const unsigned short ELEMENT_NODE   = 1; // NodeType
  const unsigned short ATTRIBUTE_NODE = 2;
  const unsigned short TEXT_NODE      = 3;
  // ...
  readonly attribute DOMString      nodeName;
           attribute DOMString      nodeValue;

  readonly attribute unsigned short nodeType;
  readonly attribute Node           parentNode;
  // ...
  readonly attribute NodeList       childNodes;
  readonly attribute Node           firstChild;
  // ...
  Node insertBefore(in Node newChild, in Node refChild)
                                  raises(DOMException);
  // ...
};
  • Each binding uses the given language's constructs to closely mirror the CORBA 2.2 IDL specification.

  • Bindings are difficult to provide for non-object-oriented languages.

package org.w3c.dom;

public interface Node {            // Node Types
   public static final short ELEMENT_NODE   = 1;
   public static final short ATTRIBUTE_NODE = 2;
   public static final short TEXT_NODE      = 3;
   // ...
   public String   getNodeName();
   public String   getNodeValue() throws DOMException;
   public void     setNodeValue(String nodeValue) throws DOMException;
   public short    getNodeType();
   public Node     getParentNode();
   public NodeList getChildNodes();
   public Node     getFirstChild();
   // ...
   public Node     insertBefore(Node newChild, Node refChild)
                                          throws DOMException;
   // ...
}
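The insertBefore() operation declared in both the IDL and the Java binding can be tried out with the W3C DOM implementation bundled with the JDK. In this sketch the class name and the list contents are assumptions. It shows that newChild is inserted into the child list of the node the method is invoked on, directly before refChild, and that a null refChild appends at the end.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical demo class: inserts a new first <item> into a <list> element. */
final class InsertBeforeDemo {
    static String insertFirstItem(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element list = doc.getDocumentElement();

        // New nodes are created by the owning Document and start out detached
        Element newItem = doc.createElement("item");
        newItem.appendChild(doc.createTextNode("zero"));

        // refChild is null for an empty list; insertBefore then appends at the end
        Node refChild = list.getFirstChild();
        list.insertBefore(newItem, refChild);
        return serialize(doc);
    }

    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(insertFirstItem("<list><item>one</item><item>two</item></list>"));
    }
}
```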
  • Element

  • Text

  • Comment

  • Processing instruction, e.g. <?xml-stylesheet type="text/xsl" href="style.xsl"?>.

  • Entity

  • ...
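Each node reports its kind through getNodeType(), to be compared against the constants of org.w3c.dom.Node. The following sketch walks a document with getFirstChild()/getNextSibling() and counts node kinds. It uses the JDK's DOM parser; the class name and sample document are assumptions. The JDK's DocumentBuilder keeps comments and processing instructions by default.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical example: counts DOM node kinds below the document node. */
final class NodeTypeCounter {
    static Map<String, Integer> count(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Map<String, Integer> counts = new TreeMap<>();
        walk(doc, counts); // start at the document node so top-level PIs are included
        return counts;
    }

    private static void walk(Node node, Map<String, Integer> counts) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            counts.merge(kindOf(child), 1, Integer::sum);
            walk(child, counts); // depth-first recursion
        }
    }

    private static String kindOf(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: return "Element";
            case Node.TEXT_NODE: return "Text";
            case Node.COMMENT_NODE: return "Comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "Processing instruction";
            default: return "Other (type " + node.getNodeType() + ")";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count(
            "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>"
            + "<doc><!-- note --><p>Hi</p></doc>"));
    }
}
```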

Java binding: inheritance interface hierarchy and modules.
  • Set apart from the standard.

  • May lack advanced features.

  • Smaller user community, less mature.

  • Potential incompatibilities with third-party DOM frameworks.

Creating a new Document instance from scratch
<dependency>
  <groupId>org.jdom</groupId>
  <artifactId>jdom2</artifactId>
  <version>2.0.6</version>
</dependency>
 ...

  1. Create an empty Element instance to become the document's root.
  2. Add a Text node.
  3. Set a new attribute date with value 23.02.2000.
  4. Create an XMLOutputter serializer instance producing pretty-printed output.
  5. Serialize the result tree to a stream.

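import org.jdom2.Element;
import org.jdom2.Text;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

final Element titel = new Element("titel");

titel.addContent(new Text("First try"));

titel.setAttribute("date", "23.02.2000");

final XMLOutputter printer =
      new XMLOutputter(Format.getPrettyFormat());

printer.output(titel, System.out); // may throw java.io.IOException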
Result: <titel date="23.02.2000">First try</titel>
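For comparison, the same element can be built with the standard W3C DOM API bundled with the JDK instead of JDOM. This is a sketch with a hypothetical class name: unlike JDOM, W3C DOM nodes are created through factory methods of a Document instance, and serialization goes through javax.xml.transform.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical example: builds <titel date="23.02.2000">First try</titel> via W3C DOM. */
final class CreateTitelW3c {
    static String build() throws Exception {
        // W3C DOM nodes are always created by, and owned by, a Document
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

        Element titel = doc.createElement("titel");
        titel.appendChild(doc.createTextNode("First try"));
        titel.setAttribute("date", "23.02.2000");
        doc.appendChild(titel); // becomes the document's root element

        Transformer printer = TransformerFactory.newInstance().newTransformer();
        printer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        printer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```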
A sub-structured <title>
Parsing existing XML documents
<catalog>
  <item orderNo="3218">Swinging headset</item>
  <item orderNo="9921">200W Stereo Amplifier</item>
</catalog>
import java.io.PrintStream;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MySaxErrorHandler implements ErrorHandler {

   private final PrintStream out; // The error handler's output goes here

   public MySaxErrorHandler(final PrintStream out) { this.out = out; }

   private String getParseExceptionInfo(final SAXParseException ex) {
      return "'" + ex.getMessage() + "' at line " + ex.getLineNumber() +
             ", column " + ex.getColumnNumber();
   }
   @Override public void warning(final SAXParseException exception) throws SAXException {
      out.println("Warning: " + getParseExceptionInfo(exception));
   }
   @Override public void error(final SAXParseException exception) throws SAXException {
      out.println("Error: " + getParseExceptionInfo(exception));
   }
   @Override public void fatalError(final SAXParseException exception) throws SAXException {
      out.println("Fatal error: " + getParseExceptionInfo(exception));
   }
}
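To watch an ErrorHandler at work without JDOM, the following sketch registers a small recording handler with the JDK's SAX parser and feeds it a malformed document. The class name RecordingErrorHandler is hypothetical; it mirrors MySaxErrorHandler but stores messages in a list so they can be inspected. After fatalError() returns, the JDK's parser stops and throws the SAXParseException, so the caller catches it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Hypothetical error handler collecting messages instead of printing them. */
final class RecordingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    private static String info(SAXParseException ex) {
        return "'" + ex.getMessage() + "' at line " + ex.getLineNumber()
            + ", column " + ex.getColumnNumber();
    }

    @Override public void warning(SAXParseException ex) { messages.add("Warning: " + info(ex)); }
    @Override public void error(SAXParseException ex) { messages.add("Error: " + info(ex)); }
    @Override public void fatalError(SAXParseException ex) { messages.add("Fatal error: " + info(ex)); }

    static List<String> check(String xml) throws Exception {
        RecordingErrorHandler handler = new RecordingErrorHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException expected) {
            // Parsing aborts on a well-formedness error; the handler has recorded it already
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        check("<catalog><item></catalog>").forEach(System.out::println);
    }
}
```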
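import java.io.IOException;
import java.net.URL;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class ReadCatalog {
   private final SAXBuilder builder = new SAXBuilder();

   public ReadCatalog() {
      builder.setErrorHandler(new MySaxErrorHandler(System.out));
   }
   public void process(final String filename) throws JDOMException, IOException {
      // Locate the file on the classpath, e.g. below src/main/resources
      final URL resource = getClass().getClassLoader().getResource(filename);
      if (resource == null) {
         throw new IOException("Resource not found: " + filename);
      }
      final Document docInput = builder.build(resource);
      final Element docRoot = docInput.getRootElement();
      docRoot.getChildren().forEach(item ->
            System.out.println(
                  "Article: " + item.getText() +
                  ", order number: " + item.getAttributeValue("orderNo")));
   }
}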
public class ReadCatalogDriver {

  public static void main(String[] argv) throws Exception {
    final ReadCatalog catalogReader = new ReadCatalog();
    catalogReader.process("catalog.xml");
  }
}
Article: Swinging headset, order number: 3218
Article: 200W Stereo Amplifier, order number: 9921

https://gitlab.mi.hdm-stuttgart.de/goik/GoikLectures/tree/master/P/Sda1/Jdom/Catalog
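The same output can be produced against the standard W3C DOM API shipped with the JDK (javax.xml.parsers.DocumentBuilder), which JDOM wraps in a more Java-friendly API. A sketch, assuming the catalog is passed in as a string and using a hypothetical class name. Two differences to the JDOM version: org.w3c.dom.NodeList is not a java.util.List, so an index loop replaces getChildren().forEach(...), and getElementsByTagName() searches all descendants rather than direct children only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical W3C DOM counterpart of ReadCatalog. */
final class ReadCatalogW3c {
    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // Matches <item> elements at any depth below the root
        NodeList items = doc.getDocumentElement().getElementsByTagName("item");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            lines.add("Article: " + item.getTextContent()
                + ", order number: " + item.getAttribute("orderNo"));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        describe("<catalog><item orderNo=\"3218\">Swinging headset</item>"
            + "<item orderNo=\"9921\">200W Stereo Amplifier</item></catalog>")
            .forEach(System.out::println);
    }
}
```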

  1. Visualizing XML document elements
  2. Reminder of functional programming elements in Java
  3. Creating HTML output
  4. Cleaning up HTML
Using DOM with HTML/Javascript
  • Widespread Javascript support.

  • Full DOM support.

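function sortables_init() {
  if (!document.getElementsByTagName) return;  // feature check for old browsers
  const tbls = document.getElementsByTagName("table");
  for (let ti = 0; ti < tbls.length; ti++) {
    const thisTbl = tbls[ti];
    // Tables whose class attribute contains "sortable" and which carry an id
    if (((' ' + thisTbl.className + ' ').indexOf("sortable") != -1)
        && (thisTbl.id)) {
      ts_makeSortable(thisTbl);  // defined elsewhere in the script
    }
  }
}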
  • Addressing node sets in XML trees.

  • Conceptual similarity to SQL.

  • Collections representing result sets.
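To illustrate the SQL analogy: an XPath predicate acts much like a WHERE clause, and the result is a collection of nodes. The sketch below uses the JDK's javax.xml.xpath API rather than JDOM with Jaxen; the class name is hypothetical, the catalog content follows the earlier example, and the threshold 5000 is arbitrary. In XPath 1.0, @orderNo > 5000 converts the attribute value to a number before comparing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper returning the text content of all nodes an XPath selects. */
final class XpathQuery {
    static List<String> select(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList result = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < result.getLength(); i++) {
            texts.add(result.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String catalog = "<catalog><item orderNo=\"3218\">Swinging headset</item>"
            + "<item orderNo=\"9921\">200W Stereo Amplifier</item></catalog>";
        // Roughly: SELECT text FROM item WHERE orderNo > 5000
        System.out.println(select(catalog, "//item[@orderNo > 5000]"));
    }
}
```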

<dependency>                  <!-- Jdom itself -->
  <groupId>org.jdom</groupId>
  <artifactId>jdom2</artifactId>
  <version>2.0.6</version>
</dependency>

<dependency>                  <!-- XPath support for Jdom -->
  <groupId>jaxen</groupId>
  <artifactId>jaxen</artifactId>
  <version>1.1.6</version>
</dependency> ...
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Picture gallery</title></head>
  <body>
    <h1>Picture gallery</h1>
    <p>Images may appear inline:<img src="inline.gif" alt="none"/></p>
    <table><tbody>
      <tr>
        <td>Number one:</td>
        <td><img src="one.gif" alt="none"/></td>
      </tr>
      <tr>
        <td>Number two:</td>
        <td><img src="http://www.hdm-stuttgart.de/favicon.ico" alt="none"/></td>
      </tr>
    </tbody></table>
  </body>
</html>
  • <img> elements may appear at (nearly) arbitrary positions.

  • Additional search restrictions may apply, e.g. searching for <img/> elements lacking an alt attribute.
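A restricted search like this can be sketched with the JDK's javax.xml.xpath API. Because XHTML elements live in the namespace http://www.w3.org/1999/xhtml, the expression needs a prefix bound through a NamespaceContext, and the parser must be namespace aware. The prefix html, the class name and the sample markup are assumptions. The unprefixed //img is included to show that it matches nothing in a document using a default namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical example: XPath searches for <img> elements in XHTML. */
final class ImgSearch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Binds the assumed prefix "html" to the XHTML namespace. */
    static final NamespaceContext HTML_PREFIX = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("prefix is null");
            return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "html" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("html").iterator()
                : Collections.emptyIterator();
        }
    };

    static List<String> srcValues(String xhtml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // otherwise prefixed name tests cannot match
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(HTML_PREFIX);
        NodeList imgs = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            result.add(((Element) imgs.item(i)).getAttribute("src"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html xmlns=\"" + XHTML_NS + "\"><body>"
            + "<p>Inline: <img src=\"inline.gif\" alt=\"none\"/></p>"
            + "<img src=\"one.gif\"/></body></html>";
        System.out.println(srcValues(page, "//img"));                 // []
        System.out.println(srcValues(page, "//html:img"));            // [inline.gif, one.gif]
        System.out.println(srcValues(page, "//html:img[not(@alt)]")); // [one.gif]
    }
}
```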

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//html:img">
      <xsl:value-of select="@src"/>
      <xsl:text> </xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Result when applied to Figure 809, “HTML containing <img> tags.”:

inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico
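import java.io.IOException;
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class DomXpath {
  private final SAXBuilder builder = new SAXBuilder();

  public List<Element> process(final String xhtmlFilename)
                throws JDOMException, IOException {

    final Document htmlInput = builder.build(xhtmlFilename);
     ...
  }
}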


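// Naive attempt: "//img" only matches <img> elements in no namespace. Since
// the XHTML input declares a default namespace, the result list stays empty.
static final XPathExpression<Element> xpathSearchImg =
  XPathFactory.instance().compile(
    "//img",
    new ElementFilter() /* filter just elements */);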
static final Namespace htmlNamespace =
  Namespace.getNamespace("html", "http://www.w3.org/1999/xhtml");

static final XPathExpression<Element> xpathSearchImg =
  XPathFactory.instance().compile(
    "//html:img",         // prefix html is bound by htmlNamespace
    new ElementFilter(),
    null,                 // no XPath variables
    htmlNamespace);
public List<Element> process(final String xhtmlFilename)... {
  final Document htmlInput = builder.build(xhtmlFilename);
  return xpathSearchImg.evaluate(htmlInput);
}
new DomXpath().process("src/main/resources/gallery.html").
      stream().
      map(img -> img.getAttributeValue("src")).
      reduce((l, r) -> l.concat(", ").concat(r)).
      ifPresent(System.out::println);
inline.gif, one.gif, http://www.hdm-stuttgart.de/favicon.ico
Verifying the readability of referenced images
Map<String, Object> xpathVariables = new HashMap<>();
xpathVariables.put("cssClass", null); // declare variable $cssClass
...
XPathExpression<Element> searchCssClass = XPathFactory.instance().compile(
  "//html:*[@class = $cssClass]",
  new ElementFilter(), xpathVariables, htmlNamespace);

searchCssClass.setVariable("cssClass", "header");
searchCssClass.evaluate(htmlInput) ...

// Reuse by changing $cssClass
searchCssClass.setVariable("cssClass", "footer");
searchCssClass.evaluate(htmlInput) ...
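The JDK's javax.xml.xpath API offers a comparable mechanism through XPathVariableResolver. In this sketch the class name, the map-backed resolver and the sample markup are assumptions, not part of the JDOM example. The expression is compiled once and re-evaluated after changing the value bound to $cssClass. For brevity the markup carries no namespace, so no prefix is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical example: one compiled XPath, re-evaluated with different $cssClass values. */
final class CssClassSearch {
    private final Map<String, Object> variables = new HashMap<>();
    private final XPathExpression searchCssClass;

    CssClassSearch() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver looks variables up in our map each time the expression is evaluated
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
        searchCssClass = xpath.compile("//*[@class = $cssClass]");
    }

    List<String> find(Document doc, String cssClass) throws Exception {
        variables.put("cssClass", cssClass); // rebind $cssClass, no recompilation
        NodeList hits = (NodeList) searchCssClass.evaluate(doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            names.add(hits.item(i).getNodeName());
        }
        return names;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<body><div class=\"header\"/>"
            + "<p class=\"footer\"/><span class=\"footer\"/></body>");
        CssClassSearch search = new CssClassSearch();
        System.out.println(search.find(doc, "header")); // [div]
        System.out.println(search.find(doc, "footer")); // [p, span]
    }
}
```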
HTML internal reference verification
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="catalog.xsd">
  <title>Outdoor products</title>
  <introduction>
    <para>We offer a great variety of basic stuff for mountaineering
          such as ropes, harnesses and tents.</para>
    <para>Our shop is proud of its large number of available
      sleeping bags.</para>
  </introduction>
  <product id="x-223">
    <title>Multi freezing bag Nightmare camper</title>
    <description>
      <para>You will feel comfortable down to minus 20 degrees - at
            least if you are a penguin or a polar bear.</para>
    </description>
  </product>
  <product id="r-334">
    <title>Rope 40m</title>
    <description>
      <para>Excellent for indoor climbing.</para>
    </description>
  </product>
</catalog>

A corresponding schema file catalog.xsd is straightforward:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
   vc:minVersion="1.0" vc:maxVersion="1.1">

   <xs:simpleType name="money">
      <xs:restriction base="xs:decimal">
         <xs:fractionDigits value="2"/>
      </xs:restriction>
   </xs:simpleType>

   <xs:element name="title" type="xs:string"/>
   <xs:element name="para" type="xs:string"/>

   <xs:element name="description" type="paraSequence"/>
   <xs:element name="introduction" type="paraSequence"/>

   <xs:complexType name="paraSequence">
      <xs:sequence>
         <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>

   <xs:element name="product">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="title"/>
            <xs:element ref="description"/>
         </xs:sequence>
         <xs:attribute name="id" type="xs:ID" use="required"/>
         <xs:attribute name="price" type="money" use="optional"/>
      </xs:complexType>
   </xs:element>

   <xs:element name="catalog">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="title"/>
            <xs:element ref="introduction"/>
            <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

</xs:schema>
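An instance can be checked against such a schema with the JDK's javax.xml.validation API. The sketch below uses a cut-down inline schema containing only the money type and the product attributes (an assumption to keep it short; it is not the full catalog.xsd) to show that the fractionDigits facet rejects a price with three decimal places. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical example: validates <product> instances against a reduced schema. */
final class PriceValidation {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='money'><xs:restriction base='xs:decimal'>"
      + "<xs:fractionDigits value='2'/></xs:restriction></xs:simpleType>"
      + "<xs:element name='product'><xs:complexType>"
      + "<xs:attribute name='id' type='xs:ID' use='required'/>"
      + "<xs:attribute name='price' type='money' use='optional'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Without a custom ErrorHandler the validator throws on the first error
            System.err.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<product id='r-334' price='19.90'/>"));
        System.out.println(isValid("<product id='r-334' price='12.345'/>"));
    }
}
```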
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0" xmlns="http://www.w3.org/1999/xhtml">

  <xsl:template match="/catalog">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body style="background-color:#FFFFFF">
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="product"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="product">
    <h3><xsl:value-of select="title"/></h3>
    <xsl:for-each select="description/para">
      <p><xsl:value-of select="."/></p>
    </xsl:for-each>
    <xsl:if test="@price">
      <p>
        <xsl:text>Price: </xsl:text>
        <xsl:value-of select="@price"/>
      </p>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
package dom.xsl;
...
public class Xml2Html {
   private final SAXBuilder builder = new SAXBuilder();

   private final XSLTransformer transformer;

   public Xml2Html(final String xslFilename) throws XSLTransformException {
      builder.setErrorHandler(new MySaxErrorHandler(System.err));
      transformer = new XSLTransformer(xslFilename);
   }

   public void transform(final String xmlInFilename,
         final String resultFilename) throws JDOMException, IOException {

      final Document inDoc = builder.build(xmlInFilename);
      final Document result = transformer.transform(inDoc);

      // Set formatting for the XML output
      final XMLOutputter printer = new XMLOutputter(Format.getPrettyFormat());

      // Serialize to the requested result file
      try (final OutputStream out = new FileOutputStream(resultFilename)) {
         printer.output(result, out);
      }
   }
}
package dom.xsl;
...
public class Xml2HtmlDriver {
...
  public static void main(String[] args) {
    final String
     inFilename = "Input/Dom/climbing.xml",
     xslFilename = "Input/Dom/catalog2html.xsl",
     htmlOutputFilename = "Input/Dom/climbing.html";
    try {
      final Xml2Html converter = new Xml2Html(xslFilename);
      converter.transform(inFilename, htmlOutputFilename);
    } catch (Exception e) {
      System.err.println("The conversion of '" + inFilename
          + "' by stylesheet '" + xslFilename
          + "' to output HTML file '" + htmlOutputFilename
          + "' failed with the following error:" + e);
      e.printStackTrace();
    }
  }
}
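Without JDOM on the classpath, the same kind of transformation can be sketched with the JDK's TrAX API (javax.xml.transform). Assumptions: the class name, a cut-down stylesheet declared as XSLT 1.0 (the JDK's built-in processor implements XSLT 1.0), and in-memory strings instead of the files used by Xml2HtmlDriver. Passing a StreamResult built from a java.io.File would write to disk instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: catalog-to-XHTML transformation via the JDK's XSLT processor. */
final class Xml2HtmlJaxp {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns='http://www.w3.org/1999/xhtml'>"
      + "<xsl:template match='/catalog'><html><body>"
      + "<h1><xsl:value-of select='title'/></h1>"
      + "<xsl:apply-templates select='product'/></body></html></xsl:template>"
      + "<xsl:template match='product'><h3><xsl:value-of select='title'/></h3>"
      + "</xsl:template></xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<catalog><title>Outdoor products</title>"
            + "<product id='r-334'><title>Rope 40m</title></product></catalog>"));
    }
}
```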
Namespace / elements statistics