Overview

  • The Document Object Model (DOM)

    Lecture notes

Required knowledge

  • Functional programming basics in Java.

  • Dependency management using Maven.

Important XML Java APIs

  • Streaming / event-based

  • In-memory tree

SAX: XML to events

SAX architecture

DOM assembly

layered SVG image

Overview

  • The Document Object Model (DOM)
    • ➟ Language independent specification

      Lecture notes

SAX deficiencies

  • Event-based model lacking context: applications must write their own content assembly code.

  • No XPath support.

  • No subtree movement within or between documents.

  • In short: no in-memory document representation.

    Consequence: no tree navigation.
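
The first point can be illustrated by a minimal sketch using the JDK's built-in SAX parser. The class name SaxItemCollector and the sample document are assumptions made for this illustration. Because SAX only reports isolated events, the handler has to keep track of its own state in order to assemble the text of each <item> element:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collects the text content of all <item> elements via SAX events. */
class SaxItemCollector extends DefaultHandler {
    private final List<String> items = new ArrayList<>();
    private StringBuilder current; // non-null only while the parser is inside an <item>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            current = new StringBuilder(); // start assembling a new item
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            items.add(current.toString());
            current = null;
        }
    }

    static List<String> collect(String xml) throws Exception {
        SaxItemCollector handler = new SaxItemCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><item orderNo=\"3218\">Swinging headset</item>"
                   + "<item orderNo=\"9921\">200W Stereo Amplifier</item></catalog>";
        System.out.println(collect(xml));
    }
}
```

All bookkeeping (the current field) exists only to rebuild context that an in-memory tree would provide for free.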

DOM: Language independence

  • DOM objects and operations are defined using the CORBA 2.2 Interface Definition Language (IDL).

  • Each language gets its own binding, e.g. a set of interfaces. Examples:

    • A set of Java interfaces.

    • A set of C++ pure virtual classes.

DOM: Vendor independence

DOM Node CORBA 2.2 IDL

interface Node {
  const unsigned short ELEMENT_NODE   = 1; // NodeType
  const unsigned short ATTRIBUTE_NODE = 2;
  const unsigned short TEXT_NODE      = 3;
   ...
  readonly attribute DOMString      nodeName;
  attribute DOMString nodeValue;

  readonly attribute unsigned short nodeType;
  readonly attribute Node           parentNode;
   ...
  readonly attribute NodeList       childNodes;
  readonly attribute Node           firstChild;
   ...
  Node insertBefore(in Node newChild, in Node refChild)
                                  raises(DOMException);
   ...

Defining a language binding

  • Use those constructs of the target language that most closely resemble the CORBA 2.2 IDL specification.

  • Difficult for non-OO languages.

org.w3c.dom.Node Java binding.

package org.w3c.dom;

public interface Node {            // Node Types
   public static final short ELEMENT_NODE   = 1;
   public static final short ATTRIBUTE_NODE = 2;
   public static final short TEXT_NODE      = 3;
      ...
   public String   getNodeName();
   public String   getNodeValue() throws DOMException;
   public void     setNodeValue(String nodeValue) throws DOMException;
   public short    getNodeType();
   public Node     getParentNode();
   public NodeList getChildNodes();
   public Node     getFirstChild();
   ...
   public Node     insertBefore(Node newChild, Node refChild)
                                          throws DOMException;
   ...
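
As a hedged illustration of this binding, the following sketch uses the JDK's own org.w3c.dom implementation. The class NodeBindingDemo and its input are assumptions; getNextSibling() belongs to the Node interface but is among the members elided above. It inserts a new element in front of the first child and then walks the sibling chain:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical demo of the Node methods insertBefore, getFirstChild and getNodeType. */
class NodeBindingDemo {

    /** Inserts an <intro> element as first child of the root, then lists all child element names. */
    static String childNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // insertBefore(newChild, refChild): a null refChild appends at the end
        Element intro = doc.createElement("intro");
        root.insertBefore(intro, root.getFirstChild());

        StringBuilder names = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) { // skip text, comments etc.
                if (names.length() > 0) {
                    names.append(' ');
                }
                names.append(n.getNodeName());
            }
        }
        return names.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childNames("<catalog><item/><item/></catalog>"));
    }
}
```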

A context node's children

layered SVG image

org.w3c.dom.Node subtypes

  • Element

  • Text

  • Comment

  • Processing instruction: <?xml-stylesheet type="text/xsl" href="style.xsl"?>.

  • Entity

  • ...
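
A small sketch, assuming the JDK's built-in W3C DOM parser, may help to tell these subtypes apart at runtime. The class NodeTypeDemo and the sample document are made up for this purpose. The code walks the whole tree and reports each node's type using the NodeType constants:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical demo: classify the nodes of a parsed document by their node type. */
class NodeTypeDemo {

    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        describe(doc, out);
        return out;
    }

    // Depth-first traversal in document order
    private static void describe(Node node, List<String> out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.add("Element " + node.getNodeName());
                break;
            case Node.TEXT_NODE:
                out.add("Text " + node.getNodeValue());
                break;
            case Node.COMMENT_NODE:
                out.add("Comment");
                break;
            case Node.PROCESSING_INSTRUCTION_NODE:
                out.add("ProcessingInstruction " + node.getNodeName()); // name equals the PI target
                break;
            default:
                break; // e.g. the Document node itself
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            describe(c, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                   + "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>"
                   + "<!-- a comment --><memo>Hello</memo>";
        describe(xml).forEach(System.out::println);
    }
}
```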

DOM Java binding inheritance interface hierarchy

DOM modules.

Jdom vs. DOM: Advantages

Jdom vs. DOM: Disadvantages

  • Deviates from the W3C DOM standard.

  • May lack advanced features.

  • Smaller user community, less mature.

  • Potential incompatibilities with third-party DOM frameworks.

Overview

  • The Document Object Model (DOM)
    • ➟ Creating a new Document instance from scratch

      Lecture notes

Prerequisite: pom.xml configuration

<dependency>
  <groupId>org.jdom</groupId>
  <artifactId>jdom2</artifactId>
  <version>2.0.6</version>
</dependency>
 ...

Exporting data as XML

Create an empty Element instance to become the document's root.

Add a Text node.

Set a new attribute date to value 23.02.2000.

Create an XMLOutputter serializer instance configured for pretty-printed output.

Serialize the result tree to a stream.

XML document creation from scratch.

final Element titel = new Element("titel"); 

titel.addContent(new Text("First try")); 

titel.setAttribute("date", "23.02.2000"); 

final XMLOutputter printer =
      new XMLOutputter(Format.getPrettyFormat());

printer.output(titel, System.out); 

Result: <titel date="23.02.2000">First try</titel>
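
For comparison, the same steps may be sketched with the JDK's built-in W3C DOM and an identity Transformer acting as serializer. This is not part of the Jdom example; the class name CreateTitleW3c is an assumption. Note the additional factory calls required because W3C DOM nodes can only be created through their owning Document:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical W3C DOM counterpart of the Jdom "creation from scratch" example. */
class CreateTitleW3c {

    static String createTitle() throws Exception {
        // An empty document; all nodes are created through it
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element titel = doc.createElement("titel");          // root element
        doc.appendChild(titel);
        titel.appendChild(doc.createTextNode("First try"));  // text node
        titel.setAttribute("date", "23.02.2000");            // attribute

        // An identity transformation serializes the tree
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createTitle());
    }
}
```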

Followup exercise

No. 14: A sub structured <title>

Overview

  • The Document Object Model (DOM)
    • ➟ Parsing existing XML documents

      Lecture notes

XML catalog sample data

<catalog>
  <item orderNo="3218">Swinging headset</item>
  <item orderNo="9921">200W Stereo Amplifier</item>
</catalog>

SAX error handler

import java.io.PrintStream;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MySaxErrorHandler implements ErrorHandler {

   private final PrintStream out; // The error handler's output goes here

   public MySaxErrorHandler(final PrintStream out) {
      this.out = out;
   }

   // Combine the parser's message with its line and column position
   private String getParseExceptionInfo(final SAXParseException ex) {
      return "'" + ex.getMessage() + "' at line " + ex.getLineNumber() +
                  ", column " + ex.getColumnNumber();
   }
   @Override
   public void warning(final SAXParseException exception) throws SAXException {
      out.println("Warning: " + getParseExceptionInfo(exception));
   }
   @Override
   public void error(final SAXParseException exception) throws SAXException {
      out.println("Error: " + getParseExceptionInfo(exception));
   }
   @Override
   public void fatalError(final SAXParseException exception) throws SAXException {
      out.println("Fatal error: " + getParseExceptionInfo(exception));
   }
}
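
To see when these callbacks fire, here is a self-contained sketch relying only on the JDK parser rather than on Jdom. The class ErrorHandlerDemo and the malformed input are assumptions for illustration. A document that is not well-formed triggers fatalError(); with the JDK's default parser the parse call subsequently fails with a SAXException even if the handler itself does not throw:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical demo: record which ErrorHandler callbacks a parser invokes. */
class ErrorHandlerDemo {

    static List<String> parseAndCollect(String xml) throws Exception {
        final List<String> messages = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                messages.add("Warning at line " + e.getLineNumber());
            }
            @Override public void error(SAXParseException e) {
                messages.add("Error at line " + e.getLineNumber());
            }
            @Override public void fatalError(SAXParseException e) {
                messages.add("Fatal error at line " + e.getLineNumber());
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Parsing cannot continue after a well-formedness violation
        }
        return messages;
    }

    public static void main(String[] args) throws Exception {
        // The closing </item> tag is missing
        parseAndCollect("<catalog>\n<item>oops</catalog>").forEach(System.out::println);
    }
}
```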

Accessing an XML Tree purely by DOM methods.

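import java.io.IOException;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class ReadCatalog {
   private final SAXBuilder builder = new SAXBuilder();

   public ReadCatalog() {
      builder.setErrorHandler(new MySaxErrorHandler(System.out));
   }

   public void process(final String filename) throws JDOMException, IOException {
      // Load the document from the classpath and build the in-memory tree
      final Document docInput = builder.build(
            getClass().getClassLoader().getResource(filename));
      final Element docRoot = docInput.getRootElement();
      // Print each <item>'s text content and its orderNo attribute
      docRoot.getChildren().forEach(item ->
            System.out.println(
                  "Article: " + item.getText() +
                  ", order number: " + item.getAttributeValue("orderNo")));
   }
}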

Driver class execution entry point

public class ReadCatalogDriver {

  public static void main(String[] argv) throws Exception {
    final ReadCatalog catalogReader = new ReadCatalog();
    catalogReader.process("catalog.xml");
  }
}

Result:

Article: Swinging headset, order number: 3218
Article: 200W Stereo Amplifier, order number: 9921

Project sample code for import

https://gitlab.mi.hdm-stuttgart.de/goik/GoikLectures/tree/master/P/Sda1/Jdom/Catalog

Followup exercises

Overview

  • The Document Object Model (DOM)
    • ➟ Using with HTML/Javascript

      Lecture notes

DOM and Javascript

  • Widespread Javascript support.

  • Full DOM support.

DOM Javascript example

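function sortables_init() {
  // Bail out on browsers lacking basic DOM support
  if (!document.getElementsByTagName) return;
  var tbls = document.getElementsByTagName("table");
  for (var ti = 0; ti < tbls.length; ti++) {
    var thisTbl = tbls[ti];
    // Only tables carrying the "sortable" class and an id are made sortable
    if (((' ' + thisTbl.className + ' ').indexOf("sortable") != -1)
      && (thisTbl.id)) {
      ts_makeSortable(thisTbl);
    }
  }
}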

DOM Javascript demo

Overview

  • The Document Object Model (DOM)
    • ➟ and XPath

      Lecture notes

Why use XPath?

XPath and Jdom

  • Addressing node sets in XML trees.

  • Conceptual similarity to SQL queries.

  • Collections representing result sets.

XPath on top of Jdom

<dependency>                  <!-- Jdom itself -->
  <groupId>org.jdom</groupId>
  <artifactId>jdom2</artifactId>
  <version>2.0.6</version>
</dependency>

<dependency>                  <!-- XPath support for Jdom -->
  <groupId>jaxen</groupId>
  <artifactId>jaxen</artifactId>
  <version>1.1.6</version>
</dependency> ...

HTML containing <img> tags.

<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Picture gallery</title></head>
  <body>
    <h1>Picture gallery</h1>
    <p>Images may appear inline:<img src="inline.gif" alt="none"/></p>
    <table><tbody>
      <tr>
        <td>Number one:</td>
        <td><img src="one.gif" alt="none"/></td>
      </tr>
      <tr>
        <td>Number two:</td>
        <td><img src="http://www.hdm-stuttgart.de/favicon.ico" alt="none"/></td>
      </tr>
    </tbody></table>
  </body>
</html>

Objective: Find contained images

  • (Nearly) arbitrary positions.

  • Possibly with additional search restrictions, e.g. searching for <img/> elements missing an alt attribute.

XSL script extracting images.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:html="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//html:img">
      <xsl:value-of select="@src"/>
      <xsl:text> </xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

Result acting on Figure 794, “HTML containing <img> tags.”:

inline.gif one.gif http://www.hdm-stuttgart.de/favicon.ico

Setting up the parser

public class DomXpath {
  private final SAXBuilder builder = new SAXBuilder();

  public List<Element> process(final String xhtmlFilename)
                throws JDOMException, IOException {

    final Document htmlInput = builder.build(xhtmlFilename);
     ...
   }
}

Search using XPath //img

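// Caveat: "//img" only selects elements in no namespace,
// whereas XHTML <img> elements are namespace-qualified.
static final XPathExpression<Element> xpathSearchImg =
  XPathFactory.instance().compile(
    "//img",
    new ElementFilter() /* filter just elements */);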

Search and namespace

static final Namespace htmlNamespace =
  Namespace.getNamespace("html", "http://www.w3.org/1999/xhtml");

static final XPathExpression<Element> xpathSearchImg =
  XPathFactory.instance().compile(
    "//html:img",
    new ElementFilter(),
    null,            /* no XPath variables */
    htmlNamespace);  /* binds the html: prefix used in the expression */

Searching for images

public List<Element> process(final String xhtmlFilename)... {
  final Document htmlInput = builder.build(xhtmlFilename);
  return xpathSearchImg.evaluate(htmlInput);
}

new DomXpath().process("src/main/resources/gallery.html").
      stream().
      map(img -> img.getAttributeValue("src")).
      reduce((l, r) -> l.concat(", ").concat(r)).
      ifPresent(System.out::println);

Result:

inline.gif, one.gif, http://www.hdm-stuttgart.de/favicon.ico
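
The same namespace-aware search can be sketched with the JDK's javax.xml.xpath API operating on a W3C DOM tree. This is an alternative to the Jdom/Jaxen combination shown above, not a replacement; the class W3cImageSearch is an assumption. The prefix binding, handled by a Namespace object in Jdom, requires a NamespaceContext implementation here:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical W3C DOM / JAXP counterpart of the Jdom image search. */
class W3cImageSearch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static List<String> imageSources(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; required for html:img to match
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "html" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("html").iterator()
                    : Collections.<String>emptyIterator();
            }
        });

        NodeList imgs = (NodeList) xpath.evaluate("//html:img", doc, XPathConstants.NODESET);
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < imgs.getLength(); i++) {
            sources.add(((Element) imgs.item(i)).getAttribute("src"));
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                     + "<p><img src=\"inline.gif\" alt=\"none\"/></p>"
                     + "<img src=\"one.gif\" alt=\"none\"/></body></html>";
        System.out.println(String.join(", ", imageSources(xhtml)));
    }
}
```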

Followup exercise

No. 19: Verification of referenced images readability

Parameterized search expressions

// Declare the XPath variable; its value is set before each evaluation
Map<String, Object> xpathVarsNamespacePrefix = new HashMap<>();
xpathVarsNamespacePrefix.put("cssClass", null);
...
XPathExpression<Element> searchCssClass = XPathFactory.instance().compile(
  "//html:*[@class = $cssClass]",
  new ElementFilter(), xpathVarsNamespacePrefix, htmlNamespace);

searchCssClass.setVariable("cssClass", "header");
searchCssClass.evaluate(htmlInput) ...

// Reuse by changing $cssClass
searchCssClass.setVariable("cssClass", "footer");
searchCssClass.evaluate(htmlInput) ...
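
A comparable pattern exists in the JDK's javax.xml.xpath API, sketched below under some simplifying assumptions: the class W3cParameterizedSearch is hypothetical, namespaces are left out, and the expression merely counts matches. The expression is compiled once; an XPathVariableResolver backed by a mutable map supplies the current value of $cssClass at evaluation time:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical JAXP counterpart: one compiled expression, reused with changing variable values. */
class W3cParameterizedSearch {
    private final Map<String, Object> variables = new HashMap<>();
    private final XPathExpression byClass;

    W3cParameterizedSearch() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The resolver must be registered before compiling the expression
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
        byClass = xpath.compile("count(//*[@class = $cssClass])");
    }

    /** Counts elements whose class attribute equals the given value. */
    int count(Document doc, String cssClass) throws XPathExpressionException {
        variables.put("cssClass", cssClass); // change $cssClass, reuse the compiled expression
        return ((Double) byClass.evaluate(doc, XPathConstants.NUMBER)).intValue();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><div class=\"header\"/>"
                           + "<div class=\"footer\"/><p class=\"footer\"/></html>");
        W3cParameterizedSearch search = new W3cParameterizedSearch();
        System.out.println("header: " + search.count(doc, "header"));
        System.out.println("footer: " + search.count(doc, "footer"));
    }
}
```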

Followup exercise

No. 20: HTML internal reference verification

Overview

  • The Document Object Model (DOM)
    • ➟ and XSL

      Lecture notes

A simplified XML product catalog

<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="catalog.xsd">
  <title>Outdoor products</title>
  <introduction>
    <para>We offer a great variety of basic stuff for mountaineering
          such as ropes, harnesses and tents.</para>
    <para>Our shop is proud of its large number of available
      sleeping bags.</para>
  </introduction>
  <product id="x-223">
    <title>Multi freezing bag  Nightmare camper</title>
    <description>
      <para>You will feel comfortable till  minus 20 degrees - At
            least if you are a penguin or a polar bear.</para>
    </description>
  </product>
  <product id="r-334">
    <title>Rope 40m</title>
    <description>
      <para>Excellent for indoor climbing.</para>
    </description>
  </product>
</catalog>

A corresponding schema file catalog.xsd is straightforward:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning" elementFormDefault="qualified"
   vc:minVersion="1.0" vc:maxVersion="1.1">

   <xs:simpleType name="money">
      <xs:restriction base="xs:decimal">
         <xs:fractionDigits value="2"/>
      </xs:restriction>
   </xs:simpleType>

   <xs:element name="title" type="xs:string"/>
   <xs:element name="para" type="xs:string"/>

   <xs:element name="description" type="paraSequence"/>
   <xs:element name="introduction" type="paraSequence"/>

   <xs:complexType name="paraSequence">
      <xs:sequence>
         <xs:element ref="para" minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>

   <xs:element name="product">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="title"/>
            <xs:element ref="description"/>
         </xs:sequence>
         <xs:attribute name="id" type="xs:ID" use="required"/>
         <xs:attribute name="price" type="money" use="optional"/>
      </xs:complexType>
   </xs:element>

   <xs:element name="catalog">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="title"/>
            <xs:element ref="introduction"/>
            <xs:element ref="product" minOccurs="1" maxOccurs="unbounded"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

</xs:schema>

An XSL style sheet for catalog transformation to HTML.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0" xmlns="http://www.w3.org/1999/xhtml">

  <xsl:template match="/catalog">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body style="background-color:#FFFFFF">
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="product"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="product">
    <h3><xsl:value-of select="title"/></h3>
    <xsl:for-each select="description/para">
      <p><xsl:value-of select="."/></p>
    </xsl:for-each>
    <xsl:if test="@price">
      <p>
        <xsl:text>Price: </xsl:text>
        <xsl:value-of select="@price"/>
      </p>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

Transforming an XML document instance to HTML using an XSL style sheet.

package dom.xsl;
...
public class Xml2Html {
   private final SAXBuilder builder = new SAXBuilder();

   private final XSLTransformer transformer;

   public Xml2Html(final String xslFilename) throws XSLTransformException {
      builder.setErrorHandler(new MySaxErrorHandler(System.err));
      transformer = new XSLTransformer(xslFilename);
   }

   // Note: resultFilename is not used here; the result is written to the console.
   public void transform(final String xmlInFilename,
         final String resultFilename) throws JDOMException, IOException {

      final Document inDoc = builder.build(xmlInFilename);
      final Document result = transformer.transform(inDoc);

      // Set formatting for the XML output
      final Format outFormat = Format.getPrettyFormat();

      // Serialize to console
      final XMLOutputter printer = new XMLOutputter(outFormat);
      printer.output(result, System.out);
   }
}
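
The underlying mechanism can also be tried without Jdom. The following sketch uses the JDK's javax.xml.transform API directly; the class W3cXml2Text as well as the tiny stylesheet and input are assumptions chosen to keep the example short. It applies an XSLT 1.0 style sheet with text output to a catalog fragment:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical JAXP-only transformation: XML and XSL are passed as strings. */
class W3cXml2Text {

    static String transform(String xml, String xsl) throws Exception {
        // Compile the style sheet into a Transformer
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xsl =
              "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/catalog\">"
            + "<xsl:for-each select=\"product\">"
            + "<xsl:value-of select=\"title\"/><xsl:text>;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<catalog><product id=\"r-334\"><title>Rope 40m</title></product></catalog>";
        System.out.println(transform(xml, xsl));
    }
}
```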

A driver class for the Xml2Html transformer.

package dom.xsl;
...
public class Xml2HtmlDriver {
...
  public static void main(String[] args) {
    final String
     inFilename = "Input/Dom/climbing.xml",
     xslFilename = "Input/Dom/catalog2html.xsl",
     htmlOutputFilename = "Input/Dom/climbing.html";
    try {
      final Xml2Html converter = new Xml2Html(xslFilename);
      converter.transform(inFilename, htmlOutputFilename);
    } catch (Exception e) {
      System.err.println("The conversion of '" + inFilename
          + "' by stylesheet '" + xslFilename
          + "' to output HTML file '" + htmlOutputFilename
          + "' failed with the following error:" + e);
      e.printStackTrace();
    }
  }
}

Followup exercise

No. 21: Namespace / elements statistics