The principle of a SAX application

We are already familiar with transformations of XML document instances to other formats. Sometimes the capabilities offered by a given transformation approach do not suffice for a given problem. A general-purpose programming language like Java offers more powerful means to perform advanced manipulations of XML document trees.

Before diving into technical details we present an example exceeding the limits of our present transformation capabilities. We want to format an XML catalog document containing article descriptions as HTML. The price information, however, shall reside outside the XML document in a database, namely an RDBMS:

Figure 895. Generating HTML from an XML document and an RDBMS.

Our catalog might look like:

Figure 896. An XML based catalog.
<catalog>
  <item orderNo="3218">Swinging headset</item>
  <item orderNo="9921">200W Stereo Amplifier</item>
</catalog>

The RDBMS may hold some relation with a field orderNo as primary key and a corresponding attribute like price. In a real world application orderNo should probably be an integer typed IDENTITY attribute.

Figure 897. A Relation containing price information.
CREATE TABLE Product (
  orderNo CHAR(10) PRIMARY KEY
 ,price Money
);

INSERT INTO Product VALUES('3218', 42.57);
INSERT INTO Product VALUES('9921', 121.50);

Each price depends on the article's order number.


The intended HTML output with order numbers being highlighted looks like:

Figure 898. HTML generated output.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
        <html>
          <head><title>Available products</title></head>
          <body>
            <table border="1">
              <tbody>
                <tr>
                  <th>Order number</th>
                  <th>Price</th>
                  <th>Product</th>
                </tr>
                <tr>
                  <td>3218</td>
                  <td>42,57</td>
                  <td>Swinging headset</td>
                </tr>
                <tr>
                  <td>9921</td>
                  <td>121,50</td>
                  <td>200W Stereo Amplifier</td>
                </tr>
              </tbody>
            </table>
          </body>
        </html>

The resulting HTML document contains content both from our XML document and from the database table Product.


The intended transformation is beyond the XSLT standard's processing capabilities: XSLT does not enable us to access RDBMS content. However, some XSLT processors provide extensions for this task.

It is tempting to write a Java application which might use e.g. JDBC™ for database access. But how do we actually read and parse an XML file? Sticking to the Java standard we might use a FileInputStream instance to read from catalog.xml and write an XML parser ourselves. Fortunately the JDK™ already includes an API called SAX, the Simple API for XML, together with a corresponding parser implementation. In addition, third-party SAX parser implementations are available, such as Xerces from the Apache Software Foundation.

The SAX API is event based and will be illustrated by the relationship between customers and a software vendor company:

After purchasing software, customers are asked to register it. This way the vendor receives the customer's address. Each time a new release is completed, all registered customers receive a notification, typically including a special offer to upgrade their software. From an abstract point of view the following two actions take place:

Registration

The customer registers at the company's site, indicating its interest in updated versions.

Notification

Upon completion of each new software release (considered to be an event) a message is sent to all registered customers.
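
The two actions can be sketched in a few lines of Java. This is only an illustration of the register/notify principle: the names Customer, Vendor, register() and release() are hypothetical and not part of any API discussed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface: what a customer does when notified.
interface Customer {
    void notifyRelease(String version);
}

// Hypothetical vendor keeping track of all registered customers.
class Vendor {
    private final List<Customer> registered = new ArrayList<>();

    // Registration: the customer signals interest in future releases.
    void register(Customer customer) {
        registered.add(customer);
    }

    // Notification: a release event is forwarded to every registered customer.
    void release(String version) {
        for (Customer customer : registered) {
            customer.notifyRelease(version);
        }
    }
}

class VendorDemo {
    // Registers two customers, publishes one release, returns the received messages.
    static List<String> run() {
        List<String> inbox = new ArrayList<>();
        Vendor vendor = new Vendor();
        vendor.register(version -> inbox.add("Alice: upgrade to " + version));
        vendor.register(version -> inbox.add("Bob: upgrade to " + version));
        vendor.release("2.0");
        return inbox;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The vendor never needs to know what a customer does with a notification. It only calls the method each customer supplied at registration time.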

The same principle applies to GUI applications in software development. A key press event, for example, will be forwarded by an application's event handler to a callback function (sometimes called a handler method) implemented by an application developer. The SAX API works the same way: a parser reads an XML document and generates events which may be handled by an application. During document parsing the XML tree structure gets flattened into a sequence of events:

Figure 899. Parsing an XML document creates a corresponding sequence of events.
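
To make the flattening visible, the following minimal sketch records each event as a line of text. It uses the SAX classes shipped with the JDK (SAXParserFactory, DefaultHandler). The class name EventLogger and its parse() helper are made up for this illustration, and the catalog is read from a string rather than from catalog.xml to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every SAX event it receives.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks;
        // whitespace-only chunks are skipped here for readability.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters " + text);
        }
    }

    // Parses the given XML string and returns the recorded event sequence.
    static List<String> parse(String xml) throws Exception {
        EventLogger logger = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), logger);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        String catalog =
              "<catalog>"
            + "  <item orderNo=\"3218\">Swinging headset</item>"
            + "  <item orderNo=\"9921\">200W Stereo Amplifier</item>"
            + "</catalog>";
        parse(catalog).forEach(System.out::println);
    }
}
```

The nesting of the elements is no longer explicit in the output. It can only be recovered from the order of the start and end events.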

An application may register components with the parser:

Figure 900. SAX Principle

A SAX application consists of a SAX parser and application-specific implementations of an event handler and an error handler. The application is developed by implementing these two handlers.


An Error Handler is required since the XML stream may contain errors. In order to implement a SAX application we have to:

  1. Instantiate required objects:

    • Parser

    • Event Handler

    • Error Handler

  2. Register handler instances

    • register Event Handler to Parser

    • register Error Handler to Parser

  3. Start the parsing process by calling the parser's appropriate method.
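
The three steps above might look as follows in Java. This is a sketch under assumptions rather than a definitive implementation: the class names CatalogHandler, StrictErrorHandler and SaxSteps are hypothetical, the event handler merely collects the orderNo attribute values of the catalog shown earlier, and the document is read from a string instead of a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical event handler: collects the orderNo attribute of each <item>.
class CatalogHandler extends DefaultHandler {
    final List<String> orderNumbers = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("item".equals(qName)) {
            orderNumbers.add(attrs.getValue("orderNo"));
        }
    }
}

// Hypothetical error handler: reports warnings, aborts on errors.
class StrictErrorHandler implements ErrorHandler {
    @Override public void warning(SAXParseException e) {
        System.err.println("Warning: " + e.getMessage());
    }
    @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
    @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
}

class SaxSteps {
    static List<String> readOrderNumbers(String xml) throws Exception {
        // 1. Instantiate required objects: parser, event handler, error handler
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CatalogHandler eventHandler = new CatalogHandler();
        StrictErrorHandler errorHandler = new StrictErrorHandler();

        // 2. Register both handler instances with the parser
        parser.setContentHandler(eventHandler);
        parser.setErrorHandler(errorHandler);

        // 3. Start the parsing process
        parser.parse(new InputSource(new StringReader(xml)));
        return eventHandler.orderNumbers;
    }

    public static void main(String[] args) throws Exception {
        String catalog =
              "<catalog>"
            + "  <item orderNo=\"3218\">Swinging headset</item>"
            + "  <item orderNo=\"9921\">200W Stereo Amplifier</item>"
            + "</catalog>";
        System.out.println(readOrderNumbers(catalog)); // prints [3218, 9921]
    }
}
```

If the input is not well-formed, the parser calls fatalError() and the exception thrown there ends the parse() call, so the application never works on a half-read catalog.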