Event- and error handler registration

Our first SAX application suffers from the following deficiencies:

  • Error handling is very sparse. It relies entirely on exceptions such as SAXException being thrown, and these frequently do not carry meaningful error information.

  • The application is not namespace aware. When reading e.g. XSL document instances it therefore cannot distinguish between elements belonging to different namespaces, for example HTML.

  • The parser does not validate a document instance against a schema, even if one is present.

We now add these features to the SAX parsing process incrementally. SAX offers the interface org.xml.sax.XMLReader, which allows event and error handler instances to be registered independently instead of being passed as a single argument to the parse() method. We first code an error handler class by implementing the interface org.xml.sax.ErrorHandler, which is part of the SAX API:

package sax.stat.v2;
...
public class MyErrorHandler implements ErrorHandler {

  public void warning(SAXParseException e) {
    System.err.println("[Warning]" + getLocationString(e));
  }
  public void error(SAXParseException e) {
    System.err.println("[Error]" + getLocationString(e));
  }
  public void fatalError(SAXParseException e) throws SAXException{
    System.err.println("[Fatal Error]" + getLocationString(e));
  }
  private String getLocationString(SAXParseException e) {
    return " line " + e.getLineNumber() +
    ", column " + e.getColumnNumber()+ ":" +  e.getMessage();
  }
}

The three public methods implement the org.xml.sax.ErrorHandler interface. The private method getLocationString supplies precise error locations by means of line and column numbers within a document instance. Whenever the parser encounters a warning or an error, it calls the corresponding public method. Consider the following document:

Figure 906. A non-well-formed document.
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <item orderNo="3218">Swinging headset</item>
  <item orderNo="9921">200W Stereo Amplifier
</catalog>

This document is not well-formed because the closing </item> tag of the second <item> element is missing.


Our error handler method gets called yielding an informative message:

[Fatal Error] line 5, column -1:Expected "</item>" to terminate
element starting on line 4.

This error output is achieved by registering an instance of sax.stat.v2.MyErrorHandler with the parser before the parsing process starts. The following code snippet also registers a content handler instance with the parser and thus separates the parser's configuration from its invocation:

package sax.stat.v2;
...
public class ElementCount {
  public ElementCount()
   throws SAXException, ParserConfigurationException{
      final SAXParserFactory saxPf = SAXParserFactory.newInstance();
      final SAXParser saxParser = saxPf.newSAXParser();
      xmlReader = saxParser.getXMLReader();
      xmlReader.setContentHandler(eventHandler); ❶
      xmlReader.setErrorHandler(errorHandler); ❷
  }
  public void parse(final String uri)
    throws IOException, SAXException{
    xmlReader.parse(uri); ❸
  }
  public int getElementCount() {
    return eventHandler.getElementCount(); ❹
  }
  private final XMLReader xmlReader;
  private final MyEventHandler eventHandler = new MyEventHandler(); ❺
  private final MyErrorHandler errorHandler = new MyErrorHandler(); ❻
}

❶ ❷ Referring to Figure 904, “SAX Principle”, these two calls attach the event and error handler objects to the parser, thus implementing the two arrows from the parser to the application's implementation.

❸ The parser is invoked. Note that in this example we only pass a document's URI but no reference to a handler object.

❹ The method getElementCount() is needed to allow a calling object to access the private eventHandler object's getElementCount() method.

❺ ❻ An event handling and an error handling object are created to handle events during the parsing process.
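The registration steps above can be tried out against the non-well-formed document without a separate file. The following is a minimal sketch, not part of the original application: the class ErrorHandlerDemo, its nested CollectingErrorHandler and the method parseAndCollect are hypothetical names introduced for illustration. The handler builds the same location string as MyErrorHandler but collects the messages in a list. The exact message wording and column number depend on the parser implementation in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorHandlerDemo {

  // The catalog document with the second <item> left unclosed.
  static final String BROKEN =
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
    + "<catalog>\n"
    + "  <item orderNo=\"3218\">Swinging headset</item>\n"
    + "  <item orderNo=\"9921\">200W Stereo Amplifier\n"
    + "</catalog>\n";

  /** Hypothetical handler: records one message per reported problem. */
  static class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override public void warning(SAXParseException e) {
      messages.add("[Warning]" + location(e));
    }
    @Override public void error(SAXParseException e) {
      messages.add("[Error]" + location(e));
    }
    @Override public void fatalError(SAXParseException e) {
      messages.add("[Fatal Error]" + location(e));
    }
    private static String location(SAXParseException e) {
      return " line " + e.getLineNumber()
          + ", column " + e.getColumnNumber() + ":" + e.getMessage();
    }
  }

  /** Configures the reader first, then parses the given XML text. */
  static List<String> parseAndCollect(String xml)
      throws ParserConfigurationException, SAXException, IOException {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    CollectingErrorHandler errorHandler = new CollectingErrorHandler();
    reader.setContentHandler(new DefaultHandler()); // ignore all content events
    reader.setErrorHandler(errorHandler);
    try {
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (SAXException e) {
      // Parsing stops at a well-formedness error; the handler has
      // already recorded the details, so nothing more to do here.
    }
    return errorHandler.messages;
  }

  public static void main(String[] args) throws Exception {
    parseAndCollect(BROKEN).forEach(System.err::println);
  }
}
```

Catching the SAXException inside parseAndCollect is a design choice of this sketch only: it lets the caller inspect the collected messages instead of dealing with the exception.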

The careful reader might notice a subtle difference between the content handler and the error handler implementation: the class sax.stat.v2.MyErrorHandler implements the interface org.xml.sax.ErrorHandler directly, whereas sax.stat.v2.MyEventHandler is derived from org.xml.sax.helpers.DefaultHandler, which itself implements the org.xml.sax.ContentHandler interface. One might as well start from the latter interface, but this requires implementing all of its 11 methods. In most circumstances this only complicates the application's code, since it is unnecessary to react to events belonging, for example, to processing instructions. For this reason it is good coding practice to use the empty default implementations in org.xml.sax.helpers.DefaultHandler and to override only those methods corresponding to events actually handled by the application in question.
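As an illustration of this practice, the following minimal sketch overrides a single callback and leaves every other event to the empty implementations inherited from DefaultHandler. The names DefaultHandlerDemo, CountingHandler and countElements are hypothetical and not taken from the application above; the sample document is likewise made up and contains a processing instruction that the handler silently ignores.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultHandlerDemo {

  /** Hypothetical handler: only startElement is overridden. */
  static class CountingHandler extends DefaultHandler {
    private int elementCount = 0;

    @Override
    public void startElement(String uri, String localName,
        String qName, Attributes attrs) {
      elementCount++; // one call per opening tag
    }

    int getElementCount() {
      return elementCount;
    }
  }

  /** Parses the given XML text and returns the number of elements. */
  static int countElements(String xml) throws Exception {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    CountingHandler handler = new CountingHandler();
    reader.setContentHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return handler.getElementCount();
  }

  public static void main(String[] args) throws Exception {
    // The processing instruction triggers processingInstruction(),
    // which DefaultHandler implements as an empty method.
    String xml =
        "<catalog><?note ignored?><item>a</item><item>b</item></catalog>";
    System.out.println(countElements(xml)); // prints 3
  }
}
```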

exercise No. 60

SAX and attribute values

Reading an element's set of attributes.

The example document instance includes an orderNo attribute value for each <item> element. The application does not yet show these attribute names and their corresponding values. Read the documentation for org.xml.sax.Attributes and extend the given code to use it.

You should start from the MIB Maven archetype mi-maven-archetype-sax. Configuration hints are available at the section called “Intellij IDEA on top of Maven”.

A:

For the given example it would suffice to read the value of the known orderNo attribute. A generic solution may ask for the set of all defined attributes and show their values:

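package sax;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AttribEventHandler extends DefaultHandler {

  // Print each element's name followed by all of its attributes.
  @Override
  public void startElement(String namespaceUri, String localName,
      String rawName, Attributes attrs) {
    System.out.println("Opening Element " + rawName);
    for (int i = 0; i < attrs.getLength(); i++) {
      // println already ends the line; the extra "\n" adds a blank separator line
      System.out.println(attrs.getQName(i) + "=" + attrs.getValue(i) + "\n");
    }
  }
}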
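
The simpler variant mentioned at the start of this answer, reading only the known orderNo attribute, might look like the following minimal sketch. The names OrderNoDemo, OrderNoHandler and readOrderNumbers are hypothetical. The sketch assumes a parser that is not namespace aware (the SAXParserFactory default), so that looking up the attribute by its qualified name is sufficient.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OrderNoDemo {

  /** Hypothetical handler: collects the orderNo value of each element. */
  static class OrderNoHandler extends DefaultHandler {
    final List<String> orderNumbers = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
        String qName, Attributes attrs) {
      // getValue(String) looks the attribute up by qualified name
      // and returns null if the element does not carry it.
      String orderNo = attrs.getValue("orderNo");
      if (orderNo != null) {
        orderNumbers.add(orderNo);
      }
    }
  }

  /** Parses the given XML text and returns all orderNo values found. */
  static List<String> readOrderNumbers(String xml) throws Exception {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    OrderNoHandler handler = new OrderNoHandler();
    reader.setContentHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return handler.orderNumbers;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<catalog>"
        + "<item orderNo=\"3218\">Swinging headset</item>"
        + "<item orderNo=\"9921\">200W Stereo Amplifier</item>"
        + "</catalog>";
    System.out.println(readOrderNumbers(xml)); // prints [3218, 9921]
  }
}
```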