The Rule-Based Design Pattern
An alternative way of structuring a SAX application, again with the objective of separating functions and keeping the structure modular and simple, is a rule-based approach.
In general, rule-based programs use an "Event-Condition-Action" model: they contain a collection of rules of the form "if this event occurs under these conditions, perform this action". Rule-based programming can thus be seen as a natural extension of event-based programming.
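To make the Event-Condition-Action idea concrete before we apply it to SAX, here is a minimal sketch of a rule set in Java. It is not part of the application developed in this chapter: the names EcaSketch, Rule and RuleSet are made up for illustration, events are plain strings, and we assume Java 8 or later so that conditions and actions can be written as lambda expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class EcaSketch {
    /** One rule: a condition tested against an event, plus the action to run when it holds. */
    static final class Rule<E> {
        final Predicate<E> condition;
        final Consumer<E> action;

        Rule(Predicate<E> condition, Consumer<E> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    /** Holds the rules and, for each event, fires the first rule whose condition matches. */
    static final class RuleSet<E> {
        private final List<Rule<E>> rules = new ArrayList<>();

        void add(Predicate<E> condition, Consumer<E> action) {
            rules.add(new Rule<>(condition, action));
        }

        /** Returns true if some rule fired for this event. */
        boolean dispatch(E event) {
            for (Rule<E> rule : rules) {
                if (rule.condition.test(event)) {
                    rule.action.accept(event);
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RuleSet<String> rules = new RuleSet<>();
        rules.add(name -> name.equals("title"), name -> log.add("title rule"));
        rules.add(name -> name.startsWith("a"), name -> log.add("a-rule for " + name));

        rules.dispatch("author");
        rules.dispatch("title");
        rules.dispatch("price");   // no rule matches, so nothing is logged
        System.out.println(log);
    }
}
```

Running this prints [a-rule for author, title rule]. The SAX design described in this section is a special case in which the only condition is "the element name equals this string", which is why a simple lookup table can take the place of the list of predicates.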
The processing model of XSL (discussed in Chapter 9) can be
seen as an example of rule-based programming. Each XSL template constitutes one
rule: the event is the processing of a node in
the source document, the condition is the pattern that controls which template
is activated, and the action is the body of the template. We can use the same concepts in a SAX
application.
The diagram below illustrates the structure of a rule-based
SAX application. The input from the XML parser is fed into a switch, which
evaluates the events against the defined conditions, and decides which actions
to invoke. The actions are then passed to processing modules each of which is
designed to perform one specific task.
There are all sorts of ways conditions and actions could be
implemented, but we'll describe a very simple implementation, where the
condition is based only on element type.
Firstly, let's write the DocumentHandler. We'll call it
Switcher because its job is to switch processing to a
piece of code that handles the specific element type.
Switcher maintains a set of rules as a Hashtable, indexed by element type. The application can nominate an object of a class called ElementHandler to process a particular element type. When the parser reports an element start tag, the appropriate ElementHandler is located in the set of rules and called to process the start tag. At the same time, the ElementHandler is remembered on a stack, so that the same ElementHandler can be used to process the end tag and any character data occurring immediately within this element. If no ElementHandler has been registered for an element type, a null entry goes on the stack instead, and the events for that element are ignored.
Here's the Switcher code:
import org.xml.sax.*;
import java.util.*;

/**
 * Switcher is a DocumentHandler that directs events to an appropriate element
 * handler based on the element type.
 */
public class Switcher extends HandlerBase
{
    private Hashtable rules = new Hashtable();
    private Stack stack = new Stack();

    /**
     * Define processing for an element type.
     */
    public void setElementHandler(String name, ElementHandler handler)
    {
        rules.put(name, handler);
    }

    /**
     * Start of an element. Decide what handler to use, and call it.
     */
    public void startElement (String name, AttributeList atts) throws SAXException
    {
        ElementHandler handler = (ElementHandler)rules.get(name);
        stack.push(handler);
        if (handler!=null)
        {
            handler.startElement(name, atts);
        }
    }

    /**
     * End of an element.
     */
    public void endElement (String name) throws SAXException
    {
        ElementHandler handler = (ElementHandler)stack.pop();
        if (handler!=null)
        {
            handler.endElement(name);
        }
    }

    /**
     * Character data.
     */
    public void characters (char[] ch, int start, int length) throws SAXException
    {
        ElementHandler handler = (ElementHandler)stack.peek();
        if (handler!=null)
        {
            handler.characters(ch, start, length);
        }
    }
}
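It can help to see the stack at work without a parser in the way. The following sketch drives a condensed copy of the Switcher logic by hand with the events for a tiny document fragment. SwitcherTrace, the nested stand-in classes and the element names book, title and sub are assumptions made for this illustration only, and the sketch assumes Java 8 or later with the SAX 1 classes that ship in the JDK.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Stack;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

public class SwitcherTrace {
    /** Condensed stand-in for the chapter's ElementHandler: three do-nothing callbacks. */
    static class ElementHandler {
        public void startElement(String name, AttributeList atts) throws SAXException {}
        public void endElement(String name) throws SAXException {}
        public void characters(char[] ch, int start, int length) throws SAXException {}
    }

    /** Condensed copy of the Switcher dispatch logic. */
    static class Switcher extends HandlerBase {
        private final Hashtable<String, ElementHandler> rules = new Hashtable<>();
        private final Stack<ElementHandler> stack = new Stack<>();

        void setElementHandler(String name, ElementHandler handler) {
            rules.put(name, handler);
        }

        @Override public void startElement(String name, AttributeList atts) throws SAXException {
            ElementHandler handler = rules.get(name);
            stack.push(handler);   // null marks "no rule for this element"
            if (handler != null) handler.startElement(name, atts);
        }

        @Override public void endElement(String name) throws SAXException {
            ElementHandler handler = stack.pop();
            if (handler != null) handler.endElement(name);
        }

        @Override public void characters(char[] ch, int start, int length) throws SAXException {
            ElementHandler handler = stack.peek();
            if (handler != null) handler.characters(ch, start, length);
        }
    }

    /** Feeds the events for <book><title>T<sub>x</sub></title></book>; only "title" has a rule. */
    static List<String> trace() throws SAXException {
        List<String> seen = new ArrayList<>();
        Switcher s = new Switcher();
        s.setElementHandler("title", new ElementHandler() {
            @Override public void startElement(String name, AttributeList atts) { seen.add("start " + name); }
            @Override public void endElement(String name) { seen.add("end " + name); }
            @Override public void characters(char[] ch, int start, int length) {
                seen.add("text " + new String(ch, start, length));
            }
        });
        AttributeList none = new AttributeListImpl();
        s.startElement("book", none);            // no rule: null is pushed
        s.startElement("title", none);
        s.characters("T".toCharArray(), 0, 1);
        s.startElement("sub", none);             // no rule: null now sits on top of the title handler
        s.characters("x".toCharArray(), 0, 1);   // so this text is ignored
        s.endElement("sub");
        s.endElement("title");
        s.endElement("book");
        return seen;
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(trace());
    }
}
```

This prints [start title, text T, end title]. The text inside the unregistered sub element never reaches the title handler, which is what "character data occurring immediately within this element" means in practice.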
An ElementHandler is rather like a DocumentHandler, but it only ever gets to process a subset of the events: element start and end, and character data. So although we could use a DocumentHandler here, we've defined a special class. This serves both as a definition of the interface and as a superclass for real element handlers: good Java coding practice might suggest using a separate interface class, but this will do for now:
import org.xml.sax.*;

/**
 * ElementHandler is a class that processes the start and end tags and
 * character data for one element type. This class itself does nothing;
 * the real processing should be defined in a subclass.
 */
public class ElementHandler {

    /**
     * Start of an element
     */
    public void startElement (String name, AttributeList atts) throws SAXException {}

    /**
     * End of an element
     */
    public void endElement (String name) throws SAXException {}

    /**
     * Character data
     */
    public void characters (char[] ch, int start, int length) throws SAXException {}
}
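For comparison, here is roughly what the "separate interface class" alternative mentioned above might look like. This is only a sketch: the names ElementEvents, ElementAdapter and StartCounter are hypothetical, and the rest of the chapter carries on with the single ElementHandler class.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

public class HandlerInterfaceSketch {
    /** The contract a switcher would depend on (hypothetical name). */
    interface ElementEvents {
        void startElement(String name, AttributeList atts) throws SAXException;
        void endElement(String name) throws SAXException;
        void characters(char[] ch, int start, int length) throws SAXException;
    }

    /** Do-nothing implementation, playing the role that ElementHandler plays in the text. */
    static class ElementAdapter implements ElementEvents {
        public void startElement(String name, AttributeList atts) throws SAXException {}
        public void endElement(String name) throws SAXException {}
        public void characters(char[] ch, int start, int length) throws SAXException {}
    }

    /** A real handler overrides only the events it cares about. */
    static class StartCounter extends ElementAdapter {
        private int count = 0;

        @Override public void startElement(String name, AttributeList atts) { count++; }

        int getCount() { return count; }
    }

    public static void main(String[] args) throws SAXException {
        StartCounter counter = new StartCounter();
        ElementEvents handler = counter;   // a switcher would hold handlers by the interface type
        handler.startElement("book", new AttributeListImpl());
        handler.endElement("book");
        handler.startElement("book", new AttributeListImpl());
        handler.endElement("book");
        System.out.println(counter.getCount());   // prints 2
    }
}
```

The benefit of the split is that a handler which already has to extend some other class can still implement the interface directly, at the cost of one more type to maintain.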
So far this is all completely general. We could use the Switcher and ElementHandler classes with any kind of document, to do any kind of processing. Now let's exploit them for a real application: we want to produce an HTML page showing selected data from our list of books.
Here's an application that does it. We'll start with the main control structure. This creates a Switcher and registers a number of ElementHandler objects to process particular elements in the input XML document. It then creates a Parser, nominates the Switcher as the DocumentHandler, and runs the parse:
import org.xml.sax.*;
import com.icl.saxon.ParserManager;

public class DisplayBookList
{
    public static void main (String args[]) throws Exception
    {
        (new DisplayBookList()).go(args[0]);
    }

    public void go(String input) throws Exception
    {
        Switcher s = new Switcher();
        s.setElementHandler("books", new BooklistHandler());
        s.setElementHandler("book", new BookHandler());
        s.setElementHandler("author", new AuthorHandler());
        s.setElementHandler("title", new TitleHandler());
        s.setElementHandler("price", new PriceHandler());
        s.setElementHandler("volume", new VolumeHandler());
        Parser p = ParserManager.makeParser();
        p.setDocumentHandler(s);
        p.parse(input);
    }

    //...rest of code goes in here...
}
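ParserManager comes from the Saxon package, so the program above needs Saxon on the classpath. If you would rather not depend on it, one possible alternative is to ask JAXP for a SAX 1 Parser. The sketch below assumes a JDK that includes JAXP; JaxpParserSketch and countElements are names invented for this illustration, and the handler simply counts start tags so that the example stands on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

public class JaxpParserSketch {
    /** Parses the given XML text with a JAXP-supplied SAX 1 Parser and counts element start tags. */
    static int countElements(String xml) throws Exception {
        final int[] count = {0};

        // getParser() exposes the SAX 1 Parser interface of the underlying parser
        Parser p = SAXParserFactory.newInstance().newSAXParser().getParser();

        // Any DocumentHandler can be plugged in here, for example a Switcher
        p.setDocumentHandler(new HandlerBase() {
            @Override public void startElement(String name, AttributeList atts) {
                count[0]++;
            }
        });
        p.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<books><book/><book/></books>"));   // prints 3
    }
}
```

In the book list application, the same Parser object could be given the Switcher as its DocumentHandler and then passed the input URL with p.parse(input), exactly as before.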
The actual element handlers can be defined as inner classes
within the DisplayBookList class: this is useful because it enables them to share
access to data.
The ElementHandler for the outermost element, "books",
causes a skeletal HTML page to be created:
private class BooklistHandler extends ElementHandler
{
    public void startElement(String name, AttributeList atts)
    {
        System.out.println("<html>");
        System.out.println("<head><title>Book List</title></head>");
        System.out.println("<body><h1>A List of Books</h1>");
        System.out.println("<table>");
        System.out.println("<tr><th>Author</th>");
        System.out.println("<th>Title</th><th>Price</th></tr>");
    }

    public void endElement(String name)
    {
        System.out.println("</table></body></html>");
    }
}
The ElementHandler for the repeated "book" element starts
and ends a row in the generated HTML table, and initializes some variables to
hold the data:
private String author;
private String title;
private String price;
private boolean inVolume;

private class BookHandler extends ElementHandler
{
    public void startElement(String name, AttributeList atts)
    {
        author = "";
        title = "";
        price = "";
        inVolume = false;
    }

    public void endElement(String name)
    {
        System.out.println("<tr><td>" + author + "</td>");
        System.out.println("<td>" + title + "</td>");
        System.out.println("<td>" + price + "</td></tr>");
    }
}
Finally, the element handlers for the fields within the
<book> element update the local variables holding the data. We're
being careless about performance here in the interests of clarity – it would be
better to use StringBuffers rather than Strings for the variables.
private class AuthorHandler extends ElementHandler
{
    public void characters (char[] chars, int start, int len)
    {
        author = author + new String(chars, start, len);
    }
}

private class TitleHandler extends ElementHandler
{
    public void characters (char[] chars, int start, int len)
    {
        if (!inVolume)
        {
            title = title + new String(chars, start, len);
        }
    }
}

private class PriceHandler extends ElementHandler
{
    public void characters (char[] chars, int start, int len)
    {
        if (!inVolume)
        {
            price = price + new String(chars, start, len);
        }
    }
}

private class VolumeHandler extends ElementHandler
{
    public void startElement(String name, AttributeList atts)
    {
        inVolume = true;
    }

    public void endElement(String name)
    {
        inVolume = false;
    }
}
The flag inVolume is used to track whether the current element is within a
containing <volume> element, in which case it is ignored. Once you've put all
this together (the full code can be found in the download for the book at
http://www.wrox.com) you can run this on a sample XML file with a command like
this:
>java DisplayBookList file:///c:/data/books2.xml
The following output should then appear:
<html>
<head><title>Book List</title></head>
<body><h1>A List of Books</h1>
<table>
<tr><th>Author</th>
<th>Title</th><th>Price</th></tr>
<tr><td>Nigel Rees</td>
<td>Sayings of the Century</td>
<td>8.95</td></tr>
<tr><td>Evelyn Waugh</td>
<td>Sword of Honour</td>
<td>12.99</td></tr>
<tr><td>Herman Melville</td>
<td>Moby Dick</td>
<td>8.99</td></tr>
<tr><td>J. R. R. Tolkien</td>
<td>The Lord of the Rings</td>
<td>22.99</td></tr>
</table></body></html>
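Coming back to the earlier remark about StringBuffers: the change is small. The sketch below shows the idea for the author field only, using a hypothetical class BufferedFieldSketch whose three methods stand in for what BookHandler and AuthorHandler would do. It is an illustration of the technique rather than a drop-in replacement for the code above.

```java
public class BufferedFieldSketch {
    // One buffer is reused for every book instead of building a new String per chunk
    private final StringBuffer author = new StringBuffer();

    /** What BookHandler.startElement would do instead of author = "". */
    void startBook() {
        author.setLength(0);
    }

    /** What AuthorHandler.characters would do instead of concatenating Strings. */
    void authorCharacters(char[] chars, int start, int len) {
        author.append(chars, start, len);
    }

    /** What BookHandler.endElement would print for the author cell. */
    String authorCell() {
        return "<tr><td>" + author + "</td>";
    }

    public static void main(String[] args) {
        BufferedFieldSketch b = new BufferedFieldSketch();
        b.startBook();

        // A parser is free to deliver one run of text in several chunks
        char[] data = "Herman Melville".toCharArray();
        b.authorCharacters(data, 0, 7);
        b.authorCharacters(data, 7, data.length - 7);

        System.out.println(b.authorCell());   // prints <tr><td>Herman Melville</td>
    }
}
```

Appending to the buffer avoids creating an intermediate String for every characters() call, which matters when the parser splits long text into many pieces.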
You can elaborate on this design pattern as much as you
like. Possible enhancements include:
• Providing element handlers with access to a stack containing details of their context
• Selecting element handlers based on conditions other than just the element name
• Using element handlers as part of a pipeline, by allowing them to fire events into another DocumentHandler
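As an illustration of the first two enhancements, here is a sketch of a switcher that keeps a stack of open element names and selects a rule on the combination of parent name and element name, rather than on the element name alone. Everything in it is an assumption made for the example: the class ContextSwitcherSketch, the TextAction interface, the "parent/child" key format, and the made-up volume title used in the demonstration. It assumes Java 8 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.helpers.AttributeListImpl;

public class ContextSwitcherSketch extends HandlerBase {
    /** Hypothetical action type: receives character data for a matched element. */
    interface TextAction { void text(String data); }

    private static final TextAction IGNORE = data -> {};

    private final Map<String, TextAction> rules = new HashMap<>();
    private final Deque<String> names = new ArrayDeque<>();        // open elements, innermost first
    private final Deque<TextAction> actions = new ArrayDeque<>();  // action chosen for each open element

    /** Registers an action for elements named child that sit directly inside an element named parent. */
    void setRule(String parent, String child, TextAction action) {
        rules.put(parent + "/" + child, action);
    }

    @Override public void startElement(String name, AttributeList atts) {
        String parent = names.isEmpty() ? "" : names.peek();
        names.push(name);
        // The condition is now "parent and name", not just "name"
        actions.push(rules.getOrDefault(parent + "/" + name, IGNORE));
    }

    @Override public void endElement(String name) {
        names.pop();
        actions.pop();
    }

    @Override public void characters(char[] ch, int start, int length) {
        TextAction action = actions.peek();   // null only if no element is open
        if (action != null) action.text(new String(ch, start, length));
    }

    /** Drives the events for a book with its own title and a volume that also has a title. */
    static List<String> demo() {
        List<String> titles = new ArrayList<>();
        ContextSwitcherSketch s = new ContextSwitcherSketch();
        s.setRule("book", "title", titles::add);
        AttributeList none = new AttributeListImpl();
        s.startElement("book", none);
        s.startElement("title", none); text(s, "Sword of Honour"); s.endElement("title");
        s.startElement("volume", none);
        s.startElement("title", none); text(s, "Volume One"); s.endElement("title");   // made-up data
        s.endElement("volume");
        s.endElement("book");
        return titles;
    }

    private static void text(ContextSwitcherSketch s, String data) {
        s.characters(data.toCharArray(), 0, data.length());
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [Sword of Honour]
    }
}
```

With a rule keyed on context like this, the title inside a volume simply fails to match, so a flag such as inVolume would no longer be needed for that case.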
The advantage of this design pattern is that it avoids a great deal of if-then-else programming. It removes the need to change the DocumentHandler to add conditional logic every time a new element type is introduced. Instead, all you need to do is register another element handler.