SAX 2.0
SAX 1.0 has been very widely implemented and has been in
widespread use almost since the day the first draft appeared on 12 January 1998, a month earlier than the date of the final XML 1.0 recommendation. It has
met user needs well, in spite of a few criticisms, some of which are hinted at
in this chapter.
So
it is perhaps unsurprising that the development of a successor, SAX 2.0, has
been comparatively leisurely. Requirements were discussed on the XML-DEV mailing
list during the early months of 1999, and an alpha version of a revised spec was
published by David Megginson (though not widely advertised) on 1 June 1999.
There has been little adverse comment, and it seems likely that the final
specification of SAX 2.0 will be close to its current form, which can be found
on http://www.megginson.com/SAX/SAX2/.
Whether the specification will be widely implemented is
another matter. Time will tell.
The way in which the original SAX interface has been
extended is in itself quite interesting. A standard mechanism has been defined
to allow the application to ask the parser to support particular features or to
set particular properties; the parser in all cases has the option to refuse. The
set of features and properties that can be requested is itself entirely
open-ended. SAX2 defines a core set, but additional features and properties can
be invented by anyone at any time. To make this possible, each feature and
property is identified by a URI, in much the same way as an XML
namespace.
The Configurable Interface
The key new interface in SAX2 is named Configurable. A SAX2 parser must implement the org.xml.sax.Configurable interface as well as the org.xml.sax.Parser interface. The Configurable interface contains four methods:
| Method | Description |
| getFeature(featureName) | Allows the application to ask the parser whether or not it supports a particular feature. |
| setFeature(featureName, boolean) | Allows the application to request that the parser should turn a particular feature on or off. |
| getProperty(propertyName) | Allows the application to request the current value of some particular property. |
| setProperty(propertyName, object) | Allows the application to set some particular property to the supplied value. |
In
each case, if the parser does not recognize the feature or property name, it
must throw a SAXNotRecognizedException. This means in general that the application will not know whether
the parser supports the feature or not. If the parser recognizes the
name of the feature or property, but cannot set it to the requested value, it must throw a SAXNotSupportedException.
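The difference between the two exceptions can be sketched with a toy parser. This is only an illustration: FeatureConfigurable is a hypothetical stand-in for the feature half of the draft Configurable interface (its signatures are assumed from the table above), and NonValidatingParser and FeatureRequest are names invented for the example. Only the two exception classes are real; both are in the org.xml.sax package.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical stand-in for the feature methods of the draft Configurable interface.
interface FeatureConfigurable {
    boolean getFeature(String featureName) throws SAXException;
    void setFeature(String featureName, boolean state) throws SAXException;
}

// Toy non-validating "parser": it knows the validation feature by name,
// but the only state it can honor is "off".
class NonValidatingParser implements FeatureConfigurable {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    public boolean getFeature(String featureName) throws SAXException {
        if (VALIDATION.equals(featureName)) {
            return false;
        }
        throw new SAXNotRecognizedException(featureName);
    }

    public void setFeature(String featureName, boolean state) throws SAXException {
        if (!VALIDATION.equals(featureName)) {
            // Unknown name: the parser cannot say anything about it.
            throw new SAXNotRecognizedException(featureName);
        }
        if (state) {
            // Known name, but the requested value cannot be honored.
            throw new SAXNotSupportedException(featureName + " cannot be turned on");
        }
    }
}

class FeatureRequest {
    // Classifies the outcome of a setFeature() request as a short label.
    static String request(FeatureConfigurable parser, String name, boolean state) {
        try {
            parser.setFeature(name, state);
            return "accepted";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        } catch (SAXException e) {
            return "failed";
        }
    }
}
```

With this toy parser, asking for validation to be turned on yields "not supported", asking for it to be turned off yields "accepted", and any other feature name yields "not recognized".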
To
make this more concrete, consider one of the new core features, whose name is
http://xml.org/sax/features/validation. This feature is provided to fix the problem in
SAX 1.0 whereby an application has no way of discovering or controlling whether the parser is
a validating one. With SAX 2.0, if this feature is on, the parser must validate
the XML document; if it is off, it must not do so (in other words, the parse must succeed so long as the document is
well-formed).
An
application that explicitly requires a validating parser may call:
parser.setFeature("http://xml.org/sax/features/validation", true);
This is a core feature, so every SAX2 parser should
recognize its name. A parser that can perform validation will return normally,
while a parser that cannot perform validation will throw a SAXNotSupportedException.
Equally, an application that explicitly requires the parser
not to do
validation may call:
parser.setFeature("http://xml.org/sax/features/validation", false);
This time, a parser that insists on doing validation must
respond to this request with a SAXNotSupportedException.
On
the other hand, an application that simply wants to know whether the parser is
performing validation or not may call:
if (parser.getFeature("http://xml.org/sax/features/validation")) ...
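To try these calls against a real parser, one option is the SAX implementation bundled with the JDK. Note the assumption: the JDK ships the released SAX2 API rather than the draft described here, and in that release the four configuration methods live on org.xml.sax.XMLReader instead of a separate Configurable interface. The feature name is the same. The class and method names ValidationSwitch, newReader, and ensureValidation are invented for this sketch.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class ValidationSwitch {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Obtains an XMLReader from whatever SAX implementation the JDK provides.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    // Requests the wanted validation state, then asks the parser to confirm it.
    // Returns false if the parser does not recognize the feature or refuses the value.
    static boolean ensureValidation(XMLReader reader, boolean wanted) {
        try {
            reader.setFeature(VALIDATION, wanted);
            return reader.getFeature(VALIDATION) == wanted;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }
}
```

An application that cannot work without validation might call ensureValidation(reader, true) once at startup and stop with a clear message if the result is false, rather than discovering the problem halfway through a parse.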
Core Features and Properties
The following core features and properties are defined in SAX2. A feature is simply shorthand
for a property whose value is a Boolean:
| Name (prefixed http://xml.org/sax) | Value | Meaning |
| /features/validation | boolean | Perform validation. |
| /features/external-general-entities | boolean | Expand general (parsed) external entities. |
| /features/external-parameter-entities | boolean | Expand the external DTD subset and external parameter entities. |
| /features/namespaces | boolean | Process namespace declarations. Element and attribute names with a prefix will have the prefix replaced by the URI of the namespace. |
| /features/normalize-text | boolean | Normalize character data, by ensuring that all consecutive pieces of character data are passed in a single call of the characters() method. |
| /features/use-locator | boolean | Supply the application with a Locator object by calling the setDocumentLocator() method. |
| /properties/namespace-sep | String | Separator to be used between the URI and the local part of a name when the namespaces feature is enabled. |
| /properties/dom-node | org.w3c.dom.Node | Read-only property: if the DOM for the source document exists in memory, this property identifies the DOM node relating to the current event. |
| /properties/xml-string | String | Read-only property: a character string giving the XML representation of the current event. |
| /handlers/DeclHandler | org.xml.sax.misc.DeclHandler | Set a handler to process element and attribute declarations encountered in the DTD. |
| /handlers/LexicalHandler | org.xml.sax.misc.LexicalHandler | Set a handler to process lexical events. These include CDATA sections, entities, and comments. |
| /handlers/NamespaceHandler | org.xml.sax.misc.NamespaceHandler | Set a handler to process namespace declarations. |
The core properties in SAX2 thus include three new event-handling interfaces:
DeclHandler, LexicalHandler, and NamespaceHandler. (Remember, however, that "core" simply means that every parser
must recognize a request for these features and properties; it still has the right to refuse
the request.)
The declaration handler, DeclHandler, meets the requirement for access to the structural
definitions in the DTD. It provides access to element declarations in the simplest possible way, as a string that the
application must parse.
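As a rough illustration of receiving element declarations as strings, the sketch below uses the declaration handler from the released SAX2 API that ships with the JDK. That is an assumption relative to the draft described here: in the release the interface is org.xml.sax.ext.DeclHandler and it is registered through the property http://xml.org/sax/properties/declaration-handler, not the draft names in the table above. ContentModels and its read() method are invented for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;

// Collects each element declaration as (element name -> content model string).
class ContentModels implements DeclHandler {
    final Map<String, String> models = new LinkedHashMap<>();

    public void elementDecl(String name, String model) {
        // The content model arrives as plain text; interpreting it is left to the application.
        models.put(name, model);
    }

    // The remaining declaration events are ignored in this sketch.
    public void attributeDecl(String eName, String aName, String type, String mode, String value) { }
    public void internalEntityDecl(String name, String value) { }
    public void externalEntityDecl(String name, String publicId, String systemId) { }

    // Parses the document text and returns the declarations seen in its DTD.
    static Map<String, String> read(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ContentModels handler = new ContentModels();
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.models;
        } catch (Exception e) {
            throw new IllegalStateException("could not read declarations", e);
        }
    }
}
```

For a document whose internal DTD subset declares three elements, read() returns a map with three entries, one content model string per element name.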
The lexical handler, LexicalHandler, meets the requirement for access to information that
was suppressed in SAX 1.0 because it was considered to be of no interest to
applications. This includes the boundaries of internal entities, the boundaries of CDATA sections, and the existence
of comments. Many application writers asked for these features because they
enable the application to minimize the changes made to a document as it is being
copied. Comments are needed for other reasons as well: for example, the XSLT
recommendation allows a style sheet to say what should happen to comments in the
source document, so an XSLT interpreter written using the SAX interface needs
access to this information.
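A small sketch of what an application sees through a lexical handler follows. As an assumption, it relies on the released SAX2 API in the JDK rather than the draft: the interface there is org.xml.sax.ext.LexicalHandler, registered through the property http://xml.org/sax/properties/lexical-handler. LexicalLog and read() are names made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

// Records lexical events (comments, CDATA and entity boundaries) in document order.
class LexicalLog implements LexicalHandler {
    final List<String> events = new ArrayList<>();

    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length));
    }
    public void startCDATA() { events.add("startCDATA"); }
    public void endCDATA() { events.add("endCDATA"); }
    public void startEntity(String name) { events.add("startEntity:" + name); }
    public void endEntity(String name) { events.add("endEntity:" + name); }

    // DTD boundaries are not of interest in this sketch.
    public void startDTD(String name, String publicId, String systemId) { }
    public void endDTD() { }

    // Parses the document text and returns the lexical events that were reported.
    static List<String> read(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LexicalLog log = new LexicalLog();
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", log);
            reader.parse(new InputSource(new StringReader(xml)));
            return log.events;
        } catch (Exception e) {
            throw new IllegalStateException("could not read lexical events", e);
        }
    }
}
```

An application that copies a document could use these events to write comments and CDATA sections back out in their original form, instead of losing the comments and flattening the CDATA sections into ordinary character data.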
The namespace handler, NamespaceHandler, meets more advanced namespace handling requirements
than the namespaces feature. Whereas the namespaces feature simply expands element and
attribute prefixes using the namespace definitions currently in force, a
namespace handler allows the namespace definitions themselves to
be processed as events in their own right. This is useful in several
circumstances:
• Where the application uses prefixes in contexts other
than element and attribute names (for example, it might use them in attribute
values)
• Where the application needs to know the prefix that was
used (for example, for use in error messages, or in attempting to copy parts of
the original document)
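The idea of treating namespace declarations as events can be sketched as follows. One assumption should be stated plainly: the code uses the released SAX2 API in the JDK, where there is no separate NamespaceHandler and the declarations are reported through the startPrefixMapping() callback of the content handler. PrefixRecorder and read() are names invented for the example, and the sketch ignores scoping (it never removes a mapping when its element ends).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Records every namespace declaration as (prefix -> namespace URI).
class PrefixRecorder extends DefaultHandler {
    final Map<String, String> declared = new LinkedHashMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Called once per declaration, before the startElement event of the declaring element.
        declared.put(prefix, uri);
    }

    // Parses the document text with namespace processing on and returns the declarations seen.
    static Map<String, String> read(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            PrefixRecorder recorder = new PrefixRecorder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
            return recorder.declared;
        } catch (Exception e) {
            throw new IllegalStateException("could not read namespace declarations", e);
        }
    }
}
```

With the recorded map, an application can resolve a prefix that appears inside an attribute value, such as the inv in kind='inv:part', or quote the original prefix in an error message.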
As
remarked earlier, the SAX 2.0 specification cannot yet be regarded as stable, so even if you
find a parser that supports it, use it with care.
Summary
We've presented some information about the origins of the
SAX interface, which is implemented by a wide variety of parsers.
The thing that characterizes SAX, and that distinguishes it
from the DOM interface, is that it is event-based. We discussed some of the
factors that might cause you to use an event-based interface in preference to
the DOM.
We
discussed the structure of a simple SAX application, and the relationship of the
three main classes, the application, the parser, and the document handler. We
showed several examples of how to write SAX applications using these
classes.
We
presented some of the important design patterns for SAX applications, in
particular, the filter or pipeline pattern, and the rule-based
pattern.
Finally, we gave a preview of the features that are
expected to appear in SAX 2.0 when it stabilizes.
We
should end with a word of caution. All the examples shown in this chapter could
be coded much more easily in XSLT, which we will discuss in Chapter 9. Of course
that doesn't mean there is no need for SAX: Java applications can do many things
that XSL style sheets can't (for example, loading data into a relational
database), and they will usually be much faster. But it's worth thinking twice
about your problem before you rush into assuming that SAX is the answer, because
in many cases an XSL approach, or a hybrid approach using XSL for preprocessing,
may be preferable.