Mark Wilson I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML published in 2000 called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37 point SEO health-checkup for your website.
MinML, presumably standing for minimal XML, can be downloaded
from http://www.wilson.co.uk/xml/minml.htm. It is the smallest
parser reviewed in this book, and the fastest one mentioned in this
chapter. It is also SAX 1.0 compliant, and consumes less memory
than NanoXML.
However, it does not offer a pull parsing mechanism. Parsing is
only available through the SAX 1.0 interface, which "pushes" events
into your application code (see Push, Pull, and Object Model
Parsing, page 573).
Once you tell the parser to begin, MinML calls back (or
pushes) into your application code to notify you of parse events.
This model forces your code to maintain state information within
the callback class(es), and to evaluate that state at each event.
This is less programmer-friendly than pull parsers like kXML and
XPP, but it's hard to argue with MinML's raw speed.
For benchmarks see http://www.extreme.indiana.edu/~aslom/exxp/.
MinML might not look as fast if the benchmarks accounted for the
state information that application code must maintain itself, which
pull parsers maintain for us automatically.
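To make the state-tracking burden concrete, here is a minimal sketch of a SAX 1.0 handler that collects the text of every title element. It does not use MinML: it assumes the SAX parser bundled with the JDK and the deprecated SAX 1.0 org.xml.sax.HandlerBase class as a stand-in, and the class and element names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Collects the text of every <title> element. The handler must remember
// whether it is currently inside a <title>, because characters() carries
// no information about the enclosing element.
class TitleCollector extends HandlerBase {
    private final List<String> titles = new ArrayList<String>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false; // state kept across callbacks

    @Override
    public void startElement(String name, AttributeList attrs) {
        if ("title".equals(name)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String name) {
        if ("title".equals(name)) {
            titles.add(text.toString());
            inTitle = false;
        }
    }

    // Parses the given XML text and returns the collected titles.
    static List<String> collect(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(
                "<books><book><title>One</title></book>"
                + "<book><title>Two</title></book></books>"));
    }
}
```

The inTitle flag and the text buffer are exactly the kind of bookkeeping that a pull parser would let the application avoid.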
What's Supported, What's Not Supported
Feature | Supported | Notes
Document validation | No | DTDs are read but ignored
Well-formed XML only | Yes | Throws org.xml.sax.SAXException if not well-formed
Mixed content | No | Throws org.xml.sax.SAXException
Entity expansion | No | Throws org.xml.sax.SAXException if predefined or general entities are used in an XML document. Parameter entities in the DTD are OK.
SAX | Yes, SAX 1.0 |
DOM | No |
Comments | Ignored |
Processing Instructions | Ignored |
Namespaces | Indirectly | Prefixes aren't distinguished from local parts: <prefix:name> becomes an atomic element or attribute
Document Locator | Yes | Provides document line and column information
JAR size | 15.3KB |
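Because prefixes are not separated from local parts, an application that cares about namespaces has to split the raw names itself. The following is a minimal sketch of such a helper; the class name RawName is hypothetical and is not part of MinML or SAX.

```java
// Splits a raw element or attribute name such as "xsl:template" into its
// prefix and local part. Names without a colon get an empty prefix.
class RawName {
    final String prefix;
    final String localPart;

    RawName(String rawName) {
        int colon = rawName.indexOf(':');
        if (colon < 0) {
            prefix = "";
            localPart = rawName;
        } else {
            prefix = rawName.substring(0, colon);
            localPart = rawName.substring(colon + 1);
        }
    }

    public static void main(String[] args) {
        RawName n = new RawName("xsl:template");
        System.out.println(n.prefix + " / " + n.localPart);
    }
}
```

Mapping a prefix to its namespace URI would additionally require the application to track the xmlns attributes it sees on each element, which this sketch leaves out.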
As stated above, SAX 1.0 is implemented in its bare-bones state.
Locales aren't supported. Separate warnings and errors aren't
supported; all errors and warnings are reported as fatal errors.
Public and system identifiers aren't supported.
Document locators, however, are supported. Document locators
allow your application to locate the line and column number that
triggered the SAX callback.
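As a rough illustration of how a document locator is used, the sketch below records the line number reported for each start tag. It assumes the JDK's bundled SAX parser and the SAX 1.0 HandlerBase class rather than MinML; the class name ElementPositions is made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;

// Records "elementName@lineNumber" for every start tag.
class ElementPositions extends HandlerBase {
    private final List<String> positions = new ArrayList<String>();
    private Locator locator; // supplied by the parser before parsing starts

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String name, AttributeList attrs) {
        // A parser is not obliged to supply a locator, so guard against null.
        int line = (locator == null) ? -1 : locator.getLineNumber();
        positions.add(name + "@" + line);
    }

    // Parses the given XML text and returns the recorded positions.
    static List<String> scan(String xml) throws Exception {
        ElementPositions handler = new ElementPositions();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.positions;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(scan("<a>\n<b/>\n</a>"));
    }
}
```

The same pattern is useful in error handling, where the locator lets the application report where in the document a problem was found.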
There is no support for entities, but parameter entities in a
DTD are okay. Although this is a non-validating parser, DTDs are
allowed (they are simply ignored). Processing instructions,
although also ignored, don't throw exceptions like they do in
NanoXML. Finally, ignorable whitespace is not reported to the
application.
There is no pull-parsing mechanism, even as an add-on. This is a
strict no-nonsense SAX 1.0 push parser, which will require you to
track all state while documents are parsed.
Finally, and perhaps most significantly, there is no way to
build a document. There is no interface that can build and
output a document tree. If you require more than just parsing in
your application, MinML won't be enough for you.
In true minimalist fashion, this package contains only one class:
MinML. Let's briefly look at it.
Class MinML
uk.co.wilson.xml
public class MinML
    extends java.lang.Object
    implements uk.org.xml.sax.Parser,
               uk.org.xml.sax.DocumentHandler,
               org.xml.sax.Locator,
               org.xml.sax.ErrorHandler
Although this is the only class in its package, it implements
and uses many of the SAX 1.0 interfaces and classes. Those
interfaces and classes must be distributed with MinML and should be
in the CLASSPATH variable.
This class can be used in one of two ways:
Extending it with your own class and overriding the SAX methods
in which you are interested
Creating an instance of the class, calling setDocumentHandler()
on the instance, and calling its parse() method with an
org.xml.sax.InputSource object or a java.io.Reader object
You will notice that class MinML implements
uk.org.xml.sax.Parser and uk.org.xml.sax.DocumentHandler instead of
org.xml.sax.Parser and org.xml.sax.DocumentHandler. These two
interfaces actually just extend their SAX counterparts and override
only three methods. It is by overriding these methods that MinML
implements one of its unique features: sending output to a
java.io.Writer object.
SAX's DocumentHandler interface declares startElement() and
startDocument(), both of which return void. The versions in
uk.org.xml.sax.DocumentHandler, however, return a java.io.Writer.
If you override these methods in your application and return a
Writer, MinML writes character data to that Writer object
instead of calling back the application's characters() method.
The XSLT Compiler, originally produced by Sun Microsystems but
now donated to the Apache XML Project, is a tool for compiling
XSL stylesheets into lightweight Java code. The compiled XSL
sheets consist of standard Java bytecode and are called
translets. At runtime, whenever your code wants to
transform some XML, only three steps need be taken:
Ask the XSLTC runtime to parse an XML document
Pass the object representing the parsed XML to the translet, along with an object implementing interface org.xml.sax.DocumentHandler or org.apache.xalan.xsltc.DOM
Tell the translet to transform() the parsed XML
The translet calls back into the object implementing the
org.xml.sax.DocumentHandler interface using the standard SAX 1.0
callback mechanism, or builds a DOM-style tree with the object
implementing org.apache.xalan.xsltc.DOM.
In this way, the XSLTC runtime and translets can be used to
repeatedly generate any type of output based upon original
stylesheets and any input XML.
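The compile-once, transform-many idea can be sketched with the standard JAXP API (javax.xml.transform), whose Templates object plays a similar role to a translet: a stylesheet that has been processed once and can then be applied repeatedly. This is not the XSLTC Translet API described in this section, and whether the stylesheet is actually compiled to bytecode depends on the JAXP implementation in use. The stylesheet and class name are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheetDemo {
    // A tiny stylesheet that prints the cart total as plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Total: "
        + "<xsl:value-of select='cart/total'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Step done once: turn the stylesheet into a reusable Templates object.
    static Templates compile(String xsl) throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    // Step done per document: apply the prepared stylesheet to some XML.
    static String apply(Templates compiled, String xml) throws Exception {
        Transformer transformer = compiled.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates compiled = compile(XSL);
        System.out.println(apply(compiled, "<cart><total>10</total></cart>"));
        System.out.println(apply(compiled, "<cart><total>25</total></cart>"));
    }
}
```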
The XSLT Compiler is itself written in Java, so translet
creation can be done on any operating system with a Java 2 VM.
Translets are merely Java classes, but are designed to run on any
Java VM, not just Java 2 VMs. Since they are class files, their
compilation is usually done with build scripts along with the rest
of your project code.
The XSLT Compiler can be downloaded from
http://xml.apache.org/xalan-j
It has two important dependencies:
The Constructor of Useful Parsers (CUP) Parser Generator
The Byte Code Engineering Library (BCEL)
The CUP Parser Generator is used to generate a Java parser and
scanner from a stylesheet (the grammar). BCEL is used by XSLTC to
convert the parser and scanner from Java source code to bytecode.
Both are included as JARs in the Xalan distribution.
In this section, we will:
Examine what is supported and not supported by XSLTC
Discuss the benefits of translets over traditional
transformation engines, specifically in regard to lightweight
clients
Make a translet using a real example
Review the major classes and steps involved in using the
XSLTC
Sun and Apache have done an excellent job covering the XSLT 1.0
recommendation. It will be easier to list the features of XSLT that
aren't supported by XSLTC, rather than the ones that
are.
Feature | Supported | Notes
SAX 2.0 callbacks (org.xml.sax.ContentHandler) for transformed documents | No | SAX 1.0's org.xml.sax.DocumentHandler is supported
Simplified stylesheets | No | The simplified syntax for stylesheets that consist of only a single template for the root node isn't permitted. The syntax is a literal result element that can represent the whole document. For example: <total xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:value-of select="cart/total"/></total> Notice that the <xsl:stylesheet> element is missing, along with some other things.
Matching elements by id attribute | No | The ability to match elements by their unique id attribute is unsupported
Match patterns using <xsl:key> | No | The ability to match elements using implicit cross-referencing (keys) is unsupported. For more information, see http://www.w3.org/TR/xslt.html#key
Namespace axis | No | The namespace axis isn't supported. It is defined this way: if the starting node in the axis is an element, the axis selects all the namespace nodes that are in scope for that element; otherwise, the axis selects nothing. For instance: namespace::*
Document validation | Yes | DTD validation of XML source
DOM parser included | Yes | If you're using XSLTC on a lightweight client, you can make use of the DOM Level 1 parser that comes with xml.jar, independently of XSLTC. There is no need to include the JAR files for NanoXML or other parsers.
SAX parser included | Yes | A SAX 1.0 parser comes with the xml.jar library. The parser can actually be removed and replaced with another SAX library, but there is currently no documentation on how to do this.
There are at least four reasons why information appliances
needing to do XSL transformations should consider translets over
traditional transformation engines:
Smaller memory footprint - translet and runtime classes are minimized by including only those XSLTC features required for that particular transformation
Performance - Sun claims performance gains of 30-270% over the Saxon, Xalan, and XT processors, depending upon stylesheet and XML input sizes; performance is even more of a key issue on limited devices
Freedom - runtime and translet classes are Java 1.1 bytecode
Reduced network traffic - XSL stylesheets need not be downloaded as they are already distributed with an application
Let's discuss each of these in a little more detail.
Smaller Memory Footprint
Traditional transformation engines have large memory footprints.
For example, Apache's Xalan-Java 2.0.1 requires xalan.jar and
xerces.jar (the Apache Xerces XML parser). This represents over 2.2
MB of bytecode! Few would argue that this is a reasonable
demand for most of today's lightweight clients.
Translets, on the other hand, require three things:
The XSLTC runtime JAR xsltcrt.jar (119 KB, 117 KB reduced)
The xml.jar file (126 KB)
Any translet class files your application needs to use (typically 2-10KB each, depending upon the size and complexity of the original XSL stylesheet)
XSL functionality used in most stylesheets is included in
xsltcrt.jar, while functionality specific to certain stylesheets
(and not already included in xsltcrt.jar) is compiled into those
stylesheets' translets. This approach minimizes the size of the
runtime library without sacrificing XSLT compatibility.
xml.jar was Sun Project X, the precursor to Apache Crimson. It
contains the SAX 1.0 and DOM Level 1 interfaces, as well as SAX and
DOM implementations. It's important to note that since this JAR
file is required at runtime, you are given SAX and DOM parsers for
free; there's no need to include another parser such as
NanoXML.
Since translets require Sun Project X (xml.jar), which contains
both SAX 1.0 and DOM Level 1 parsers, there's no need to include
another parser in your lightweight client. The parsers in xml.jar
can be used independently and separately from translets, if
needed.
Finally, the compiled XSL stylesheets themselves must be
available on the target lightweight platform. These vary in size
depending upon the original XSL stylesheet, but are usually 2-10KB.
Of course, multiple translets can be deployed for a given
application, allowing the application to transform different
document classes in a variety of different ways.
Kilobytes per second is the total number of bytes in the input
and output XML documents divided by twice the elapsed time. See
http://www.xml.com/pub/a/2001/03/28/xsltmark/results.html for more
information. Although we have compared two different sets of data
in the same graph (something the XSLTMark authors warn against), we
have still used the graph for general comparisons. With an older
version of XSLTC, we see it is second only to XT. Newer versions of
XSLTC claim even better performance gains. Since the release notes
for release Alpha 5 specifically state, "performance has been
greatly improved" over Alpha 4, we'll have to perform more
benchmarks to get the clearest performance picture.
Michael Kay, the creator of Saxon, stated in XSLT
Programmer's Reference, 2nd Edition (ISBN 1861005067),
that he has found translet performance to be roughly comparable to
Saxon and XT.
Clearly, XSLTC is a contender on performance against the
large, conventional XSLT processors. However, this only serves to
draw attention to XSLTC's supreme advantage over these processors -
the small size of the generated stylesheet bytecode compared to
running a transformation in a large, powerful interpreter.
This is one factor which clearly points towards a big future for
translets within information appliances.
Freedom
Although the XSLT compiler requires Java 2, the class files it
generates can be used with any Java VM. This is an important point
because even though we may not be able to compile XSL stylesheets
on lightweight clients, we should always be able to use translets
on lightweight clients with J2ME.
Reduced Network Traffic
Since translets can be distributed with an application, XSL
stylesheets no longer need to be downloaded from a server. This is
good news for devices using constrained wireless networks, such as
the 9.6 kbps Cellular Digital Packet Data (CDPD) connection. The
flexibility of downloading new stylesheets as needed is not always
lost, however. Translets, if used in applets, can be downloaded
from the server on which the applet was downloaded just like any
other Java class.
There are no Javadocs for this package, but let's hope Apache
changes this soon. The package contains the interfaces and classes
used and implemented by translets. Remember that translets are
compiled XSL stylesheets. The primary interfaces and classes in
this package are:
Translet
TransletOutputHandler
TransletException
We will discuss each of these in this section.
Interface Translet
org.apache.xalan.xsltc
public interface Translet
A class that implements interface Translet must be able to
transform XML input into the output specified by the mapping in the
original XSL stylesheet. The XSLTC library creates classes that
implement this interface; you shouldn't ever need to write code
that implements interface Translet. You will, however, need to call
the transform() method to tell the implementing class when to begin
the transformation process.
The transform() Method
public void transform(DOM Document,
                      TransletOutputHandler handler)
    throws TransletException
A transformation requires two items: an
org.apache.xalan.xsltc.dom.DOMImpl object (which unfortunately
carries no documentation!) and an object implementing the
org.apache.xalan.xsltc.TransletOutputHandler interface. DOMImpl
implements interface org.apache.xalan.xsltc.DOM. These two items
are created in your application code and given to the translet.
Arguments
Argument | Type | Effect
Document | org.apache.xalan.xsltc.DOM | The parsed XML input document to be transformed. It is a DOM tree implementing the DOM interface, and so is usually an instance of the DOM implementation class org.apache.xalan.xsltc.dom.DOMImpl
handler | org.apache.xalan.xsltc.TransletOutputHandler | The callback handler, which the translet uses to notify your application of transformed elements, attributes, and data. Conceptually very similar to SAX's org.xml.sax.DocumentHandler.
Usage and Examples
To create an instance of a class which implements
org.apache.xalan.xsltc.Translet, we use the Java reflection
API:
Class cls = Class.forName("MyClass");
Translet xlet = (Translet)cls.newInstance();
xlet.transform(dom, handler);
"MyClass" is the name of the class generated by XSLTC during
compilation (we go over how to compile an XSL stylesheet in the
section Example: Compiling and Using a Translet, page 613).
Once the instance exists, its transform() method can be called to
perform the transformation.
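The load-by-name pattern can be tried without XSLTC on the classpath. In the minimal sketch below, the Transformation interface and the UppercaseTransformation class are hypothetical stand-ins for Translet and for a class generated by the compiler; only the reflection calls are the point.

```java
import java.util.Locale;

// Hypothetical stand-in for the Translet interface: one method to call.
interface Transformation {
    String apply(String input);
}

// Hypothetical stand-in for a class generated by a stylesheet compiler.
class UppercaseTransformation implements Transformation {
    public String apply(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}

class LoadByName {
    // Loads the named class, instantiates it through its no-argument
    // constructor, and casts the result to the shared interface.
    static Transformation load(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        // getDeclaredConstructor().newInstance() is the non-deprecated
        // equivalent of the older Class.newInstance().
        Object instance = cls.getDeclaredConstructor().newInstance();
        return (Transformation) instance;
    }

    public static void main(String[] args) throws Exception {
        Transformation t = load(UppercaseTransformation.class.getName());
        System.out.println(t.apply("ticket"));
    }
}
```

Because the application only depends on the interface, a different generated class can be substituted by changing the name string, which is what makes translets easy to swap at deployment time.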
Interface TransletOutputHandler
org.apache.xalan.xsltc
public interface TransletOutputHandler
This interface contains the callback methods which a translet
calls as it transforms XML input to some output. Conceptually, a
translet behaves just like a SAX parsing engine, calling back into
interface org.apache.xalan.xsltc.TransletOutputHandler instead of
interface org.xml.sax.DocumentHandler or
org.xml.sax.ContentHandler. However, the designers of XSLTC have
chosen natively to support TransletOutputHandler rather than SAX
1.0's DocumentHandler and SAX 2.0's ContentHandler.
SAX 1.0 is supported by wrapping an
org.apache.xalan.xsltc.runtime.TextOutput object around an object
implementing TransletOutputHandler. Since SAX is the de facto push
parser standard, we'll focus on how to use it with translets rather
than the proprietary TransletOutputHandler. However, let's briefly
examine some of TransletOutputHandler to further understand how
translets work.
Callback Methods
public void startDocument() throws TransletException
public void endDocument() throws TransletException
public void characters(char[] characters, int offset, int length) throws TransletException
public void startElement(String elementName) throws TransletException
public void endElement(String elementName) throws TransletException
public void attribute(String attributeName, String attributeValue) throws TransletException
public void comment(String comment) throws TransletException
public void processingInstruction(String target, String data) throws TransletException
Although this isn't complete, you should immediately see the
similarities between this interface and SAX 1.0's
org.xml.sax.DocumentHandler and SAX 2.0's
org.xml.sax.ContentHandler.
A helper class is given to us to enable SAX 1.0 support.
org.apache.xalan.xsltc.runtime.TextOutput not only implements
interface TransletOutputHandler, but it also maps
TransletOutputHandler methods to the corresponding
org.xml.sax.DocumentHandler methods.
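To see what such a mapping involves, consider that the translet-style interface reports attributes through separate attribute() calls, while SAX 1.0 expects them together with startElement(). The sketch below is not TextOutput's actual implementation: SimpleOutputHandler is a hypothetical, cut-down interface that uses SAXException in place of TransletException so that the example is self-contained, and SaxAdapter shows one plausible way to buffer a start tag until its attributes are known.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.DocumentHandler;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

// Hypothetical, cut-down version of a translet-style output interface.
interface SimpleOutputHandler {
    void startElement(String name) throws SAXException;
    void attribute(String name, String value) throws SAXException;
    void characters(String text) throws SAXException;
    void endElement(String name) throws SAXException;
}

// Forwards SimpleOutputHandler events to a SAX 1.0 DocumentHandler.
class SaxAdapter implements SimpleOutputHandler {
    private final DocumentHandler sax;
    private String pendingName; // start tag not yet forwarded
    private AttributeListImpl pendingAttrs = new AttributeListImpl();

    SaxAdapter(DocumentHandler sax) {
        this.sax = sax;
    }

    // Sends the buffered start tag, if any, together with its attributes.
    private void flush() throws SAXException {
        if (pendingName != null) {
            sax.startElement(pendingName, pendingAttrs);
            pendingName = null;
            pendingAttrs = new AttributeListImpl();
        }
    }

    public void startElement(String name) throws SAXException {
        flush();
        pendingName = name;
    }

    public void attribute(String name, String value) throws SAXException {
        if (pendingName == null) {
            throw new SAXException("attribute() outside a start tag");
        }
        pendingAttrs.addAttribute(name, "CDATA", value);
    }

    public void characters(String text) throws SAXException {
        flush();
        sax.characters(text.toCharArray(), 0, text.length());
    }

    public void endElement(String name) throws SAXException {
        flush();
        sax.endElement(name);
    }
}

// A SAX 1.0 handler that writes the events it receives back out as markup.
class Recorder extends HandlerBase {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String name, AttributeList atts) {
        log.append('<').append(name);
        for (int i = 0; i < atts.getLength(); i++) {
            log.append(' ').append(atts.getName(i))
               .append("=\"").append(atts.getValue(i)).append('"');
        }
        log.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        log.append(ch, start, length);
    }

    @Override
    public void endElement(String name) {
        log.append("</").append(name).append('>');
    }
}
```

Feeding the adapter startElement("card"), attribute("id", "T1"), characters("hi"), and endElement("card") delivers a single startElement callback carrying the id attribute to the SAX handler, followed by the text and the end tag.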
Usage and Examples
So now let's look at some code, which transforms XML and
notifies us of the new (transformed) document via SAX.
First, recall the signature of Translet.transform():
public void transform(DOM Document,
                      TransletOutputHandler handler)
    throws TransletException
And here's our code:
//load and create the translet
Class cls = Class.forName("MyClass");
Translet xlet = (Translet)cls.newInstance();
DOMImpl dom = new DOMImpl(); //will contain the parsed source XML
//build DOM tree from source XML into the dom object (not shown)
//MyDocumentHandler is a hypothetical application class that
//implements org.xml.sax.DocumentHandler
DocumentHandler saxHandler = new MyDocumentHandler();
TextOutput textOutput = new TextOutput(saxHandler);
//start the transformation
xlet.transform(dom, textOutput);
This code:
Creates a DOM tree from source XML (this part has been removed, but we will demonstrate how to do this in the section Example: Compiling and Using A Translet, page 613)
Creates an object which implements interface org.xml.sax.DocumentHandler (saxHandler)
Creates a TextOutput object and passes it the saxHandler
Starts the transformation by calling transform()
The line that constructs the TextOutput object shows how TextOutput
maps its implementation of TransletOutputHandler to DocumentHandler.
Class TransletException
org.apache.xalan.xsltc
public class TransletException
    extends java.lang.Exception
This is the exception class thrown by Translet.transform() and
all of the methods in interface TransletOutputHandler. Since you
probably will use the SAX interface via the TextOutput wrapper
(never directly implementing TransletOutputHandler), you won't need
to catch TransletExceptions except when calling
Translet.transform().
There aren't any special methods in this class. You should
handle TransletException objects in the same manner that you treat
other Throwable classes extending java.lang.Exception.
Now let's take a look at creating a translet and an
application which uses it.
In this example, we have a trouble ticket system to which our
client connects. The client can add, update, and view trouble
tickets. For this example, however, we'll concern ourselves only
with viewing trouble tickets already in the system. Here are the
steps our client application will take:
Read an XML document representing a single existing trouble ticket. To simplify matters, we'll read the document from persistent storage instead of from a network.
Invoke a translet to convert the TroubleTicket document into Wireless Markup Language (WML). You don't need to know WML to understand this example, but if you do, we'll translate the single trouble ticket into a single card in one WML deck. A more advanced system might be able to query and collate multiple trouble tickets into multiple cards within the same deck to save network trips.
Write the WML to stdout. If we passed the WML to a browser at this point, or wrote our own browser within the application, we could view the WML; for simplicity, we won't.
Before we write the application, we will need to compile a
translet from a "TroubleTicket to WML" XSL stylesheet. So, here's
how we'll present this example:
Examine a document instance of a trouble ticket document class
Present an XSL stylesheet that transforms <TroubleTicket/> documents into WML
Compile a translet from the XSL stylesheet
Write the client application that uses the translet and a trouble ticket document instance to produce an instance of a WML document
So let's begin by taking a look at TroubleTicket.xml, a
TroubleTicket document instance:
<?xml version="1.0" encoding="UTF-8"?>
<TroubleTicket ID="T746284" Importance="High"
               Status="Open" PrimaryHelpAgent="Melissa">
  <Description>Installation failed</Description>
  <Customer Name="Int'l Steel" ID="1573">
    <Contact Status="Primary" Name="Ann McKinsey"
             Phone="303-781-7777"
             Fax="303-781-7778"
             Email="AMcKinsey@IntSteel.com"/>
  </Customer>
  <Product Name="SteelPlant2001" Rev="2.0" Code="537010502"/>
  <Incident>
    <Call Type="Inbound" StartTime="02/17/2001 10:35"
          Duration="17"
          HelpAgent="Johnson">Customer received network errors during
      installation. Disconnection from network caused reboot msg.
      He will reboot and call back.
    </Call>
  </Incident>
</TroubleTicket>
This is pretty straightforward so we won't go into it much.
This, and documents of this class, will be the source XML to our
translet. For brevity's sake, the DTD for this document class has
been omitted.
Sample TroubleTicket.xsl
Now let's take a look at the guts of the application: the XSL
stylesheet that converts TroubleTicket document instances into WML.
We'll call this TroubleTicket.xsl:
An XSLT processor, using the stylesheet above, will produce the
following WML from the TroubleTicket document instance in The
TroubleTicket Document (page 614):
<wml>
  <card id="T746284" title="Ticket:T746284">
    <p>
      <b>Installation failed</b>
      <br/>
      <br/>
      Int'l Steel<br/>
      Ann McKinsey<br/>
      303-781-7777<br/>
      SteelPlant2001
    </p>
  </card>
</wml>
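One way to experiment with this mapping before compiling anything is to run a stylesheet through the standard JAXP API. The sketch below is an assumption throughout: the embedded stylesheet is inferred from the WML output shown above and is not the TroubleTicket.xsl used in this chapter, the ticket is a cut-down version of the sample document, and the class name is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TroubleTicketToWml {
    // Assumed stylesheet: one template that maps a ticket to a WML card.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/TroubleTicket'>"
        + "<wml><card id='{@ID}' title='Ticket:{@ID}'><p>"
        + "<b><xsl:value-of select='Description'/></b><br/><br/>"
        + "<xsl:value-of select='Customer/@Name'/><br/>"
        + "<xsl:value-of select='Customer/Contact/@Name'/><br/>"
        + "<xsl:value-of select='Customer/Contact/@Phone'/><br/>"
        + "<xsl:value-of select='Product/@Name'/>"
        + "</p></card></wml>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Cut-down version of the sample trouble ticket.
    static final String TICKET =
        "<TroubleTicket ID='T746284' Status='Open'>"
        + "<Description>Installation failed</Description>"
        + "<Customer Name=\"Int'l Steel\" ID='1573'>"
        + "<Contact Name='Ann McKinsey' Phone='303-781-7777'/>"
        + "</Customer>"
        + "<Product Name='SteelPlant2001' Rev='2.0'/>"
        + "</TroubleTicket>";

    // Applies the assumed stylesheet to a ticket and returns the WML text.
    static String toWml(String ticketXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(ticketXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toWml(TICKET));
    }
}
```

The exact whitespace and the form of empty elements in the result depend on the serializer, so the output may not match the listing above character for character.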
Compiling a Translet
Now let's compile the XSL stylesheet from the previous section
into a translet (Java class file). The compiler is class
org.apache.xalan.xsltc.compiler.XSLTC, and you will need to set
your classpath to include the following JAR files:
Make sure that TroubleTicket.xsl is in the current directory, or
provide its full path on the command-line.
You should now have a class file called TroubleTicket.class. It
resides in the directory from which you ran XSLTC, unless the -d
<directory> argument is used. Note that a build script to
build this with Apache's ant tool is available from http://www.wrox.com/. See Appendix A.
WML TroubleTicketViewer Application
Our last step is to build an application that uses the translet
and source XML to generate WML.
You should be able to compile this code with the ant build.xml
file available at the Wrox Press web site along with all source
code (see Appendix A). Some of this code we've already seen.
import org.apache.xalan.xsltc.*;
import org.apache.xalan.xsltc.dom.DOMImpl;
import org.apache.xalan.xsltc.runtime.TextOutput;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import java.io.FileReader;

public class TroubleTicketViewer {
    public TroubleTicketViewer(String inputfile) throws Exception {
        //load and create the translet
        Class cls = Class.forName("TroubleTicket");
        Translet xlet = (Translet)cls.newInstance();
        DOMImpl dom = new DOMImpl(); //will contain the parsed source XML
        //create SAX 2.0 parser & get the XMLReader object it uses
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        //Set the DOM's builder as the XMLReader's SAX 2.0 content handler
        reader.setContentHandler(dom.getBuilder());
        //parse
        reader.parse(new InputSource(new FileReader(inputfile)));
Now that we've parsed the source XML and have it in a DOM, let's
tell the translet to do the transformation. We'll have the translet
put the transformed document into another DOMImpl object, wmlDOM,
although we could have passed it a SAX 1.0 handler to receive
callbacks instead.
        DOMImpl wmlDOM = new DOMImpl(); //implements sax.DocumentHandler
        TextOutput txtOutput; //implements TransletOutputHandler
        txtOutput = new TextOutput(wmlDOM.getBuilder());
        //pass the translet the source XML and a handler
        xlet.transform(dom, txtOutput);
Finally, let's output the WML to stdout:
        wmlDOM.print(1, 1); //print the root and its children
    }

    public static void main(String[] args) throws Exception {
        TroubleTicketViewer ttv = new TroubleTicketViewer(args[0]);
    }
}
Run the application:
> java TroubleTicketViewer TroubleTicket.xml
You should get this WML output. It's not pretty, but it's what
we expect:
<wml><card
title="Ticket:T746284"
id="T746284"><p><b>Installation
failed</b>
<br></br>
<br></br>
Int'l Steel<br></br>
Ann McKinsey<br></br>
303-781-7777<br></br>
SteelPlant2001</p>
</card>
</wml>
The Future
As XSLTC begins to benefit from the open source development
cycle at Apache, expect new initiatives for transformation in the
information appliance arena. Translets are ideally placed to foster
a viable peer-to-peer environment on small devices, bringing with
them the power of universal transformation to the growing support
for XML on these appliances. Keep an eye on the
xalan-dev@xml.apache.org mailing list!