Mark Wilson I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML published in 2000 called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37 point SEO health-checkup for your website.
First posted: 03/24/2008
Xalan: Transformation – Performance Enhancement
The major problem I face in my programming is the time it takes to transform
a source XML document into a result (HTML, PDF, etc.) using a stylesheet.
We often use whatever transformer is provided without looking into its
features and properties. In practice, we look for a shortcut to get the
transformation done unless the project plan has a specific performance
requirement. We generally write a stylesheet, use the Sun-provided
transformation classes, and convert our XML documents into HTML documents.
It is not only me; I would say almost all programmers work this way. We
rarely think about the features and properties a particular API provides.
Here I cover some performance enhancement tips to follow when doing
transformations.
XALAN – JAVA DTM
Introduction:
The Document Table Model (DTM) is an interface to a document model designed
specifically for the needs of Xalan's XPath and XSLT implementations. The
motivation behind this model is to optimize performance and minimize storage.
A standard DOM represents the document as a tree of objects: every node is
instantiated as an object of its own, and names and text values are typically
held as separate String objects. For a large XML document this costs both
memory and allocation time. DTM avoids the overhead of instantiating the
objects the standard DOM requires to represent a tree of nodes. It uses unique
integer "handles" to identify nodes, integer ID values to represent URIs,
local names, and expanded names, and integer index and length references to a
string buffer to represent the text value of each node.
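To make the idea of handles, name IDs, and a shared string buffer more concrete, here is a minimal sketch of a table-based node store. It is not Xalan's DTM code: the class TinyNodeTable and all of its methods are hypothetical names invented for this illustration, and a real DTM tracks far more (node types, siblings, namespaces).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy table-based document model. Hypothetical; NOT Xalan's DTM implementation. */
final class TinyNodeTable {
    private final List<String> namePool = new ArrayList<>();       // name ID -> name
    private final Map<String, Integer> nameIds = new HashMap<>();  // name -> name ID
    private final StringBuilder text = new StringBuilder();        // one buffer for all text

    // Parallel int arrays indexed by node handle; no per-node objects are created.
    private int[] parent = new int[16];
    private int[] nameId = new int[16];
    private int[] textStart = new int[16];
    private int[] textLength = new int[16];
    private int size = 0;

    /** Stores each distinct name once and returns its integer ID. */
    private int intern(String name) {
        Integer id = nameIds.get(name);
        if (id == null) {
            id = namePool.size();
            namePool.add(name);
            nameIds.put(name, id);
        }
        return id;
    }

    /** Doubles the capacity of every column when the table is full. */
    private void grow() {
        int n = parent.length * 2;
        parent = Arrays.copyOf(parent, n);
        nameId = Arrays.copyOf(nameId, n);
        textStart = Arrays.copyOf(textStart, n);
        textLength = Arrays.copyOf(textLength, n);
    }

    /** Adds a node and returns its integer handle; pass -1 as the parent of the root. */
    int addNode(int parentHandle, String name, String value) {
        if (size == parent.length) {
            grow();
        }
        parent[size] = parentHandle;
        nameId[size] = intern(name);
        textStart[size] = text.length();   // index into the shared buffer
        textLength[size] = value.length(); // length within the shared buffer
        text.append(value);
        return size++;
    }

    String getNodeName(int handle) {
        return namePool.get(nameId[handle]);
    }

    String getStringValue(int handle) {
        int start = textStart[handle];
        return text.substring(start, start + textLength[handle]);
    }

    int getParent(int handle) {
        return parent[handle];
    }

    int distinctNames() {
        return namePool.size();
    }

    public static void main(String[] args) {
        TinyNodeTable table = new TinyNodeTable();
        int root = table.addNode(-1, "order", "");
        table.addNode(root, "item", "pen");
        int ink = table.addNode(root, "item", "ink");
        System.out.println(table.getNodeName(ink) + "=" + table.getStringValue(ink));
    }
}
```

Both "item" nodes share a single entry in the name pool, and adding a node only writes a few integers and appends to one buffer, which is the kind of saving the DTM design is aiming for.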
In general, the
"read" APIs to DTM resemble those of the W3C Document Object Model (DOM)
interface. However, in place of the DOM object tree of nodes, DTM uses integer
arrays and string pools to represent the structure and content of the XML
document to be transformed. DTM also structures the document's contents
slightly differently, to better match the XPath data model; some details and
constraints present in a standard DOM are suppressed, and a few XPath-specific
features are added.
DTM is a read-only model, so it has no counterpart to the DOM's write or
create-node operations and cannot be used to create or modify nodes.
The details of
constructing a DTM vary depending on which implementation of this API you are
using. Two reference implementations are currently available:
SAX2DTM (built via a SAX stream)
DOM2DTM (which provides DTM access to an existing DOM)
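From application code, the choice between these two is normally made indirectly, through the kind of Source handed to the transformer. The sketch below uses only standard JAXP classes and runs the same stylesheet against a StreamSource and a DOMSource. The assumption, based on the descriptions above, is that Xalan-Java builds a SAX2DTM for parsed stream input and a DOM2DTM for an existing DOM; other processors make their own choices. The class name SourceKinds and the sample XML and XSL are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class SourceKinds {
    // Sample input and stylesheet, invented for this sketch.
    static final String XML = "<greeting>hello</greeting>";
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='greeting'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet and applies it to the given source. */
    private static String run(Source input) throws Exception {
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(input, new StreamResult(out));
        return out.toString();
    }

    /** Stream input: the processor parses the XML itself (SAX2DTM path in Xalan, by assumption). */
    static String viaStream() {
        try {
            return run(new StreamSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** DOM input: the tree already exists in memory (DOM2DTM path in Xalan, by assumption). */
    static String viaDom() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // DOM trees passed to XSLT should be namespace-aware
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return run(new DOMSource(doc));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaStream());
        System.out.println(viaDom());
    }
}
```

If the XML is only needed for the transformation, passing a stream avoids building a DOM tree first and then wrapping it.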
Both DTMs can be
built incrementally (see incremental transforms). When operating incrementally, the DTM
allows the Xalan-Java processor to begin reading the DTM and performing the
transformation while the DTM is still being assembled (for example, while the
parser is still parsing the XML source), and attempts to do only as much work
as is needed to support the read requests actually made by the XPath or XSLT
processor.
For the
convenience of user-written extensions, a proxy mechanism presents the contents
of the DTM as a read-only subset of the DOM.
DTM Performance Setting
Xalan-Java implements
two DTM performance features that you can control with the TransformerFactory setAttribute(String
name, Object value) method.
http://xml.apache.org/xalan/features/incremental
Set this feature to true to enable incremental transformations. If
set to false (the default), the transform and the parse are performed on the
same thread.
Note: When this feature is set to true: If the parser is Xerces,
we perform an incremental transform on a single thread using the Xerces
"parse on demand" feature. If the parser is not Xerces, we run the
transform in one thread and the parse in another. Exception: if the parser is
not Xerces and the XML source is a DOMSource, setting this feature to true has
no effect.
Note: The incremental
feature is not currently supported
by the XSLT Compiling processor, XSLTC. I will be updating this section when
this feature is supported by the XSLTC.
Example of setting incremental transforms to true (for the
XSLT Interpretive processor):
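A minimal sketch of this setting is given here, under a few assumptions. It asks for the interpretive factory class org.apache.xalan.processor.TransformerFactoryImpl by name and falls back to the JAXP default factory when Xalan-Java is not on the classpath. Because a factory that does not recognise the attribute throws IllegalArgumentException from setAttribute, the sketch catches that case instead of failing. The class name IncrementalDemo and the sample stylesheet are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class IncrementalDemo {
    // Attribute name documented by Xalan-Java for incremental transforms.
    static final String FEATURE_INCREMENTAL =
        "http://xml.apache.org/xalan/features/incremental";

    // Sample stylesheet, invented for this sketch.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='greeting'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Prefers Xalan's interpretive factory; uses the JAXP default if it cannot be loaded. */
    static TransformerFactory newFactory() {
        try {
            return TransformerFactory.newInstance(
                "org.apache.xalan.processor.TransformerFactoryImpl", null);
        } catch (TransformerFactoryConfigurationError notAvailable) {
            return TransformerFactory.newInstance();
        }
    }

    /** Returns true if the factory accepted the attribute, false if it does not recognise it. */
    static boolean enableIncremental(TransformerFactory factory) {
        try {
            // setAttribute() takes a String name and an Object value.
            factory.setAttribute(FEATURE_INCREMENTAL, Boolean.TRUE);
            return true;
        } catch (IllegalArgumentException unsupported) {
            return false;
        }
    }

    /** Transforms the given XML, with incremental mode switched on where supported. */
    static String transform(String xml) {
        TransformerFactory factory = newFactory();
        enableIncremental(factory);
        try {
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting>hello</greeting>"));
    }
}
```

The attribute has to be set on the factory before the Transformer is created. The transformation result should be the same with or without it; only the scheduling of parse and transform changes.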
http://xml.apache.org/xalan/features/optimize
When set to true (the default), this feature enables optimizations that may
involve structural rewrites of the stylesheet. Any tool that requires direct
access to the stylesheet structure should set this feature to false.
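For a tool that needs the stylesheet structure untouched, switching the optimization off could look roughly like the sketch below. It uses the default JAXP factory and treats an IllegalArgumentException from setAttribute as "attribute not recognised by this processor", so it also runs where Xalan-Java's interpretive processor is not the active implementation. The class name OptimizeOffDemo and the sample stylesheet are invented for the example.

```java
import java.io.StringReader;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class OptimizeOffDemo {
    // Attribute name documented by Xalan-Java for stylesheet optimization.
    static final String FEATURE_OPTIMIZE =
        "http://xml.apache.org/xalan/features/optimize";

    // Sample stylesheet, invented for this sketch.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><out/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns true if optimization was switched off, false if the attribute is not recognised. */
    static boolean disableOptimize(TransformerFactory factory) {
        try {
            factory.setAttribute(FEATURE_OPTIMIZE, Boolean.FALSE);
            return true;
        } catch (IllegalArgumentException unsupported) {
            return false;
        }
    }

    /** Compiles the stylesheet after attempting to disable optimization. */
    static Templates compile(String xsl) {
        TransformerFactory factory = TransformerFactory.newInstance();
        disableOptimize(factory);
        try {
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(XSL);
        System.out.println("Stylesheet compiled: " + (templates != null));
    }
}
```

For ordinary transformations the default of true is the better choice; turning it off is only meant for tools that inspect the stylesheet tree.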