|
By now you are probably wondering exactly
how XSLT goes about processing an XML document in order to convert it into the
required output. There are usually two aspects to this process:
- The first stage is a structural transformation, in which the data is converted
from the structure of the incoming XML document to a structure that reflects
the desired output.
- The second stage is formatting, in which the new structure is
output in the required format such as HTML or PDF.
The second stage covers the ground we
discussed in the previous section: the data structure that results from the
first stage can be output as HTML, a text file or as XML. HTML output allows
the information to be viewed directly in a browser by a human user or be input
into any modern word processor. Plain text output allows data to be formatted
in the way an existing application can accept, for example comma-separated
values or one of the many text-based data interchange formats that were developed
before XML arrived on the scene. Finally, XML output allows the data to be
supplied to one of the new breed of applications that accepts XML directly.
Typically this will use a different vocabulary of XML tags from the original
document: for example an XSLT transformation might take the monthly sales
figures as its XML input and produce a histogram as its XML output, using the
XML-based SVG standard for vector graphics. Or you could use an XSLT
transformation to generate VOXML output, for aural rendition of your data.
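As a rough illustration of XML output in a different vocabulary, the following sketch turns sales figures into a very simple SVG bar chart. The input vocabulary (a sales element containing month elements with name and amount attributes, each amount between 0 and 100) is assumed purely for this example, and the stylesheet syntax is explained later in the chapter.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: assumes input such as
     <sales>
       <month name="Jan" amount="40"/>
       <month name="Feb" amount="65"/>
     </sales>
     where each amount lies between 0 and 100. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <svg width="400" height="120">
      <!-- One bar per month: 30 units apart, growing upwards from y=100 -->
      <xsl:for-each select="sales/month">
        <rect x="{(position() - 1) * 30}" y="{100 - @amount}"
              width="20" height="{@amount}"/>
      </xsl:for-each>
    </svg>
  </xsl:template>
</xsl:stylesheet>
```

The default namespace declared on the stylesheet element places the generated svg and rect elements in the SVG namespace. A real histogram would also need axes and labels.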
Let's now delve into the first stage,
transformation - the stage with which XSLT is primarily concerned and which makes
it possible to provide output in all of these formats. This stage might involve
selecting data, aggregating and grouping it, sorting it, or performing
arithmetic conversions such as changing centimeters to inches.
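To make this concrete, here is a minimal sketch of a structural transformation that selects data, sorts it, and converts centimeters to inches. The parts vocabulary and its attribute names are invented for the illustration, and the instructions used are covered properly later in the book.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: assumes input such as
     <parts>
       <part name="bracket" length-cm="12.7"/>
       <part name="bolt" length-cm="2.54"/>
     </parts> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <parts>
      <!-- Select every part, sorted alphabetically by name -->
      <xsl:for-each select="parts/part">
        <xsl:sort select="@name"/>
        <!-- Convert centimeters to inches (1 inch = 2.54 cm),
             formatted to two decimal places -->
        <part name="{@name}"
              length-in="{format-number(@length-cm div 2.54, '0.00')}"/>
      </xsl:for-each>
    </parts>
  </xsl:template>
</xsl:stylesheet>
```

The result is a new tree with the same parts in a different order and with a different unit of measure; formatting that tree as XML is left to the second stage.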
So how does this come about? Before the
advent of XSLT, you could only process incoming XML documents by writing a
custom application. The application wouldn't actually need to parse the raw
XML, but it would need to invoke an XML parser, via a defined Application
Programming Interface (API), to get information from the
document and do something with it. There are two principal APIs for achieving
this: the Simple API for XML (SAX) and the Document Object Model (DOM).
The SAX API is an event-based interface in which the parser notifies the application
of each piece of information in the document as it is read. If you use the DOM
API, then the parser interrogates the document and builds a tree-like object
structure in memory. You would then
write a custom application (in a procedural language such as C++, Visual Basic,
or Java, for example), which could interrogate this tree structure. It would do
so by defining a specific sequence of steps to be followed in order to produce the required output. Thus,
whatever parser you use, this process has the same principal drawback: every
time you want to handle a new kind of XML document, you have to write a new
custom program, describing a different sequence of steps, to process the XML.
Both the DOM and the SAX APIs
are fully described in the Wrox Press book Professional XML, ISBN
1-861003-11-0.
So how is using XSLT to perform transformations on XML
better than writing "custom applications"? Well, the design of XSLT
is based on a recognition that these programs are all very similar, and it
should therefore be possible to describe what they do using a high-level declarative language rather than writing each program from scratch in C++, Visual Basic,
or Java. The required transformation can be expressed as a set of rules. These
rules are based on defining what output should be generated when particular
patterns occur in the input. The language is declarative, in the sense that you
describe the transformation you require, rather than providing a sequence of
procedural instructions to achieve it. XSLT describes the required
transformation and then relies on the XSL processor to decide the most
efficient way to go about it.
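The idea of a transformation expressed as a set of rules might be sketched as follows. The memo vocabulary (a memo element containing title and para children) is hypothetical, chosen only to show one rule per input pattern.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: assumes input such as
     <memo>
       <title>Reminder</title>
       <para>First paragraph.</para>
       <para>Second paragraph.</para>
     </memo> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rule: at the root of the document, generate the HTML
       skeleton and ask for the children of <memo> to be processed -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="memo/*"/>
      </body>
    </html>
  </xsl:template>

  <!-- Rule: a <title> element becomes an HTML heading -->
  <xsl:template match="title">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>

  <!-- Rule: a <para> element becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

Notice that the stylesheet never spells out a loop over the document: each rule states what to generate when its pattern is matched, and the order in which the rules are written does not determine the order of the output.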
XSLT still relies on a parser – be it a DOM parser or a
SAX-compliant one – to convert the XML document into a "tree
structure". It is the structure of this tree representation of the
document that XSLT manipulates, not the document itself. If you are familiar
with the DOM, then you will be happy with the idea of treating every item in an
XML document (elements, attributes, processing instructions etc.) as a node.
With XSLT we have a high-level language that can navigate around a node tree,
select specific nodes and perform complex manipulations on these nodes.
The XSLT tree model is similar
in concept to the DOM but it is not the same. The full XSLT processing model is
discussed in Chapter 2.
The description of XSLT given thus far (a
declarative language that can navigate to and select specific data and then
manipulate that data) may strike you as being similar to that of the standard
database query language: SQL. Let's take a closer look at this comparison.
I like to think of an analogy with relational databases. In a relational
database, the data consists of a set of tables. By themselves, the tables are
not much use: the data might as well be stored in flat files in comma-separated
values format. The power of a relational database doesn't come from its data
structure; it comes from the language that processes the data, SQL. In the same
way, XML on its own just defines a data structure. It's a bit richer than the
tables of the relational model, but by itself it doesn't actually do anything
very useful. It's when we get a high-level language expressly designed to
manipulate the data structure that we start to find we've got something
interesting on our hands: and for XML data that language is XSLT.
Superficially, SQL and XSLT are very
different languages. But if you look below the surface, they actually have a
lot in common. For starters: in order to process specific data, be it in a
relational database or an XML document, the processing language must
incorporate a declarative query syntax for selecting the data that needs to be
processed. In SQL, that's the SELECT statement. In
XSLT, the equivalent is the XPath expression.
The XPath expression language forms an
essential part of XSLT, though it is actually defined in a separate W3C Recommendation
(http://www.w3.org/TR/xpath) because it can also be used
independently of XSLT (the relationship between XPath and XSLT is discussed
later in this chapter).
The XPath query syntax is designed to
retrieve nodes from an XML document, based on a path through the XML document
or the context in which the node appears. It allows access to specific nodes,
while preserving the hierarchy and structure of the document. XSLT is then used
to manipulate the results of these "queries" (rearranging selected
nodes, constructing new nodes etc).
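A few XPath expressions in action may help here. This is only a sketch, and the sales vocabulary it queries is an assumption made for the example; each XPath expression appears in a select attribute.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: assumes input such as
     <sales year="2000">
       <month name="Jan" amount="40"/>
       <month name="Feb" amount="65"/>
     </sales> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A path from the root: the year attribute of the outermost element -->
    <xsl:value-of select="/sales/@year"/>
    <xsl:text>: </xsl:text>
    <!-- A predicate narrows the selection: only months with an amount over 50 -->
    <xsl:value-of select="count(sales/month[@amount &gt; 50])"/>
    <xsl:text> month(s) over 50, total </xsl:text>
    <!-- A function applied to a set of selected attribute nodes -->
    <xsl:value-of select="sum(sales/month/@amount)"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input shown in the comment, this should produce the text 2000: 1 month(s) over 50, total 105.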
There are further similarities between XSLT and SQL:
- Both languages augment
the basic query facilities with useful
additions for performing basic arithmetic, string manipulation, and comparison
operations.
- Both languages supplement the declarative query syntax
with semi-procedural facilities for describing the sequence of processing to be
carried out, and they also provide hooks to escape into conventional
programming languages where the algorithms start to get too complex.
Both languages have an important
property called closure, which
means that the output has the same data structure as the input. For SQL this structure is
tables, for XSLT it is trees – the tree representation of XML documents. The
closure property is extremely valuable because it means operations performed
using the language can be combined end-to-end to define bigger, more complex
operations: you just take the output of one operation and make it the input of
the next operation. In SQL you can do this by defining views or subqueries; in
XSLT you can do it by passing your data through a series of stylesheets.
In the real world, of course, XSLT and SQL
have to coexist. There are many possible relationships, but typically data will
be stored in relational databases and transmitted between systems in XML. The
two languages don't fit together as comfortably as one would
like, because the data models are so different. But XSLT transformations can
play an important role in bridging the divide. A number of database vendors are
working on products that integrate XML and SQL, though there are no standards
in this area as yet.
SQL Server 2000 will support
XPath queries on its data. Ahead of the release of SQL Server 2000, Microsoft
has released the XML SQL Technology Preview, which allows access to data in
SQL Server 6.5 or 7.0 databases in XML form.
The XML SQL Technology Preview
is available from http://msdn.microsoft.com/workshop/xml/articles/xmlsql/sqlxmlsetup.exe
Before we move on to look at a simple
working example of an XSLT transformation, we need to briefly discuss a few of
the XSLT processors that are available to effect these transformations.
The principal role of an XSLT processor is
to apply an XSLT stylesheet to
an XML source document and produce a result document. It is important to note
that each of these is an application of XML and so the underlying structure of
each is a tree. So, in fact, the XSLT processor handles three trees.
There are several XSLT processors to choose
from. Here I'll mention three: Saxon, xt, and Microsoft MSXML3. All of these can be downloaded free of charge (but do read the
licensing conditions).
|
TopXML EDITOR'S NOTE: You can get updated versions of all of
these parsers in our Parsers Zone:
http://www.topxml.com/parsers
|
These three processors and
several others are described in Chapter 10.
Saxon is an open
source XSLT processor developed by the author of this book. It is a Java
application, and can be run directly from the command
prompt: no web server or browser is required. The Saxon program will transform
the XML document to, say, an HTML document, which can then be placed on a web
server. In this example, both the browser and web server only deal with the
transformed document.
If you are running Windows (95/98/NT/2000) the simplest way to use
it is to download Instant Saxon, which is packaged as a Windows executable. You
will need to have Java installed, but that will be there already if you have
any recent version of Internet Explorer. On non-Windows platforms you will need
to install the full Saxon product and follow the instructions that come with
it. You can download Instant Saxon for free from http://users.iclway.co.uk/mhkay/saxon/instant.html. Saxon will run with any XML parser that implements the SAX
interface (in its original Java form).
xt is another open source XSLT processor
developed by James Clark, the editor of the XSLT specification. Like Saxon,
this is a Java application that can be run from the command prompt; it too has
a simple packaged version for the Windows platform and a full version for other
environments. This time the download is from http://www.jclark.com/xml/xt.html. Like Saxon, xt can operate with any SAX-compliant parser.
Alternatively, you can run XSLT stylesheets
actually within Internet Explorer. You'll need to install Internet Explorer 5
and the latest version of the Microsoft MSXML processor, which you can find at http://www.microsoft.com/xml. The information here is correct for the 15 March 2000 technology
preview, referred to as MSXML3, but
Microsoft has promised a rapid sequence of new releases, so check the latest
position. MSXML3 comes with a new version of the MSXML parser.
Download and install both the SDK and the run-time package. Installing the SDK creates a
program called xmlinst.exe, typically in the windows\system directory. Run this program to establish MSXML3 as the default XML
processor to be used by Internet Explorer (if you don't do this, IE5 will try
to use the old 1998 MSXML processor, which implements an obsolete dialect of
XSL that is quite different from the language described in this book: see Chapter
10 for details). The big advantage of Microsoft's technology is that the XSLT
processing can take place on the browser.
I've avoided talking about specific
products in most of the book, because the information is likely to change quite
rapidly. It's best to get the latest status from the web. Some good places to
start are:
Now we're ready to take a look at an example of using XSLT to transform a very simple XML document.
Example: A "Hello, world!" XSLT Stylesheet
Kernighan and Ritchie in their classic The
C Programming Language originated the idea of presenting a trivial
but complete program right at the beginning of the book, and ever since then
the "Hello world" program has been an honored tradition. Of course,
a complete description of how this example works is not possible until all
the concepts have been defined: so if you feel I'm not explaining it fully,
don't worry – the explanations will come later.
Input
What kind of transformation would we like to do? Let's try transforming the
following XML document:
<?xml version="1.0" encoding="iso-8859-1"?>
<greeting>Hello, world!</greeting>
A simple node-tree representation of this
document would look as follows:
root
  <greeting> element
    text node: Hello, world!
There is one root node per document. The
root node in the XSLT model performs the same function as the document node
in the DOM model. The XML declaration is not part of the tree model: it is not
visible to the XSLT processor and, therefore, is not included in the tree.
Output
Our required output is the following HTML, which will
simply set the browser title to "Today's greeting" and display whatever greeting is in the source XML file:
<html>
<head>
<title>Today's greeting</title>
</head>
<body>
<p>Hello, world!</p>
</body>
</html>
XSLT Stylesheet
Without any more ado, here's the XSLT stylesheet to effect the transformation:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>Today's greeting</title>
</head>
<body>
<p><xsl:value-of select="greeting"/></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Running the Stylesheet
You can run this stylesheet using any of
the three processors described in the previous section.
Saxon
With Saxon, the steps are:
- Download the processor
- Install the executable saxon.exe in a suitable directory, and make this the current directory
- Using Notepad, type the two files above into hello.xml and hello.xsl respectively, within this directory
- Bring up an MSDOS-style console window (Start | Programs | MSDOS Prompt)
- Type the following at the command prompt: saxon hello.xml hello.xsl
- Admire the HTML displayed on the standard output
If you want to view the output using your
browser, simply save the command line output as an HTML file, in the
following manner:
saxon hello.xml hello.xsl > hello.html
xt
The procedure is very similar if you use
xt. This time the command to use the Windows executable is xt rather than saxon. It should give the same result.
MSXML3
Finally, you can run the stylesheet actually within Internet Explorer. You need
to modify the XML source file to include a reference to the stylesheet, so it
now reads:
<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/xsl" href="hello.xsl"?>
<greeting>Hello, world!</greeting>
Now you should simply be able to
double-click on the hello.xml file, which will bring up IE5 and load hello.xml into the browser. IE5 reads the XML file, discovers what
stylesheet is needed, loads the stylesheet, executes it to perform the
transformation, and displays the resulting HTML. If you don't see the text
"Hello, world!" on the screen, but just the XML
file, this is because you're using the original XSL interpreter that
Microsoft issued with IE5, not the MSXML3 version. If you see the stylesheet
displayed, this also indicates that you haven't completed the installation
process correctly: remember to run the xmlinst.exe program.
How it Works
If you've succeeded in running this
example, or even if you just want to get on with reading the book, you'll
want to know how it works. Let's dissect it:
<?xml version="1.0" encoding="iso-8859-1"?>
This is just the standard XML heading. The interesting point is that an XSLT stylesheet is itself an XML
document. I'll have more to say about this later in the chapter. I've used iso-8859-1 character encoding (which is the official name for the character
set that Microsoft calls "ANSI") because in Western Europe and
North America it's the character set that most text editors support. If
you've got a text editor that supports UTF-8 or some other character encoding, feel free to use that instead.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
This is the standard XSLT heading. In XML
terms it's an element start tag, and it identifies the document as a
stylesheet. The xmlns:xsl attribute is an XML Namespace declaration, which indicates that
the prefix xsl is going to be used for elements defined in the W3C XSLT
specification: XSLT makes extensive use of XML namespaces, and all the
element names defined in the standard are prefixed with this namespace, to
avoid any clash with names used in your source document. The version
attribute indicates that the stylesheet is only using features from version
1.0 of the XSLT standard, which at the moment is the only version there is.
Let's move on:
<xsl:template match="/">
An <xsl:template> element defines a template rule to be triggered when a particular
part of the source document is being processed. The attribute match="/" indicates that this particular rule is triggered right at the
start of processing the source document. Here «/» is an XPath expression which identifies the root node of the document: an XML document has a hierarchic structure, and
in the same way as UNIX uses the special filename «/» to indicate the root of a hierarchic filestore, XPath uses «/» to represent the root of the XML content hierarchy. The DOM
model calls this the Document object, but in XPath it is called the root.
<html>
<head>
<title>Today's greeting</title>
</head>
<body>
<p><xsl:value-of select="greeting"/></p>
</body>
</html>
Once this rule is triggered, the body of
the template says what output to generate. Most of the template body here is
a sequence of HTML elements and text to be copied into the output file.
There's one exception: an <xsl:value-of> element, which we recognize as an XSL instruction because it uses
the namespace prefix xsl. This particular instruction copies the value of a node in the
source document to the output document. The select
attribute of the element specifies the node for which the value should be
evaluated. The XPath expression «greeting» means: "find the set of all <greeting> elements that are children of the node that this template rule is
currently processing". In this case, this means the <greeting> element that's the outermost element of the source document. The <xsl:value-of> instruction then extracts the text node of this element, and
copies it to the output at the relevant place, in other words within the
generated <p> element.
All that remains is to finish what we
started:
</xsl:template>
</xsl:stylesheet>
In fact, for a simple stylesheet like the
one shown above, you can cut out some of the red tape. Since there is only one template rule, the <xsl:stylesheet> and <xsl:template> elements can actually be omitted. The following is a complete, valid
stylesheet equivalent to the preceding one:
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<head>
<title>Today's greeting</title>
</head>
<body>
<p><xsl:value-of select="greeting"/></p>
</body>
</html>
This simplified syntax is designed to make
XSLT look familiar to people who have learnt to use proprietary template
languages which allow you to write a skeleton HTML page with special tags (analogous to <xsl:value-of>) to insert variable data at the appropriate place. But as we'll see,
XSLT is much more powerful than that.
Why would you want to place today's
greeting in a separate XML file and display it using a stylesheet? One reason
is that you might want to show the greeting in different ways depending on the
context; for example, it might be shown differently on a different device. In
this case you could write a different stylesheet to transform the same source
document in a different way. This raises the question of how a stylesheet gets
selected at run-time. There is no single answer to this question. As we saw
above, Saxon and xt have interfaces that allow you to nominate both the
stylesheet and the source document to use. The same thing can also be achieved
with the Microsoft XSLT product, though it requires some scripting on the HTML
page: the <?xml-stylesheet?> processing instruction which I used in the example above only works
if you want to use the same stylesheet every time.
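As a sketch of what a second stylesheet for the same source document might look like, the following renders the greeting as plain text rather than HTML. The file name hello-text.xsl is hypothetical.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- hello-text.xsl (hypothetical file name): renders hello.xml as plain text -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask for text output instead of XML or HTML markup -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text>Today's greeting: </xsl:text>
    <xsl:value-of select="greeting"/>
  </xsl:template>
</xsl:stylesheet>
```

With Saxon this could be run in the same way as before, nominating the other stylesheet on the command line: saxon hello.xml hello-text.xsl. For the source document used in this chapter it should produce the text Today's greeting: Hello, world!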
It's time now to take a closer look at the
relationship between XSLT and XPath and other XML-related technologies.
Wrox Press Limited,
US and UK.
|