What are the characteristics of XSLT that most clearly distinguish it
from other languages? In this section I shall pick out three of the
most striking: it is written in XML syntax, it is a language free of
side-effects, and processing is described as a set of independent
pattern-matching rules.
Use of XML Syntax
As we've seen, the use of SGML syntax for stylesheets was proposed
as long ago as 1994, and it seems that this idea gradually became the
accepted wisdom. It's difficult to trace exactly what the overriding
arguments were, and when you find yourself writing something
like:
<xsl:variable name="y">
  <xsl:call-template name="f">
    <xsl:with-param name="x" select="$x"/>
  </xsl:call-template>
</xsl:variable>
to express what in other languages would be written as «y = f(x);»,
then you may well wonder how such a decision came to be made.
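The fragment above only makes sense if a template named f has been declared somewhere. As a sketch, here is a complete stylesheet that binds x to an arbitrary sample value and defines f to double its argument; the value 21 and the doubling body are assumptions made purely for illustration:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- x is bound to an arbitrary sample value -->
    <xsl:variable name="x" select="21"/>
    <!-- the equivalent of y = f(x); -->
    <xsl:variable name="y">
      <xsl:call-template name="f">
        <xsl:with-param name="x" select="$x"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$y"/>
  </xsl:template>

  <!-- hypothetical definition of f: outputs twice its argument -->
  <xsl:template name="f">
    <xsl:param name="x"/>
    <xsl:value-of select="$x * 2"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should output 42. Note that in XSLT 1.0, $y is bound to a result tree fragment rather than a number; it is the string value of that fragment that comes out as 42.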
In fact, it could have been worse: in the very early drafts, what
are now XPath expressions were also written in XML, so instead of
writing select="book/author/first-name" you had to write something
along the lines of:
<select>
  <path>
    <element type="book"/>
    <element type="author"/>
    <element type="first-name"/>
  </path>
</select>
The most obvious arguments for expressing XSLT stylesheets in XML
are perhaps:
- There is already an XML parser in the browser; reusing it keeps
the footprint small.
- Everyone had got fed up with the syntactic inconsistencies between
HTML/XML and CSS, and didn't want the same thing to happen again.
- The syntax of DSSSL was widely seen as a barrier to its adoption;
better to have a syntax that was already familiar in the target
community.
- Many existing popular templating languages are expressed as an
outline of the output document with embedded instructions, so this is
a familiar concept.
- All the lexical apparatus is reusable, for example Unicode
support, character and entity references, whitespace handling,
namespaces.
- It's occasionally useful to have a stylesheet as the input or
output of a transformation (witness the Microsoft XSL converter), so
it's a benefit if a stylesheet can read and write other stylesheets.
- Providing visual development tools easily solves the inconvenience
of having to type lots of angle brackets.
Like it or not, the XML-based syntax is now an intrinsic feature
of the language, with both benefits and drawbacks. It does require
a lot of typing, but in the end the number of keystrokes has very
little bearing on the ease or difficulty of solving particular
transformation problems.
No Side-effects
The idea that XSL should be a declarative language free of
side-effects appears repeatedly in the early statements about the
goals and design principles of the language, but no-one ever seems to
explain why: what would be the user benefit?
A function or procedure in a programming language is said to have
side-effects if it makes changes to its environment: for example, if
it updates a global variable that another function or procedure can
read, writes messages to a log file, or prompts the user. If
functions have side-effects, it becomes important to call them the
right number of times and in the correct order. Functions that have
no side-effects (sometimes called pure functions) can be called any
number of times and in any order. It doesn't matter how many times
you evaluate the area of a triangle, you will always get the same
answer; but if the function to calculate the area has a side-effect
such as changing the size of the triangle, or if you don't know
whether it has side-effects or not, then it becomes important to call
it once only.
I expand on this concept in the section on Computational
Stylesheets in Chapter 8, page 545.
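In XSLT terms, a named template that reads nothing but its parameters and produces nothing but its result behaves like the pure function just described. A small sketch; the template name triangle-area and its parameters base and height are illustrative names, not part of any standard library:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Reads only its parameters and produces only its result,
       so calling it twice, or in a different order, changes nothing -->
  <xsl:template name="triangle-area">
    <xsl:param name="base"/>
    <xsl:param name="height"/>
    <xsl:value-of select="0.5 * $base * $height"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="triangle-area">
      <xsl:with-param name="base" select="3"/>
      <xsl:with-param name="height" select="4"/>
    </xsl:call-template>
    <xsl:text> </xsl:text>
    <xsl:call-template name="triangle-area">
      <xsl:with-param name="base" select="3"/>
      <xsl:with-param name="height" select="4"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Both calls yield 6, so the output is "6 6" whichever call a processor happened to evaluate first.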
It is possible to find hints at the reason why this was considered
desirable in the statements that the language should be equally
suitable for batch or interactive use, and that it should be capable
of progressive rendering. There is a concern that when you download a
large XML document, you won't be able to see anything on your screen
until the last byte has been received from the server. Equally, if a
small change were made to the XML document, it would be nice to be
able to determine the change needed to the screen display, without
recalculating the whole thing from scratch. If a language has
side-effects, then the order of execution of the statements in the
language has to be defined, or the final result becomes unpredictable. Without
side-effects, the statements can be executed in any order, which
means it is possible, in principle, to process the parts of a
stylesheet selectively and independently.
Whether XSLT has actually achieved these goals is somewhat
debatable. Certainly, determining which parts of the output document
are affected by a small change to one part of the input document is
not easy, given the flexibility of the expressions and patterns that
are now permitted in the language. Equally, all existing XSLT
processors require the whole document to be loaded into memory.
However, it would be a mistake to expect too much too soon. When E.
F. Codd published the relational calculus in 1970, he made the claim
that a declarative language was desirable because it was possible to
optimize it, which was not possible with the navigational data access
languages in use at the time. In fact it took another fifteen years
before relational optimization techniques (and, to be fair, the price
of hardware) reached the point where large relational databases were
commercially viable. But in the end he was proved right, and the hope
is that the same principle will also eventually deliver similar
benefits in the area of transformation and styling languages.
What being side-effect free means in practice is that you cannot
update the value of a variable. This restriction is something you may
find very frustrating at first, and a big price to pay for these
rather remote benefits. But as you get the feel of the language and
learn to think about using it the way it was designed to be used,
rather than the way you are familiar with from other languages, you
will find you stop thinking about this as a restriction. In fact, one
of the benefits is that it eliminates a whole class of bugs from your
code! I shall come back to this subject in Chapter 8, where I outline
some of the common design patterns for XSLT stylesheets, and in
particular, describe how to use recursive code to handle situations
where in the past you would probably have used updatable variables
to keep track of the current state.
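As a foretaste of that technique, here is a sketch that totals the number of characters in all the line elements of a document, such as the poem later in this section. XPath 1.0's sum() can only add up the numeric values of nodes, not a computed value such as string-length() of each one, so a procedural programmer would reach for a loop with a running counter. Here the running total is instead passed as a parameter to each recursive call. The template name total-length and its parameters are hypothetical:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="total-length">
      <xsl:with-param name="nodes" select="//line"/>
    </xsl:call-template>
  </xsl:template>

  <!-- No variable is ever updated: each call receives the nodes
       still to be processed and the total accumulated so far -->
  <xsl:template name="total-length">
    <xsl:param name="nodes"/>
    <xsl:param name="total" select="0"/>
    <xsl:choose>
      <xsl:when test="$nodes">
        <!-- add the first node's length, recurse on the rest -->
        <xsl:call-template name="total-length">
          <xsl:with-param name="nodes" select="$nodes[position() > 1]"/>
          <xsl:with-param name="total"
              select="$total + string-length($nodes[1])"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- nothing left: output the accumulated total -->
        <xsl:value-of select="$total"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The state that would have lived in an updatable variable lives in the parameters instead, and the recursion stops when the node-set is empty.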
Rule-based
The dominant feature of a typical XSLT stylesheet is that it
consists of a sequence of template rules, each of which describes how
a particular element type or other construct should be processed. The
rules are not arranged in any particular order; they don't have to
match the order of the input or the order of the output, and in fact
there are very few clues as to what ordering or nesting of elements
the stylesheet author expects to encounter in the source document. It
is this that makes XSLT a declarative language: you say what output
should be produced when particular patterns occur in the input, as
distinct from a procedural program where you have to say what tasks
to perform in what order.
This rule-based structure is very like CSS, but with the major
difference that both the patterns (the description of which nodes a
rule applies to) and the actions (the description of what happens
when the rule is matched) are much richer in functionality.
Example: Displaying a Poem
Let's see how we can use the rule-based approach to format a
poem. Again, we haven't introduced all the concepts yet, so I
won't try to explain every detail of how this works, but it's
useful to see what the template rules actually look like in
practice.
Input
Let's take this XML source as our poem.
<poem>
  <author>Rupert Brooke</author>
  <date>1912</date>
  <title>Song</title>
  <stanza>
    <line>And suddenly the wind comes soft,</line>
    <line>And Spring is here again;</line>
    <line>And the hawthorn quickens with buds of green</line>
    <line>And my heart with buds of pain.</line>
  </stanza>
  <stanza>
    <line>My heart all Winter lay so numb,</line>
    <line>The earth so dead and frore,</line>
    <line>That I never thought the Spring would come again</line>
    <line>Or my heart wake any more.</line>
  </stanza>
  <stanza>
    <line>But Winter's broken and earth has woken,</line>
    <line>And the small birds cry again;</line>
    <line>And the hawthorn hedge puts forth its buds,</line>
    <line>And my heart puts forth its pain.</line>
  </stanza>
</poem>
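For a browser that supports XSLT to apply the stylesheet when it opens the poem, the source document has to refer to it. A common way is the standard xml-stylesheet processing instruction placed before the root element. A minimal sketch, assuming the stylesheet is saved next to the poem under the hypothetical name poem.xsl:

```xml
<?xml version="1.0"?>
<!-- "poem.xsl" is a hypothetical filename for the stylesheet -->
<?xml-stylesheet type="text/xsl" href="poem.xsl"?>
<poem>
  <author>Rupert Brooke</author>
  <!-- date, title and stanzas exactly as above -->
</poem>
```

The type value text/xsl is the one browsers conventionally recognize for XSLT stylesheets.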
Output
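We'll write a stylesheet that displays this document in the browser
with the title and author centered as headings, each stanza as a
separate paragraph with its even-numbered lines indented, and the
date in italics at the end.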
Stylesheet
It starts with the standard header:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
Now we'll write one template rule for each element type in
the source document. The rule for the <poem> element
creates the skeleton of the HTML output, defining the ordering
of the elements in the output (which doesn't have to be the
same as the input order). The <xsl:value-of> instruction
inserts the value of the selected element at this point in the
output. The <xsl:apply-templates> instructions cause the
selected child elements to be processed, each using its
own template rule.
<xsl:template match="poem">
  <html>
    <head>
      <title><xsl:value-of select="title"/></title>
    </head>
    <body>
      <xsl:apply-templates select="title"/>
      <xsl:apply-templates select="author"/>
      <xsl:apply-templates select="stanza"/>
      <xsl:apply-templates select="date"/>
    </body>
  </html>
</xsl:template>
The template rules for the <title>, <author>,
and <date> elements are very simple: they take the
content of the element (denoted by «select="."»),
and surround it with appropriate HTML tags to define its
display style:
<xsl:template match="title">
  <div align="center"><h1><xsl:value-of select="."/></h1></div>
</xsl:template>
<xsl:template match="author">
  <div align="center"><h2>By <xsl:value-of select="."/></h2></div>
</xsl:template>
<xsl:template match="date">
  <p><i><xsl:value-of select="."/></i></p>
</xsl:template>
The template rule for the <stanza> element puts each
stanza into an HTML paragraph, and then invokes processing of
the lines within the stanza, as defined by the template rule
for lines:
<xsl:template match="stanza">
  <p><xsl:apply-templates select="line"/></p>
</xsl:template>
The rule for <line> elements is a little more complex:
if the position of the line within the stanza is an even
number, it precedes the line with two non-breaking-space
characters (&#160;). The <xsl:if> instruction tests a
boolean condition, which in this case calls the position()
function to determine the relative position of the current
line. It then outputs the contents of the line, followed by an
empty HTML <br> element to end the line.
<xsl:template match="line">
  <xsl:if test="position() mod 2 = 0">&#160;&#160;</xsl:if>
  <xsl:value-of select="."/><br/>
</xsl:template>
And to finish off, we close the <xsl:stylesheet>
element:
</xsl:stylesheet>
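Applying this stylesheet to the poem should produce HTML roughly like the following, shown for the first stanza only. The indentation is added here for readability; real processors differ in their whitespace, in whether they add a META element to the head, and in whether they write the non-breaking spaces as &nbsp; or as the characters themselves:

```html
<html>
  <head>
    <title>Song</title>
  </head>
  <body>
    <div align="center"><h1>Song</h1></div>
    <div align="center"><h2>By Rupert Brooke</h2></div>
    <p>And suddenly the wind comes soft,<br>
       &nbsp;&nbsp;And Spring is here again;<br>
       And the hawthorn quickens with buds of green<br>
       &nbsp;&nbsp;And my heart with buds of pain.<br></p>
    <!-- second and third stanzas follow in the same form -->
    <p><i>1912</i></p>
  </body>
</html>
```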
Although template rules are a characteristic feature of the XSLT
language, we'll see that this is not the only way of writing a
stylesheet. In Chapter 8, I will describe four different design
patterns for XSLT stylesheets, only one of which makes extensive use
of template rules. In fact, the Hello World stylesheet I
presented earlier in this chapter doesn't make any real use of
template rules: it fits into the design pattern I call
fill-in-the-blanks, because the stylesheet essentially contains
the fixed part of the output with embedded instructions saying where
to get the data to put in the variable parts.
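For comparison, the poem page could also be written in the fill-in-the-blanks style. The following is a sketch, not the Hello World stylesheet mentioned above: it uses XSLT 1.0's simplified stylesheet syntax, in which a literal result element carrying an xsl:version attribute is the entire stylesheet, and xsl:for-each loops take the place of template rules:

```xml
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head>
    <title><xsl:value-of select="poem/title"/></title>
  </head>
  <body>
    <div align="center"><h1><xsl:value-of select="poem/title"/></h1></div>
    <div align="center"><h2>By <xsl:value-of select="poem/author"/></h2></div>
    <!-- loops stand in for template rules; the output structure is fixed here -->
    <xsl:for-each select="poem/stanza">
      <p>
        <xsl:for-each select="line">
          <!-- position() counts lines within the current stanza -->
          <xsl:if test="position() mod 2 = 0">&#160;&#160;</xsl:if>
          <xsl:value-of select="."/><br/>
        </xsl:for-each>
      </p>
    </xsl:for-each>
    <p><i><xsl:value-of select="poem/date"/></i></p>
  </body>
</html>
```

It should produce the same HTML as the rule-based version, but now the stylesheet itself spells out where each piece of data sits in the source (poem/stanza/line), rather than leaving that to separate rules.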
©1999 Wrox Press Limited,
US and UK.