XSLT is published by the World Wide Web Consortium (W3C) and fits
into the XML family of standards, most of which are also developed by
W3C. In this section I will try to explain the sometimes-confusing
relationship of XSLT to other related standards and
specifications.
XSLT and XSL
XSLT started life as part of a bigger language called XSL
(eXtensible Stylesheet Language). As the name implies, XSL was (and
is) intended to define the formatting and presentation of XML
documents for display on screen, on paper, or in the spoken word. As
the development of XSL proceeded, it became clear that this was
usually a two-stage process: first a structural transformation, in
which elements are selected, grouped and reordered, and then a
formatting process in which the resulting elements are rendered as
ink on paper, or pixels on the screen. It was recognized that these
two stages were quite independent, so XSL was split into two parts:
XSLT for defining transformations, and "the rest" for the formatting
stage. The formatting part is still officially called XSL, though some
people prefer to call it XSL-FO (XSL Formatting Objects).
XSL Formatting is nothing more than another XML vocabulary,
in which the objects described are areas of the printed page and
their properties. Since this is just another XML vocabulary, XSLT
needs no special capabilities to generate this as its output. XSL
Formatting is outside the scope of this book. It's a big subject (the
draft specification currently available is far longer than XSLT), the
standard is not yet stable, and the only products that implement it
are at a very early stage of development. What's more, you're far
less likely to need it than to need XSLT. XSL Formatting provides
wonderful facilities to achieve high-quality typographical output of
your documents. However, for most people translating them into HTML
for presentation by a standard browser is quite good enough, and that
can be achieved using XSLT alone, or if necessary, by using XSLT in
conjunction with Cascading Style Sheets (CSS or CSS2),
which I shall return to shortly.
The XSL Formatting specifications, which at the time of writing
are still evolving, can be found at http://www.w3.org/TR/xsl.
XSLT and XPath
Halfway through the development of XSLT, it was recognized that
there was a significant overlap between the expression syntax in XSLT
for selecting parts of a document, and the XPointer language being
developed for linking from one document to another. To avoid having
two separate but overlapping expression languages, the two committees
decided to join forces and define a single language, XPath, which
would serve both purposes. XPath version 1.0 was published on the
same day as XSLT, 16 November 1999.
XPath acts as a sublanguage within an XSLT stylesheet. An XPath
expression may be used for numerical calculations or string
manipulations, or for testing Boolean conditions, but its most
characteristic use (and the one that gives it its name) is to
identify parts of the input document to be processed. For example,
the following instruction outputs the average price of all the books
in the input document:
<xsl:value-of select="sum(//book/@price) div count(//book)"/>
Here the <xsl:value-of> element is an instruction defined in
the XSLT standard, which causes a value to be written to the output
document. The select attribute contains an XPath expression, which
calculates the value to be written: specifically, the total of the
price attributes on all the <book> elements, divided by the
number of <book> elements.
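To make the XPath semantics concrete outside an XSLT processor, here is a rough Python equivalent of the expression above, using only the standard library's ElementTree. It is a sketch under stated assumptions: the sample document and the `average_price` function are hypothetical, and it mirrors two details of the XPath version. `sum()` adds only the price attributes that exist, while `count()` counts every `<book>`, and dividing zero by zero gives NaN rather than an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <book> elements carrying a price attribute, at any depth.
SAMPLE = """<catalog>
  <book title="A" price="10.00"/>
  <shelf><book title="B" price="20.00"/></shelf>
  <book title="C" price="30.00"/>
</catalog>"""


def average_price(xml_text: str) -> float:
    """Rough equivalent of sum(//book/@price) div count(//book)."""
    root = ET.fromstring(xml_text)
    # root.iter("book") visits every <book> element in the tree, like //book
    books = list(root.iter("book"))
    # XPath's sum() adds only the price attributes that are present,
    # while count() counts every <book>, priced or not
    total = sum(float(b.get("price")) for b in books if b.get("price") is not None)
    if not books:
        # In XPath, 0 div 0 yields NaN rather than raising an error
        return float("nan")
    return total / len(books)


print(average_price(SAMPLE))  # 20.0
```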
The separation of XPath from XSLT works reasonably well, but there
are places where the split seems awkward, and there are many cases
where it's difficult to know which document to read to find the
answer to a particular question. For example, an XPath expression can
contain a reference to a variable, but creating the variable and
giving it an initial value is the job of XSLT. Another example: XPath
expressions can call functions, and there is a range of standard
functions defined. Those whose effect is completely freestanding,
such as string-length(), are defined in the XPath specification,
whereas additional functions whose behavior relies on XSLT
definitions, such as key(), are defined in the XSLT
specification.
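The reason for the split is easier to see in code. In the hypothetical Python sketch below, a `string_length` function needs nothing but its argument, whereas a `key()`-style lookup needs an index that some other part of the program (in XSLT, an `<xsl:key>` declaration) has set up first. The names `build_key` and `by_author` are illustrative only, and the `use` parameter is simplified to an attribute name rather than a full XPath expression.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data for illustration only.
SAMPLE = """<library>
  <book title="A" author="Smith"/>
  <book title="B" author="Smith"/>
  <book title="C" author="Jones"/>
</library>"""


def string_length(s: str) -> int:
    # Like XPath's string-length(): the result depends only on the argument.
    return len(s)


def build_key(root: ET.Element, match: str, use: str) -> dict:
    # Like XSLT's key(): lookups only make sense once the stylesheet has
    # declared an index, e.g. <xsl:key name="by-author" match="book" use="@author"/>.
    # Simplification: 'use' is just an attribute name, not an XPath expression.
    index = defaultdict(list)
    for element in root.iter(match):
        value = element.get(use)
        if value is not None:
            index[value].append(element)
    return index


by_author = build_key(ET.fromstring(SAMPLE), "book", "author")
print(string_length("XSLT"))                          # 4
print([b.get("title") for b in by_author["Smith"]])   # ['A', 'B']
```

As with the real `key()` function, looking up a value that no element has returns an empty result rather than an error.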
Because the split is awkward, I've written this book as if
XSLT+XPath were a single language. For example, all the standard
functions are described together in Chapter 7. In the reference
sections, I've tried to indicate where each function or other
construct is defined in the original standards, but the working
assumption is that you are using both languages together and you
don't need to know where one stops and the other one takes over. The
only downside of this approach is that if you want to use XPath on
its own, for example to define document hyperlinks, then the book
isn't really structured to help you.
XSLT and Internet Explorer 5
Very soon after the first draft proposals for XSL were published,
back in 1998, Microsoft shipped a partial implementation as a
technology preview for use with IE4. This was subsequently replaced
with a rather different implementation when IE5 came out. This
second implementation, known as MSXSL, remained in the field
essentially unchanged until very recently, and is what many people
mean when they refer to XSL. Unfortunately, though, Microsoft jumped
the gun, and the XSLT standard changed and grew, so that when the
XSLT Recommendation version 1.0 was finally published on 16 November
1999, it bore very little resemblance to the initial Microsoft
product.
A Recommendation is the most definitive of documents produced by
the W3C. It's not technically a standard, because standards can
only be published by government-approved standards organizations.
But I will often refer to it loosely as "the standard" in this
book.
Many of the differences, such as changes of keywords, are very
superficial, but some run much deeper: for example, the way the
equals operator is defined has changed.
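As a reference point for the XSLT side, XPath 1.0 defines «=» and «!=» between a node-set and a string existentially: the comparison is true if it holds for at least one node in the set. The Python sketch below models only that node-set-versus-string case (comparisons with numbers and booleans follow other rules), and the function names are hypothetical. One consequence that often surprises people is that both operators can be true at the same time.

```python
from typing import Sequence


def nodeset_equals(string_values: Sequence[str], target: str) -> bool:
    # XPath 1.0: node-set = string is true if ANY node's string-value matches
    return any(value == target for value in string_values)


def nodeset_not_equals(string_values: Sequence[str], target: str) -> bool:
    # ...and node-set != string is true if ANY node's string-value differs
    return any(value != target for value in string_values)


prices = ["10", "20"]  # e.g. the string-values selected by //book/@price
print(nodeset_equals(prices, "10"))      # True
print(nodeset_not_equals(prices, "10"))  # True: both hold at once
print(nodeset_equals([], "10"), nodeset_not_equals([], "10"))  # False False
```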
So the Microsoft IE5 dialect of XSL is also outside the scope of
this book. Please don't assume that anything in this book is relevant
to the original Microsoft XSL: even where the syntax appears similar
to XSLT, the meaning of the construct may be completely
different.
You can find information about the original IE5 dialect of XSL in
the Wrox book XML IE5 Programmer's Reference, ISBN 1-861001-57-6.
Microsoft has fully backed the development of the new XSLT
standard, and on 26 January 2000 they released their first attempt at
implementing it. It's a partial implementation, packaged as part of a
set of XML tools called MSXML, but enough to run quite a few of the
examples in this book - and the parts they have implemented conform
quite closely to the XSLT specifications. A further update to this
product (MSXML3) was released on 15 March 2000, bringing the language
even closer to the standard. They've announced that they intend to
move quickly towards a full implementation, so by the time you read
this, the Microsoft product may comply fully with the W3C standard:
check their web site for the latest details.
Microsoft has also released a
converter to upgrade stylesheets from the old XSL dialect to the
new. However, this isn't the end of the story, because, of course,
there are millions of copies of IE5 installed that only support the
old version. If you want to develop a web site that delivers XML to
the browser and relies on the browser interpreting its XSLT
stylesheet, you've currently got your work cut out to make sure all
your users can handle it.
If you are using Microsoft technology on the server, there
is an ISAPI extension called XSLISAPI that allows you to do the
transformation in the browser where it's supported, and on the server
otherwise. Until the browser situation stabilizes, however,
server-side transformation of XML to HTML, driven from ASP
pages or from Java servlets, is really the only practical option for
a serious project.
There's more information about products from Microsoft and other
vendors in Chapter 10 - but do be aware that it will become out of
date very rapidly.
XSLT and XML
XSLT is essentially a tool for transforming XML documents. At the
start of this chapter we discussed the reasons why this is important,
but now we need to look a little more precisely at the relationship
between the two. There are two particular aspects of XML that XSLT
interacts with very closely: one is XML Namespaces; the other is the
XML Information Set. These are discussed in the following
sections.
XML Namespaces
XSLT is designed on the basis that XML namespaces are an essential
part of the XML standard. So when the XSLT standard refers to an XML
document, it really means an XML document that also conforms to
the XML Namespaces specification, which can be found at
http://www.w3.org/TR/REC-xml-names.
For a full explanation of XML Namespaces, see Chapter 7 of the
Wrox Press book Professional XML, ISBN 1-861003-11-0.
Namespaces play an important role in XSLT. Their purpose is to
allow you to mix tags from two different vocabularies in the same XML
document. For example, in one vocabulary <table> might mean a
two-dimensional array of data values, while in another vocabulary
<table> refers to a piece of furniture. Here's a quick reminder
of how they work:
Namespaces are identified by a Uniform Resource Identifier
(URI). This can take a number of forms. One form is the
familiar URL, for example http://www.wrox.com/namespace.
Another form, not fully standardized but used in some XML
vocabularies (see for example http://www.biztalk.org), is a URN, for
example urn:java:com.icl.saxon. The detailed form of the URI doesn't
matter, but it is a good idea to choose one that will be unique. One
good way of achieving this is to use the URL of your own web site.
But don't let this confuse you into thinking that there must be
something on the web site for the URL to point to. The namespace URI
is simply a string that you have chosen to be different from other
people's namespace URIs: it doesn't need to point to anything.
Since namespace URIs are often rather long and use special
characters such as «/», they are not used in full as part
of the element and attribute names. Instead, each namespace used in a
document can be given a short nickname, and this nickname is used as
a prefix of the element and attribute names. It doesn't matter what
prefix you choose, because the real name of the element or attribute
is determined only by its namespace URI and its local name (the part
of the name after the prefix). For example, all my examples use the
prefix xsl to refer to the namespace URI
http://www.w3.org/1999/XSL/Transform, but you could equally well use
the prefix xslt, so long as you use it consistently.
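Any namespace-aware parser shows the same behavior. In this minimal sketch using Python's standard ElementTree (a stand-in for an XSLT processor, with a hypothetical `expanded_name` helper), names are reported as `{namespace-URI}local-name`, so the prefix disappears and the xsl and xslt spellings compare equal. The second call uses a made-up URN to show that the parser never tries to fetch anything from the URI.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def expanded_name(xml_text: str) -> str:
    # ElementTree reports the root element's name as "{namespace-URI}local-name";
    # the prefix used in the source text is not part of the result.
    return ET.fromstring(xml_text).tag


with_xsl = f'<xsl:stylesheet xmlns:xsl="{XSLT_NS}" version="1.0"/>'
with_xslt = f'<xslt:stylesheet xmlns:xslt="{XSLT_NS}" version="1.0"/>'
print(expanded_name(with_xsl) == expanded_name(with_xslt))  # True

# The URI is only compared as a string; nothing is fetched, so a
# hypothetical URN that resolves to nothing works just as well.
print(expanded_name('<x:doc xmlns:x="urn:example:anything"/>'))  # {urn:example:anything}doc
```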
For element names, you can also declare a default namespace URI,
which is to be associated with unprefixed element names. The default
namespace URI, however, does not apply to unprefixed attribute
names.
A namespace prefix is declared using a special pseudo-attribute
within any element tag, with the form:
xmlns:prefix = "namespace-URI"
This declares a namespace prefix, which can be used for the name
of that element, for its attributes, and for any element or attribute
name contained in that element. The default namespace, which is used
for elements having no prefix (but not for attributes), is similarly
declared using a pseudo-attribute:
xmlns = "namespace-URI"
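Both rules, the scope of a declaration and the default namespace's restriction to element names, can be checked with Python's standard ElementTree. The document and its URNs in this sketch are hypothetical; the point is that the nested, unprefixed `<book>` picks up the default namespace, its unprefixed `id` attribute does not, and the `p` prefix declared on the outer element is still in scope inside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace and a prefixed namespace,
# both declared on the outer element.
DOC = """<catalog xmlns="urn:example:catalog" xmlns:p="urn:example:pricing">
  <book id="b1" p:currency="GBP"/>
</catalog>"""

root = ET.fromstring(DOC)
book = root[0]  # first child element; whitespace text is not an element

# The default namespace reaches the nested, unprefixed <book> element...
print(book.tag)     # {urn:example:catalog}book
# ...but not the unprefixed id attribute, which is in no namespace,
# while the p: prefix declared on <catalog> still applies to the attribute.
print(book.attrib)  # {'id': 'b1', '{urn:example:pricing}currency': 'GBP'}
```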
XSLT can't be used to process an XML document unless it conforms
to the XML Namespaces recommendation. In practice this isn't a big
problem, because most people are treating XML Namespaces as if it
were an inherent part of the XML standard, rather than a bolt-on
optional extra. It does have certain implications, though. In
particular, serious use of Namespaces is virtually incompatible with
serious use of Document Type Definitions, because DTDs don't
recognize the special significance of prefixes in element
names; so a consequence of backing Namespaces is that XSLT provides
very little support for DTDs, choosing instead to wait until the
replacement facility, XML Schemas, eventually emerges.
The XML Information Set
XSLT is designed to work on the information carried by an
XML document, not on the raw document itself. This means that, as an
XSLT programmer, you are given a tree view of the source document in
which some aspects are visible and others are not. For example, you
can see the attribute names and values, but you can't see whether the
attribute was written in single or double quotes, you can't see what
order the attributes were in, and you can't tell whether or not they
were written on the same line.
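Python's ElementTree is not the XPath tree model, but it draws the same line between information and punctuation. In the sketch below (the `attributes` helper and sample tags are hypothetical), two tags that differ in quote style, attribute order and line breaks yield equal attribute dictionaries.

```python
import xml.etree.ElementTree as ET


def attributes(xml_text: str) -> dict:
    # Only names and values survive parsing; quote characters and line
    # breaks inside the tag are not recorded anywhere in the tree.
    return ET.fromstring(xml_text).attrib


single_quoted = "<book title='XSLT' price='10'/>"
reordered = '<book\n    price="10"\n    title="XSLT"/>'

# The attrib dict happens to keep source order, but dict equality ignores
# order, so the two tags compare as carrying the same information.
print(attributes(single_quoted) == attributes(reordered))  # True
```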
One messy detail is that there have been many attempts to define
exactly what constitutes the essential information content of a
well-formed XML document, as distinct from its accidental
punctuation. All attempts so far have come up with slightly different
answers. The most recent, and the most definitive, attempt to provide
a common vocabulary for the content of XML documents is the XML
Information Set definition, which may be found at
http://www.w3.org/TR/xml-infoset.
Unfortunately this came too late to make all the standards
consistent. For example, some treat comments as significant, others
not; some treat the choice of namespace prefixes as significant,
others take them as irrelevant. I shall describe in Chapter 2 exactly
how XSLT (or more accurately, XPath) defines the Tree Model of XML,
and how it differs in finer points of detail from some of the other
definitions such as the Document Object Model or DOM.
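Comments are a convenient illustration of this inconsistency. The XPath tree model keeps comments as nodes, whereas Python's ElementTree discards them by default and keeps them only when asked. The sketch below assumes Python 3.8 or later, where `TreeBuilder` gained its `insert_comments` option; the `child_kinds` helper and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = "<doc><!-- draft: check prices --><para>Hello</para></doc>"


def child_kinds(xml_text: str, keep_comments: bool) -> list:
    # TreeBuilder's insert_comments flag (Python 3.8+) decides whether
    # comments inside the root element become part of the tree at all.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=keep_comments))
    root = ET.fromstring(xml_text, parser=parser)
    return ["comment" if child.tag is ET.Comment else child.tag for child in root]


print(child_kinds(DOC, keep_comments=False))  # ['para']
print(child_kinds(DOC, keep_comments=True))   # ['comment', 'para']
```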
XSL and CSS
Why are there two stylesheet languages, XSL (i.e. XSLT plus XSL
Formatting Objects) as well as Cascading Style Sheets (CSS and
CSS2)?
It's only fair to say that in an ideal world there would be a
single language in this role, and that the reason there are two is
that no-one has been able to invent something that achieved the
simplicity and economy of CSS for doing simple things, combined with
the power of XSL for doing more complex things.
CSS (by which I include CSS2, which greatly extends the degree to
which you can control the final appearance of the page) is mainly
used for rendering HTML, but it can also be used for rendering XML
directly, by defining the display characteristics of each XML
element. However, it has serious limitations. It cannot reorder the
elements in the source document, it cannot add text or images, it
cannot decide which elements should be displayed and which omitted,
it cannot calculate totals or averages or sequence numbers. In other
words, it can only be used when the structure of the source document
is already very close to the final display form.
Having said this, CSS is simple to write, and it is very
economical in machine resources. It doesn't reorder the document, so
it doesn't need to build a tree representation of the document in
memory, and it can start displaying the document as soon as the first
text is received over the network. Perhaps most important of all, CSS
is very simple for HTML authors to write, without any programming
skills. In comparison, XSLT is far more powerful, but it also
consumes a lot more memory and processor power, as well as training
budget.
It's often appropriate to use both tools together. Use XSLT to
create a representation of the document that is close to its
final form, in that it contains the right text in the right order,
and then use CSS to add the finishing touches, by selecting font
sizes, colors, and so on. Typically you would do the XSLT processing
on the server, and the CSS processing on the client (in the browser),
so another advantage of this approach is that you reduce the amount
of data sent down the line, which should improve response time for
your users as well as postponing the next expensive bandwidth
increase.
©1999 Wrox Press Limited,
US and UK.