In this final section of the chapter I shall try to identify
the tasks XSLT is good at and, by implication, the tasks for which a
different tool would be more suitable. I shall also look at
alternative ways of using XSLT within the overall architecture of
your application.
Broadly speaking, as I discussed at the beginning of the chapter,
there are two main scenarios for using XSLT transformations: data
conversion and publishing. We'll consider each of them
separately.
Data Conversion Applications
Data conversion is not something that will go away just because
XML has been invented. Even though an increasing number of data
transfers between organizations or between applications within an
organization are likely to be encoded in XML, there will still be
different data models, different ways of representing the same thing,
and different subsets of information that are of interest to
different people (recall the example at the beginning of the chapter,
where we were converting music between different XML representations
and different presentation formats). So however enthusiastic we are
about XML, the reality is that there are going to be a lot of
comma-separated-values files, EDI messages, and any number of other
formats in use for a long time to come.
When you have the task of converting one XML data set into another
XML data set, then XSLT is an obvious choice.
It can be used for extracting the data selectively, reordering it,
turning attributes into elements or vice versa, or any number of
similar tasks. It can also be used simply for validating the data. As
a language, XSLT is best at manipulating the structure of the
information as distinct from its content: it's a good language for
turning rows into columns, but for string handling (for example
removing any text that appears between square brackets) it's rather
laborious compared with a language like Perl. However, you can always
tackle these problems by invoking procedures written in other
languages, such as Java or JavaScript, from within the
stylesheet.
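As a rough illustration of this kind of structural conversion, the sketch below runs a small stylesheet through the XSLT processor bundled with the JDK. The element and attribute names (album, track, no, title) and the class name are invented for the example; the stylesheet reorders the tracks and turns two attributes into child elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributesToElements {
    // Hypothetical stylesheet: sorts <track> elements by number and turns
    // the "no" and "title" attributes into child elements.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/album'>"
      + "<album><xsl:apply-templates select='track'>"
      + "<xsl:sort select='@no' data-type='number'/>"
      + "</xsl:apply-templates></album>"
      + "</xsl:template>"
      + "<xsl:template match='track'>"
      + "<track><no><xsl:value-of select='@no'/></no>"
      + "<title><xsl:value-of select='@title'/></title></track>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a stylesheet (given as text) to an XML document (given as text).
    static String transform(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<album><track no='2' title='Allegro'/>"
                     + "<track no='1' title='Adagio'/></album>";
        System.out.println(transform(STYLESHEET, input));
    }
}
```

Notice that the Java code knows nothing about the data: all the restructuring logic lives in the stylesheet, which is the point of using XSLT for this job.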
XSLT is also useful for converting XML data into any text-based
format, such as comma-separated values, or various EDI message
formats. Text output is really just like XML output without the tags,
so this creates no particular problems for the language.
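A minimal sketch of text output, again using the JDK's built-in processor: the stylesheet below declares the text output method and writes one comma-separated line per record. The input vocabulary (tracks, track) is made up for the example, and the sketch assumes the values contain no commas or quotes, so it does no CSV escaping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToCsv {
    // Hypothetical stylesheet: one "no,title" line per <track> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='tracks/track'>"
      + "<xsl:value-of select='@no'/><xsl:text>,</xsl:text>"
      + "<xsl:value-of select='@title'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Converts the XML document to comma-separated text.
    static String toCsv(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(toCsv(
            "<tracks><track no='1' title='Adagio'/><track no='2' title='Allegro'/></tracks>"));
    }
}
```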
Perhaps more surprising is that XSLT can often be useful to
convert from non-XML formats into XML or something else.
In this case you'll need to write some kind of parser that
understands the input format; but you would have had to do that
anyway. The benefit is that once you've written the parser, the rest
of the data conversion can be expressed in a high-level language.
This separation also increases the chances that you'll be able to
reuse your parser next time you need to handle that particular input
format. I'll show you an example in Chapter 9, page 610, where the
input is a rather old-fashioned and distinctly non-XML format widely
used for exchanging data between genealogy software packages. It
turns out that it isn't even necessary to write the data out as XML
before using the XSLT stylesheet to process it: all you need to do is
to make your parser look like an XML parser, by making it implement
one of the standard parser interfaces: SAX or DOM. Most XSLT
processors will accept input from a program that implements the SAX
or DOM interfaces, even if the data never saw the light of day as
XML.
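To make the idea concrete, here is a small sketch of a parser that pretends to be an XML parser. It is not the genealogy example from Chapter 9: the input is an invented line format of the form id:name, and the class name and element names are hypothetical. The class extends the SAX helper XMLFilterImpl only as a convenient way of implementing the XMLReader interface, and it is handed to the JDK's built-in XSLT processor through a SAXSource.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

class LineFormatReader extends XMLFilterImpl {
    // Hypothetical stylesheet: prints "name [id]" for each person.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='people/person'>"
      + "<xsl:value-of select='.'/><xsl:text> [</xsl:text>"
      + "<xsl:value-of select='@id'/><xsl:text>]&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Quietly accept any feature or property the XSLT processor tries to set.
    @Override public void setFeature(String name, boolean value) { }
    @Override public boolean getFeature(String name) { return false; }
    @Override public void setProperty(String name, Object value) { }
    @Override public Object getProperty(String name) { return null; }

    // Reads "id:name" lines and reports them as SAX events, as if the input
    // had been <people><person id="...">name</person>...</people>.
    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        if (input.getCharacterStream() == null) {
            throw new SAXException("this sketch only reads from a character stream");
        }
        ContentHandler out = getContentHandler();
        BufferedReader lines = new BufferedReader(input.getCharacterStream());
        out.startDocument();
        out.startElement("", "people", "people", new AttributesImpl());
        String line;
        while ((line = lines.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;            // ignore lines without an id
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "id", "id", "CDATA", line.substring(0, colon));
            char[] name = line.substring(colon + 1).toCharArray();
            out.startElement("", "person", "person", atts);
            out.characters(name, 0, name.length);
            out.endElement("", "person", "person");
        }
        out.endElement("", "people", "people");
        out.endDocument();
    }

    // Feeds the non-XML text straight into the stylesheet, with no XML in between.
    static String convert(String text) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter result = new StringWriter();
            t.transform(new SAXSource(new LineFormatReader(),
                                      new InputSource(new StringReader(text))),
                        new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(convert("I1:Ada Lovelace\nI2:Charles Babbage\n"));
    }
}
```

The stylesheet sees an ordinary tree of people and person elements, so the parser can be reused unchanged with any other stylesheet written against that vocabulary.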
One caveat about data conversion applications: today's XSLT
processors all rely on holding all the data in memory while the
transformation is taking place. The tree structure in memory can be
as much as ten times the original data size, so in practice, the
limit on data size for an XSLT conversion is a few megabytes. Even at
this size, a complex conversion can be quite time-consuming: it
depends very much on the processing that you actually want to do.
One way around this is to split the data into chunks and convert
each chunk separately - assuming, of course, that there is some kind
of correspondence between chunks of input and chunks of output. But
when this starts to get complicated, there comes a point where XSLT
is no longer the best tool for the job. You might be better off, for
example, loading the data into a relational or object database, and
using the database query language to extract it again in a different
sequence.
If you need to process large amounts of data serially, for example
extracting selected records from a log of retail transactions, then
an application written using the SAX interface might take a little
longer to write than the equivalent XSLT stylesheet, but it is likely
to run many times faster. Very often the combination of a SAX filter
application to do simple data extraction, followed by an XSLT
stylesheet to do more complex manipulation, can be the best solution
in such cases.
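A serial extraction of this kind might look something like the following sketch, which uses the SAX parser supplied with the JDK. The log vocabulary (log, sale, id, store) and the class name are assumptions made for the example. Because the handler looks at one element at a time and keeps only the matching identifiers, its memory use does not grow with the size of the log.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaleExtractor {
    // Returns the id of every <sale> element recorded for the given store.
    static List<String> idsForStore(String xml, String store) {
        List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Each transaction is examined once and then forgotten.
                if ("sale".equals(qName) && store.equals(atts.getValue("store"))) {
                    ids.add(atts.getValue("id"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not read the log", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String log = "<log><sale id='t1' store='A'/><sale id='t2' store='B'/>"
                   + "<sale id='t3' store='A'/></log>";
        System.out.println(idsForStore(log, "A"));
    }
}
```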
Publishing
The difference between data conversion and publishing is that in
the former case, the data is destined for input to another piece of
software, while in the latter case it is destined to be read (you
hope) by human beings. Publishing in this context doesn't just mean
lavish text and multimedia; it also means data: everything from the
traditional activity of producing and distributing reports so that
managers know what's going on in the business, to producing online
phone bills and bank statements for customers, and rail timetables
for the general public. XML is ideal for such data publishing
applications, as well as the more traditional text publishing, which
was the original home territory of SGML.
XML was designed to enable information to be held independently of
the way it is presented, which sometimes leads people into the
fallacy of thinking that using XML for presentation details is
somehow bad. Far from it: if you were designing a new format for
downloading fonts to a printer today, you would probably make it
XML-based. Presentation details have just as much right to be encoded
in XML as any other kind of information. So we can see the role of
XSLT in the publishing process as converting
data-without-presentation to data-with-presentation, where both are,
at least in principle, XML formats.
The two important vehicles for publishing information today are
print-on-paper, and the web. The print-on-paper scene is the more
difficult one, because of the high expectations of users for visual
quality. XSL Formatting Objects attempts to define an XML-based model
of a print file for high quality display on paper or on screen.
Because of the sheer number of parameters needed to achieve this, the
standard is taking a while to complete, and will probably take even
longer to implement. But the web is a less demanding environment,
where all we need to do is convert the data to HTML and leave the
browser to do the best it can on the display available. HTML, of
course, is not XML, but it is close enough that a simple mapping
is possible. Converting XML to HTML is the most common application
for XSLT today. It's actually a two-stage process: first convert to
an XML-based model that is structurally equivalent to the target
HTML, and then serialize this in HTML notation rather than strict
XML.
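The two stages can be seen by serializing the same result tree in two ways. In the sketch below, which assumes the JDK's built-in processor and an invented note/line input vocabulary, the only thing that changes between the two runs is the output method declared in the stylesheet: the tree built by the template is identical, but the empty br element is written differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlSerialization {
    // Hypothetical stylesheet; "method" is either "xml" or "html".
    static String stylesheet(String method) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='" + method + "' omit-xml-declaration='yes'/>"
             + "<xsl:template match='/note'>"
             + "<p><xsl:value-of select='line[1]'/><br/>"
             + "<xsl:value-of select='line[2]'/></p>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Builds the result tree and serializes it using the given output method.
    static String render(String xml, String method) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(method))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String note = "<note><line>first</line><line>second</line></note>";
        System.out.println(render(note, "xml"));   // br written as an XML empty element
        System.out.println(render(note, "html"));  // br written in HTML notation
    }
}
```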
The emergence of XHTML 1.0 of course tidies up this process even
further, because it is a pure XML format, but how quick the take-up
of XHTML will be remains to be seen.
When to Do the Conversion?
There are several points in a publishing system built around a
content store where XSLT transformations might be appropriate:
Information entered by authors using their preferred tools, or
customized form-filling interfaces, can be converted to XML and
stored in that form in the content store.
XML information arriving from other systems might be transformed
into a different flavor of XML for storage in the content store. For
example, it might be broken up into page-size chunks.
XML can be translated into HTML on the server, when a user
requests a page. This can be controlled using technology such as Java
servlets or JavaServer Pages. On a Microsoft server you can use the
XSL ISAPI extension available from http://msdn.microsoft.com/xml, or
if you want more application control, you can invoke the
transformation from script on ASP pages.
XML can be sent down to the client system, and translated into
HTML within the browser. This can give a highly interactive
presentation of the information, but it relies on all the users
having a browser that can do the job.
XML data can also be converted into its final display form at
publishing time, and stored as HTML within the content store. This
minimizes the work that needs to be done at display time, and is
ideal when the same displayed page is presented to very many
users.
There isn't one right answer, and often a combination of
techniques may be appropriate. Conversion in the browser is an
attractive option once XSLT becomes widely available within browsers,
but that is still some way off. Even when this is done, there may
still be a need for some server-side processing to deliver the XML in
manageable chunks, and to protect secure information. Conversion at
delivery time on the server is a popular choice, because it allows
personalization, but it can be a heavy overhead for sites with high
traffic. Some busy sites have found that it is more effective to
generate a different set of HTML pages for each section of the target
audience in advance, and at page request time, to do nothing more
than select the right pre-constructed HTML page.
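The pre-generation approach might be sketched as follows, assuming the JDK's built-in processor. The stylesheet is compiled once, run once per audience with a different value for a stylesheet parameter, and the resulting pages are kept in a map, so that serving a request is just a lookup. The audience names, the product vocabulary, and the class name are all invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PregeneratedPages {
    // Hypothetical stylesheet: shows the price that applies to one audience.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:param name='audience'/>"
      + "<xsl:template match='/product'>"
      + "<p><xsl:value-of select='@name'/>: "
      + "<xsl:value-of select='price[@audience=$audience]'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Publishing time: build one HTML page per audience from the same source.
    static Map<String, String> generate(String xml, List<String> audiences) {
        try {
            // Compile the stylesheet once and reuse it for every audience.
            Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
            Map<String, String> pages = new HashMap<>();
            for (String audience : audiences) {
                Transformer t = compiled.newTransformer();
                t.setParameter("audience", audience);
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
                pages.put(audience, out.toString());
            }
            return pages;
        } catch (TransformerException e) {
            throw new IllegalStateException("page generation failed", e);
        }
    }

    public static void main(String[] args) {
        String product = "<product name='Widget'><price audience='retail'>100</price>"
                       + "<price audience='trade'>80</price></product>";
        Map<String, String> pages =
            generate(product, java.util.Arrays.asList("retail", "trade"));
        // Request time: no transformation, only a lookup.
        System.out.println(pages.get("trade"));
    }
}
```

In a real site the map would more likely be a directory of files on disk, but the division of labor is the same: the expensive work happens when content is published, not when it is requested.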
©1999 Wrox Press Limited,
US and UK.