This is an excerpt from Chapter 4 of the New Riders book Inside XSLT, written by Steven Holzner.
You can find the definition of match patterns in the W3C XSLT
Recommendation. Match patterns are defined in terms of XPath
expressions this way:
"The syntax for patterns is a subset of the syntax for [XPath]
expressions. In particular, location paths that meet certain
restrictions can be used as patterns. An expression that is also a
pattern always evaluates to an object of type node-set. A node
matches a pattern if the node is a member of the result of
evaluating the pattern as an expression with respect to some
possible context; the possible contexts are those whose context
node is the node being matched or one of its ancestors."
The most important sentence in the preceding paragraph is the
last one. The idea is that a node X matches a pattern if and only
if there is a node that is either X or an ancestor of X such that
when you apply the pattern as an XPath expression to that node, the
resulting node set includes X.
So what does that actually mean? It means that if you
want to see whether a pattern matches a node, first apply it to the
node itself as an XPath expression, then apply it to all its
ancestor nodes in succession, back to the root node. If any node
set that results from doing this includes the node itself, the node
matches the pattern. Working this way makes sense, because match
patterns are written to apply to the current node or children of
the current node.
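To make the rule concrete, here is a minimal sketch (not a listing from the book) that walks through the test for the pattern "PLANET/NAME". It assumes a document in which <NAME> elements are children of <PLANET> elements, which in turn are children of a <PLANETS> element; the <MATCHED> output element is invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Testing the pattern "PLANET/NAME" against a NAME element X:
       1. Evaluate PLANET/NAME with X as the context node.
          X has no PLANET children, so the result is empty.
       2. Evaluate it with X's parent (a PLANET) as the context node.
          That PLANET has no PLANET children, so the result is empty.
       3. Evaluate it with X's grandparent (PLANETS) as the context node.
          The result holds every NAME child of every PLANET child,
          and that set includes X.
       One of the resulting node sets contains X, so X matches. -->
  <xsl:template match="PLANET/NAME">
    <MATCHED><xsl:value-of select="."/></MATCHED>
  </xsl:template>

</xsl:stylesheet>
```

An XSLT processor does not have to perform the search this way internally; the walk-through only restates the definition.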
Defining patterns in terms of XPath expressions this way is
relatively straightforward, but now and again there are
consequences that aren't obvious at first. For example, although
the node() function is defined to match any node, when you use it
as a pattern, "node()", it's really an abbreviation for
"child::node()", as you'll see later in this chapter. Among other
things, that means that the pattern "node()" can match only child
nodes; it will never match the root node. You should also note that
there are no patterns that can match namespace-declaration
nodes.
The W3C gives the formal definition of match patterns using
Extended Backus-Naur Form (EBNF) notation, which is the same
notation that the XML specification is written in. You can find an
explanation of this grammar at www.w3.org/TR/REC-xml, Section 6. I
include the formal definition for patterns here only for the sake
of reference. (This whole chapter is devoted to unraveling what
this formal definition says and making it clear.) The following
list includes the EBNF notations used here:
::= means "is defined as"
+ means "one or more"
* means "zero or more"
| means "or"
- means "not"
? means "optional"
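The following is the actual, formal W3C definition of match
patterns (Section 5.2 of the XSLT 1.0 Recommendation); when an item
is quoted with single quotation marks, such as 'child' or '::', that
item is meant to appear in the pattern literally (such as
"child::NAME"), as are items called Literals:
Pattern ::= LocationPathPattern | Pattern '|' LocationPathPattern
LocationPathPattern ::= '/' RelativePathPattern? | IdKeyPattern (('/' | '//') RelativePathPattern)? | '//'? RelativePathPattern
IdKeyPattern ::= 'id' '(' Literal ')' | 'key' '(' Literal ',' Literal ')'
RelativePathPattern ::= StepPattern | RelativePathPattern '/' StepPattern | RelativePathPattern '//' StepPattern
StepPattern ::= ChildOrAttributeAxisSpecifier NodeTest Predicate*
ChildOrAttributeAxisSpecifier ::= AbbreviatedAxisSpecifier | ('child' | 'attribute') '::'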
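The definitions for NodeTest and Predicate come
from the XPath specification, as follows (Expr stands for an XPath
expression, and NCName and QName were defined at the beginning of
Chapter 2):
NodeTest ::= NameTest | NodeType '(' ')' | 'processing-instruction' '(' Literal ')'
Predicate ::= '[' PredicateExpr ']'
PredicateExpr ::= Expr
AbbreviatedAxisSpecifier ::= '@'?
NameTest ::= '*' | NCName ':' '*' | QName
NodeType ::= 'comment' | 'text' | 'processing-instruction' | 'node'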
As you can see, this is all more or less as clear as mud. Now
it's time to start deciphering. First, a pattern consists of
one or more location path patterns, separated by |. A location path pattern,
in turn, consists of one or more step patterns, separated by
/ or //, or one or more step patterns in conjunction with the id or
key functions (which match elements that have specific IDs or
keys).
Step patterns are the building blocks of patterns, and
you can use multiple steps in a single path, separating them by /
or //, as in the pattern "PLANET/*/NAME", which has three steps:
"PLANET", "*", and "NAME". If you start the pattern itself with /,
it's called absolute, because you're specifying the pattern
from the root node (like "/PLANETS/PLANET" or "//PLANET");
otherwise, it's called relative, and it's applied starting
with the context node (like "PLANET").
Next, a step pattern is made up of an axis, a node
test, and zero or more predicates. For example, in the
expression child::PLANET[position() = 5], child is the name of the
axis, PLANET is the node test, and [position() = 5] is a predicate.
(Predicates are always enclosed in [ and ].) You can create
patterns with one or more step patterns, such as
/child::PLANET/child::NAME, which matches <NAME> elements
that are children of a <PLANET> element that is itself a child of
the root node.
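As a minimal sketch of how the three parts of a step pattern appear in a match attribute, the following stylesheet spells out each axis explicitly. It assumes the planets.xml layout used in this chapter (a <PLANETS> document element containing <PLANET> elements with <NAME> and <MASS> children); the <PLANET-FOUND> and <MASS-FOUND> output elements are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Absolute pattern with two steps, each with an explicit child axis.
       It matches a PLANET only when its parent is a PLANETS element
       that is a child of the root node. -->
  <xsl:template match="/child::PLANETS/child::PLANET">
    <PLANET-FOUND>
      <xsl:value-of select="child::NAME"/>
      <!-- Hand the MASS children on to the rule below -->
      <xsl:apply-templates select="child::MASS"/>
    </PLANET-FOUND>
  </xsl:template>

  <!-- Relative pattern with one step: axis "child", node test "MASS",
       and one predicate requiring a UNITS attribute. -->
  <xsl:template match="child::MASS[attribute::UNITS]">
    <MASS-FOUND><xsl:value-of select="."/></MASS-FOUND>
  </xsl:template>

</xsl:stylesheet>
```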
To understand patterns, then, you have to understand step
patterns, because patterns are made up of one or more step patterns
in expressions such as
"step-pattern1/step-pattern2/step-pattern3...". And to understand
step patterns, you have to understand their three parts (axes, node
tests, and predicates), which I'll take a look at in the following
sections.
Axes make up the first part of step patterns. For example, in
the step pattern child::NAME, which refers to a <NAME>
element that is a child of the context node, child is called the
axis. Patterns support two axes (XPath, on the other hand, supports
no fewer than 13 different axes; see Chapter 7):
The attribute axis holds the attributes of the context node.
The child axis holds the children of the context node. The child axis is the default axis if one is not explicitly set.
You can use axes to specify a location step or path as in the
following example, where I use the child axis to indicate that I
want to match the child nodes of the context node, which is a
<PLANET> element:
<xsl:template match="PLANET">
<HTML>
<CENTER>
<xsl:value-of select="child::NAME"/>
</CENTER>
<CENTER>
<xsl:value-of select="child::MASS"/>
</CENTER>
<CENTER>
<xsl:value-of select="child::DAY"/>
</CENTER>
</HTML>
</xsl:template>
Look at the following examples that use axes:
child::PLANET. Returns the <PLANET> element children of the context node.
child::*. Returns all element children (* matches only elements) of the context node.
attribute::UNITS. Returns the UNITS attribute of the context node.
child::*/child::PLANET. Returns all <PLANET> grandchildren of the context node.
Although these examples make it seem that you can use only the
child and attribute axes, in practice this is not quite so. When it
comes to specifying children, the child axis is a little limited,
because you must specify every level that you want to match, such
as "child::PLANETS/child::PLANET/child::MASS", which matches a
<MASS> element that is a child of a <PLANET> element
that is a child of the <PLANETS> element. If you want to
match all <MASS> elements that appear anywhere in the
<PLANETS> element, whether they are children, grandchildren,
great-grandchildren, and so on, for example, it looks as if there's
no way to do that in one pattern. In XPath, you can do that with an
expression like "child::PLANETS/descendant::MASS", but you can't
use the descendant axis in patterns. However, remember that you
can use the // operator, which amounts to the same thing. For
example, the pattern "child::PLANETS//child::MASS" matches all
<MASS> elements anywhere inside the <PLANETS> element.
(In fact, this is a minor inconsistency in the specification.)
The next example shows how I might put this pattern to work,
replacing the text in all <MASS> elements, no matter where
they are inside the <PLANETS> element, with the text "Very
heavy!". To copy over all the other nodes in planets.xml to the XML
result document, I also set up a rule that matches any node using
the node() node test, which you'll see later. Note that although the
pattern that matches any node also matches all the <MASS>
elements, the "child::PLANETS//child::MASS" pattern is a much more
specific match, which (as discussed in Chapter 3) means the XSLT
processor gives it higher priority for <MASS> elements:
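The original listing is not included in this excerpt. A minimal sketch of a stylesheet that behaves as described might look like the following; keeping each <MASS> element's attributes while replacing its text is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- General rule: copy every attribute and every child node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific rule: any MASS element at any depth inside PLANETS.
       Its default priority (0.5) is higher than that of the general
       rule above (-0.5), so it wins for MASS elements. -->
  <xsl:template match="child::PLANETS//child::MASS">
    <xsl:copy>
      <!-- Assumption: keep the attributes, replace only the text -->
      <xsl:apply-templates select="@*"/>
      <xsl:text>Very heavy!</xsl:text>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```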
You can also take advantage of a number of abbreviations when
specifying axes in patterns; in practice, the abbreviated forms are
almost invariably the ones used.
Abbreviated Syntax
There are two rules for abbreviating axes in patterns:
child::childname can be abbreviated as childname.
attribute::attributename can be abbreviated as @attributename.
The following list includes some examples of patterns using
abbreviated syntax (you'll see a lot more at the end of the
chapter):
PLANET. Matches the <PLANET> element children of the context node.
*. Matches all element children of the context node.
@UNITS. Matches the UNITS attribute of the context node.
@*. Matches all the attributes of the context node.
*/PLANET. Matches all <PLANET> grandchildren of the context node.
//PLANET. Matches all the <PLANET> descendants of the document root.
PLANETS//PLANET. Matches all <PLANET> element descendants of the <PLANETS> element children of the context node.
//PLANET/NAME. Matches all the <NAME> elements that are children of a <PLANET> parent.
PLANET[NAME]. Matches the <PLANET> children of the context node that have <NAME> children.
In a pattern such as "child::PLANET", "child" is the axis and
"PLANET" is the node test, which is the second part of step
patterns.
The second part of a step pattern is the node test. You can use
names of nodes as node tests, the wildcard * to select any element
node, or node types. For example, the expression
child::*/child::NAME selects all <NAME> elements that are
grandchildren of the context node.
In addition to node names and the wild card character, you can
also use the following node tests:
The comment() node test selects comment nodes.
The node() node test selects any type of node.
The processing-instruction() node test selects a processing instruction node. You can specify the name of the processing instruction to select in the parentheses.
The text() node test selects a text node.
The following sections examine these node tests and provide
examples to help you understand how they're used.
You can match the text of comments with the pattern comment().
You shouldn't store data that should go into the output document in
comments in the input document, of course. However, you might want
to convert comments from the <!--comment--> form into
something another markup language might use, such as a
<COMMENT> element.
In the following example, I will extract comments from
planets.xml and include them in the resulting output.
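The original listing is not included in this excerpt. A minimal sketch of such a stylesheet, assuming nothing beyond a single rule for comment nodes, could be:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Turn each comment node into a COMMENT element holding its text.
       Elements are handled by the built-in rules, which keep descending
       through the tree and therefore reach the comments. -->
  <xsl:template match="comment()">
    <COMMENT><xsl:value-of select="."/></COMMENT>
  </xsl:template>

</xsl:stylesheet>
```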
Here's the result for Venus, where I've transformed the comment
into a <COMMENT> element:
Venus
.815
116.75
3716
.943
66.8<COMMENT>At perihelion</COMMENT>
Note that here the text for the other elements in the
<PLANET> element is also inserted into the output document,
because the default rule for each element is to include its text in
the output document. Because I haven't provided a rule for
elements, their text is simply included in the output document.
In a pattern, the node() node test matches any node except the
root node (remember, it is really child::node(), so it does not
match attribute nodes either). Say that you want to
use <xsl:copy> to write a stylesheet that copies any XML
document. (Chapter 3 used <xsl:copy-of> for this purpose.) I
might start off as the following example shows. In this case, the
template I'm using uses the OR operator, which you'll see later in
this chapter, to match any element or any attribute (this template
actually selects itself to keep on copying many levels deep):
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<xsl:template match="@*|*">
<xsl:copy>
<xsl:apply-templates select="@*|*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
However, here's the result. Notice that this version, which
matches only elements and attributes (@*|*), doesn't copy
whitespace nodes or text nodes:
<?xml version="1.0" encoding="UTF-8"?>
<PLANETS><PLANET><NAME/><MASS UNITS="(Earth
= 1)"/><DAY UNITS="days"/><RADIUS UN
This is clearly incomplete. If I match to the pattern
"@*|node()" rather than "@*|*", on the other hand, the new template
rule will match all nodes except the root node (which is created in
the result tree automatically), so it copies whitespace as well as
text:
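The revised listing is not included in this excerpt. A minimal sketch of the stylesheet with the pattern changed as described, otherwise identical to the version shown earlier, would be:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- @* covers attributes; node() covers every kind of child node:
       elements, text (including whitespace), comments, and
       processing instructions. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The root node needs no rule of its own here, because the built-in rule for the root simply applies templates to its children.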
You can match the text in a node with the pattern "text()".
There's usually not much reason to ever use the text node test.
XSLT includes a default rule that if no other rules match the text
node, the text in that node is inserted into the output document.
If you were to make that default rule explicit, it might look like
this:
<xsl:template match="text()">
<xsl:value-of select="."/>
</xsl:template>
You can override this rule by not sending the text in text nodes
to the output document, like this:
<xsl:template match="text()">
</xsl:template>
One reason to use the text node test is when you want to match
nodes with specific text. Used inside a predicate, as in
"NAME[text() = 'Venus']", it matches <NAME> elements whose enclosed
text is "Venus". (Note that you have to be careful
about nesting quotation marks so the XSLT processor won't get
confused; for example, this won't work: "NAME[text() = "Venus"]".)
Another reason to use the text node test is when you want to apply
some test to text nodes using the XPath string functions (which
you'll see later in this chapter). For example, later in this
chapter, I'll match the text node "Earth" in
<NAME>Earth</NAME> with the pattern
"text()[starts-with(., 'E')]".
Earlier, you saw that the pattern "@*|node()" (which uses the OR
operator, |, as you'll see discussed a little later) matches
everything in planets.xml, including comments. If you want to strip
out the comments, you can copy only the nodes that match a pattern
such as "@*|*|text()", which preserves only elements, attributes,
and text nodes.
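A minimal sketch of a comment-stripping copy built on that pattern, assuming the same copying approach as the earlier <xsl:copy> stylesheet:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Copy attributes, elements, and text nodes only. Comments and
       processing instructions are never selected, so they are not
       carried over to the result. -->
  <xsl:template match="@*|*|text()">
    <xsl:copy>
      <xsl:apply-templates select="@*|*|text()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the select attribute also leaves out comments, no extra rule is needed to suppress them.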
You can use the pattern processing-instruction() to match
processing instructions:
<xsl:template match="/processing-instruction()">
<I>
Found a processing
instruction.
</I>
</xsl:template>
You can also specify what processing instruction you want to
match by giving the name of the processing instruction (excluding
the <? and ?>) as in the following case, where I match the
processing instruction <?xml-include?>:
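The original listing is not included in this excerpt. A minimal sketch of such a rule follows; per the NodeTest grammar, the name is written as a quoted literal inside the parentheses, and the output text is invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match only processing instructions whose target is xml-include
       and that are children of the root node (that is, they appear
       outside the document element). -->
  <xsl:template match="/processing-instruction('xml-include')">
    <I>
      Found an xml-include processing instruction.
    </I>
  </xsl:template>

</xsl:stylesheet>
```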
One of the major reasons that there is a distinction between the
root node at the very beginning of the document and the root
element is so you have access to the processing instructions and
other nodes in the document's prologue.
That takes care of the node tests that are possible in step
patterns. The third and last part of step patterns is
predicates.
Predicates, the third part of step patterns, contain XPath
expressions. You can use the [] operator to enclose a predicate and
test whether a certain condition is true.
For example, you can test:
The value of an attribute in a given string.
The value of an element.
Whether an element encloses a particular child, attribute, or other element.
The position of a node in the node tree.
You'll work with XPath expressions in more detail in Chapter 7,
but we'll get an introduction to them here because you can use them
in pattern predicates.
XPath expressions are more involved than match patterns. If you
run into trouble creating them, one thing that's good to know is
that the Xalan package has a handy example program,
ApplyXPath.java, that enables you to apply an XPath expression to a
document and see what the results would be. For example, if I apply
the XPath expression "PLANET/NAME" to planets.xml, the following
example shows what the result looks like, displaying the values of
all <NAME> elements that are children of <PLANET>
elements (the opening and closing <output> tags are added by
ApplyXPath):
%java ApplyXPath planets.xml PLANET/NAME
<output>
<NAME>Mercury</NAME>
<NAME>Venus</NAME>
<NAME>Earth</NAME>
</output>
If the value of a predicate is numeric, it represents a
position test. For example, NAME[1] matches the first
<NAME> child of the context node. W3C position tests, and
position tests in Xalan, Oracle, XT, Saxon, and MSXML3 (the
Microsoft XML processor invoked using JavaScript, which you saw in
Chapter 1 and will see more on in Chapter 10) are 1-based, so the
first child is child 1. However, position tests in XML documents
that use XSL stylesheets and are loaded into the current version of
Internet Explorer (version 5.5) are 0-based (and you can use
only a very restricted form of XPath expressions in predicates),
and so, in consequence, is much of the XSL documentation on the
Microsoft site. Otherwise, the value of a predicate must be true or
false, called a boolean test. For example, the predicate
[@UNITS = "million miles"] matches elements that have UNITS
attributes with the value "million miles".
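A minimal sketch contrasting the two kinds of predicate in match patterns; the <FIRST-NAME> and <IN-MILLION-MILES> output elements are invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Numeric predicate: a position test. NAME[1] is shorthand for
       NAME[position() = 1] and matches a NAME element that is the
       first NAME child of its parent (counting from 1). -->
  <xsl:template match="NAME[1]">
    <FIRST-NAME><xsl:value-of select="."/></FIRST-NAME>
  </xsl:template>

  <!-- Boolean predicate: matches any element whose UNITS attribute
       has exactly the value "million miles". Single quotes are used
       inside the double-quoted attribute value. -->
  <xsl:template match="*[@UNITS = 'million miles']">
    <IN-MILLION-MILES><xsl:value-of select="."/></IN-MILLION-MILES>
  </xsl:template>

</xsl:stylesheet>
```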
Predicates are full XPath expressions, although predicates used
in patterns have two restrictions:
When a pattern is used in a match attribute, the predicate must not contain any reference to XSL variables (which you'll see in Chapter 9). This restriction does not apply to predicates used in <xsl:number> elements.
Patterns may not use the current() function in predicates. This function returns the current node, and its use is restricted so processing is implementation-independent and does not depend on the current processing state.
The pattern in the following example matches <PLANET>
elements that have child <NAME> elements:
<xsl:template match = "PLANET[NAME]">
.
.
.
</xsl:template>
This pattern matches any element that has a <NAME>
child element:
<xsl:template match = "*[NAME]">
.
.
.
</xsl:template>
Now I've given the <PLANET> elements in planets.xml a new
attribute, COLOR, which holds the planet's color:
The stylesheet shown in Listing 4.5 selects only the planets
whose color is blue and omits the others by turning off the default
rule for text nodes. Here's the result:
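Neither Listing 4.5 nor its output is reproduced in this excerpt. A minimal sketch of a stylesheet along these lines might look like the following; the attribute value BLUE and the wording of the output are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Boolean predicate on an attribute: only PLANET elements whose
       COLOR attribute equals BLUE (an assumed value) are reported. -->
  <xsl:template match="PLANET[@COLOR = 'BLUE']">
    <xsl:value-of select="NAME"/>
    <xsl:text> is blue.&#10;</xsl:text>
  </xsl:template>

  <!-- Empty rule: overrides the built-in rule for text nodes, so the
       text of all other planets is left out of the output. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```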