Summary
Explanation of the Muenchian Technique - this snippet shows how to use the generate-id() function and the <xsl:key> element to find distinct values in your XML.
This snippet originated from a posting that I replied to on another newsgroup, where the user wanted an equivalent of the SQL SELECT DISTINCT ..... in his XSLT. To answer this, in this snippet we are experimenting with the generate-id() function and the <xsl:key> element. The code on its own won't fully explain why <xsl:key> is so important as part of the distinct/grouping technique, so the mechanics are walked through step by step.
Let's first take a look at just the <xsl:key> on its own, in its simplest form. The XML ...
<?xml version="1.0"?>
<rights>
  <right>
    <right_name>Free TV</right_name>
    <territory>USA</territory>
    <territory>Australia</territory>
  </right>
  <right>
    <right_name>Pay TV</right_name>
    <territory>USA</territory>
    <territory>UK</territory>
  </right>
</rights>
Say we have some XSLT like...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="territories" match="/rights/right/territory" use="."/>
  <xsl:template match="/">
    <rights>
      <xsl:for-each select="key('territories','USA')">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
    </rights>
  </xsl:template>
</xsl:stylesheet>
You'll see that it creates two <territory> elements in the output. That is because the second parameter of the key() function is matched against the value of the @use expression of the <xsl:key>, evaluated for each of the nodes selected by the @match attribute. In essence, what I've done is ask for all the <territory> elements that have a value of 'USA'. A variation of this might be if we wanted all the <territory> elements that begin with the letter 'U', e.g...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="territories" match="/rights/right/territory" use="substring(.,1,1)"/>
  <xsl:template match="/">
    <rights>
      <xsl:for-each select="key('territories','U')">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
    </rights>
  </xsl:template>
</xsl:stylesheet>
I changed the @use attribute of the <xsl:key> to define what I'm looking for (i.e. just the first letter this time), and changed the second parameter of the key() function to look for just 'U'. So <xsl:key> is a handy shorthand way of defining a node-set that you later want to extract specific matching nodes from (there are other uses of course, and Michael Kay's book gives a better and more thorough explanation of the element).
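To make the "shorthand" point concrete, here is a rough sketch of the same two lookups written as plain XPath predicates with no <xsl:key> at all. It assumes the sample XML shown earlier; combining both loops into one stylesheet is just my illustration, not something from the original posting.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <rights>
      <!-- selects the same nodes as key('territories','USA') when use="." -->
      <xsl:for-each select="/rights/right/territory[. = 'USA']">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
      <!-- selects the same nodes as key('territories','U') when use="substring(.,1,1)" -->
      <xsl:for-each select="/rights/right/territory[substring(., 1, 1) = 'U']">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
    </rights>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample XML the first loop outputs USA twice, and the second outputs USA, USA and UK. This works here only because the value being looked up is a literal; the distinct problem needs the lookup value to come from the node being tested, which is where the key becomes essential.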
Going back to the distinct problem, let me try to explain why the <xsl:key> is so fundamental to this technique. But first I still need to explain the mechanics of what's going on. Take a look at just the <xsl:for-each> element, i.e.
<xsl:for-each select="/rights/right/territory[count(.|key('distinct-territory',.)[1]) = 1]">
The first part is just selecting all the territory elements we're interested in. Then, inside the predicate [], we're applying a filter of...
count(.|key('distinct-territory',.)[1]) = 1
Which is saying that we're looking for a count() of 1 for some node-set specified within the count() function. So looking just at the node-set selected within the count() function...
.|key('distinct-territory',.)[1]
We're selecting a node-set which is a union | of two things: the current context node (i.e. the current <territory> node as each is tested against the filter), and the first node of the node-set returned by the key() function when the @use value is equal to the value of the current context node (i.e. again, the current <territory> node as each is tested against the filter). That first node is selected by virtue of the fact we've specified [1].
The English explanation of what's happening is: "give me the union of this node AND the first node of all the nodes that match this node". Part of the magic of this is the specific behaviour of the union | operator. When a union is performed it is guaranteed that each node will only occur once in the resultant node-set. So if the current node is itself the first node with its value, the union contains one node; if it is a later duplicate, the union contains two. In our case what we've essentially said is: "only select this current node if it is the first occurrence of all the nodes that have the same value". That's why we use the count() function - to tell us that the result of the union contains only a single node.
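Putting that filter into a complete stylesheet might look like the following sketch. It assumes the sample XML from earlier and a key named 'distinct-territory' defined over the <territory> elements, as used in the <xsl:for-each> above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- index every territory element by its string value -->
  <xsl:key name="distinct-territory" match="/rights/right/territory" use="."/>
  <xsl:template match="/">
    <rights>
      <!-- keep a territory only when it is the first node in its key group:
           the union of the node with that first node then holds a single node -->
      <xsl:for-each select="/rights/right/territory[count(.|key('distinct-territory',.)[1]) = 1]">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
    </rights>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample XML, this outputs three <territory> elements (USA, Australia, UK); the second USA is dropped because its union with the first USA contains two nodes.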
Now, at last, I can show you why <xsl:key> is so vital. OK, <xsl:key> is just a shorthand way of selecting specific nodes, so let's try to do away with it - try to come up with a straight XPath expression that doesn't use it, by replacing what the key() function is doing with an equivalent expression...
count(.|(/rights/right/territory[. = Ooops])[1]) = 1
So I've got rid of the key() function and replaced it with an equivalent expression but you see the Ooops - I can't get hold of the context node outside the predicate [] once I'm inside it. (Actually, there is a way but it is horrific and involves scripting and the use of extension functions - it is truly yek!).
Going back to my original solution...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="distinct-right_name" match="/rights/right/right_name" use="."/>
  <xsl:key name="distinct-territory" match="/rights/right/territory" use="."/>
  <xsl:template match="/">
    <rights>
      <xsl:for-each select="/rights/right/right_name[generate-id()=generate-id(key('distinct-right_name',.))]">
        <right_name><xsl:value-of select="."/></right_name>
      </xsl:for-each>
      <xsl:for-each select="/rights/right/territory[generate-id()=generate-id(key('distinct-territory',.))]">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
    </rights>
  </xsl:template>
</xsl:stylesheet>
This uses the generate-id() function rather than the union | operator to achieve the goal, i.e. "only select this current node if it is the first occurrence of all the nodes that have the same value". Again, it uses the <xsl:key> element and key() function - for exactly the same reason as I described earlier, of trying to get at a context node from outside of a predicate.
What the generate-id() function does is return a consistent unique id for any given node. Consistent in the fact that no matter how the node is selected it will always have the same generated id (with the caveat that the id is only guaranteed consistent within a given transformation 'session').
Notice that on the first use of the generate-id() function in each predicate there is no parameter - the default parameter being the current context node. On the second use of generate-id() I'm passing a node-set - in which case generate-id() returns the id for the first node of that set in document order. So, in essence, what this predicate [] filter is saying is: "only select this node if it has the same unique id as the first node of all nodes with the same value as this node".
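If the implicit "first node of the set" rule feels too hidden, a variant with an explicit [1] on the key() result should select the same nodes. This is only a sketch for illustration, assuming the same sample XML and the same 'distinct-territory' key; the posted solution does not need the extra predicate.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="distinct-territory" match="/rights/right/territory" use="."/>
  <xsl:template match="/">
    <rights>
      <!-- [1] spells out the first-node rule that generate-id() applies
           anyway when it is handed a node-set -->
      <xsl:for-each select="/rights/right/territory[generate-id() = generate-id(key('distinct-territory',.)[1])]">
        <territory><xsl:value-of select="."/></territory>
      </xsl:for-each>
    </rights>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample XML this outputs USA, Australia and UK, the same distinct list as the version without [1].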
Both techniques work, albeit using slightly different methods. I have a personal preference for the one I originally gave - only for the fact that it doesn't use nested predicates [].
Don't worry if it still isn't clear - it takes some getting your head around (or my explanation is just plain rubbish! :-) - but if that's the case then my defence is that this is the very first time I've tried to describe it).
Hope this is of some use
Marrow http://www.MarrowSoft.com - home of Xselerator (XSLT IDE and debugger) http://www.TopXML.com/Xselerator