Sample data
The sample XML data used by both examples is shown below. It is exactly the same as in the previous article.
<?xml version="1.0" encoding="utf-8" ?>
<cars>
<car manufacturer="Mazda" owner="Jason" />
<car manufacturer="Opel" owner="Teemu" />
<car manufacturer="Toyota" owner="Thomas" />
<car manufacturer="Opel" owner="Henry" />
<car manufacturer="Daimler" owner="Mika" />
<car manufacturer="Toyota" owner="Colt" />
<car manufacturer="Toyota" owner="Teemu" />
<car manufacturer="Daimler" owner="Jason" />
</cars>
[Download sample data]
Selecting distinct nodes more effectively
If the nodes to be selected are in sorted order, we can use the following XPath expression to get distinct nodes:
/cars/car[not(@manufacturer=preceding-sibling::car[1]/@manufacturer)]/@manufacturer
In other words, if we know that the nodes are in sorted order, we don't have to check all preceding sibling nodes: equal values are adjacent, so checking only the immediately preceding sibling is enough. This is usually more efficient than checking all preceding siblings, although the actual performance gain depends on the processor implementation.
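To see why the sort order matters, the following minimal sketch evaluates the immediate-sibling expression with the XPath 1.0 engine bundled with the JDK (`javax.xml.xpath`), not the .NET processor this article targets. It is written against the `/cars/car` structure of the sample data; the hand-sorted copy of the data and the class, method and constant names are illustrative assumptions. For comparison it also runs a variant that checks all preceding siblings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrecedingSiblingCheck {
    // The article's sample data in its original (unsorted) order.
    static final String UNSORTED = "<cars>"
        + "<car manufacturer='Mazda' owner='Jason'/>"
        + "<car manufacturer='Opel' owner='Teemu'/>"
        + "<car manufacturer='Toyota' owner='Thomas'/>"
        + "<car manufacturer='Opel' owner='Henry'/>"
        + "<car manufacturer='Daimler' owner='Mika'/>"
        + "<car manufacturer='Toyota' owner='Colt'/>"
        + "<car manufacturer='Toyota' owner='Teemu'/>"
        + "<car manufacturer='Daimler' owner='Jason'/>"
        + "</cars>";

    // The same cars, sorted by manufacturer by hand for this sketch.
    static final String SORTED = "<cars>"
        + "<car manufacturer='Daimler' owner='Mika'/>"
        + "<car manufacturer='Daimler' owner='Jason'/>"
        + "<car manufacturer='Mazda' owner='Jason'/>"
        + "<car manufacturer='Opel' owner='Teemu'/>"
        + "<car manufacturer='Opel' owner='Henry'/>"
        + "<car manufacturer='Toyota' owner='Thomas'/>"
        + "<car manufacturer='Toyota' owner='Colt'/>"
        + "<car manufacturer='Toyota' owner='Teemu'/>"
        + "</cars>";

    // Compare each car only with its immediately preceding sibling (reverse axis, so [1] is the nearest).
    static final String IMMEDIATE =
        "/cars/car[not(@manufacturer=preceding-sibling::car[1]/@manufacturer)]/@manufacturer";

    // Compare each car with every preceding sibling.
    static final String ALL_PRECEDING =
        "/cars/car[not(@manufacturer=preceding-sibling::car/@manufacturer)]/@manufacturer";

    // Parses the XML, evaluates an XPath 1.0 expression and returns the selected attribute values.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            // Parser and XPath errors are checked exceptions; rethrow unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SORTED, IMMEDIATE));       // [Daimler, Mazda, Opel, Toyota]
        System.out.println(select(UNSORTED, IMMEDIATE));     // [Mazda, Opel, Toyota, Opel, Daimler, Toyota, Daimler]
        System.out.println(select(UNSORTED, ALL_PRECEDING)); // [Mazda, Opel, Toyota, Daimler]
    }
}
```

On the sorted copy each manufacturer appears once. On the unsorted sample, Opel, Toyota and Daimler each appear twice because their duplicates are not adjacent, whereas the all-preceding-siblings variant does not depend on order.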
One way to apply this approach, then, is to sort the nodes first and select from the sorted result. The example stylesheet for this section also demonstrates the node-set() extension function: in XSLT 1.0 a temporary tree built in a variable is a result tree fragment, which cannot be navigated with XPath location paths, so node-set() is needed to use it as a node-set. As a reminder, the XSLT processor in .NET Framework 1.0 supports XSLT version 1.0.
[Download XSLT stylesheet]
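The example stylesheet builds a sorted temporary tree and reads it back with node-set(), an extension function whose namespace depends on the processor (msxsl:node-set() in .NET, exsl:node-set() in EXSLT-capable processors). As a substitute technique that needs no extension, the minimal sketch below chains two plain XSLT 1.0 transforms with the JDK's `javax.xml.transform` API: the first writes the sorted cars into an in-memory DOM, which plays the role of the temporary tree, and the second applies the immediate-sibling expression to it. Both stylesheets and all class and method names are illustrative; this is not the downloadable stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SortThenSelect {
    // The article's sample data in its original (unsorted) order.
    static final String SAMPLE = "<cars>"
        + "<car manufacturer='Mazda' owner='Jason'/>"
        + "<car manufacturer='Opel' owner='Teemu'/>"
        + "<car manufacturer='Toyota' owner='Thomas'/>"
        + "<car manufacturer='Opel' owner='Henry'/>"
        + "<car manufacturer='Daimler' owner='Mika'/>"
        + "<car manufacturer='Toyota' owner='Colt'/>"
        + "<car manufacturer='Toyota' owner='Teemu'/>"
        + "<car manufacturer='Daimler' owner='Jason'/>"
        + "</cars>";

    // Pass 1 (hypothetical helper stylesheet): copy the cars into a new tree, sorted by manufacturer.
    static final String SORT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/cars'><cars>"
        + "<xsl:for-each select='car'>"
        + "<xsl:sort select='@manufacturer'/>"
        + "<xsl:copy-of select='.'/>"
        + "</xsl:for-each>"
        + "</cars></xsl:template>"
        + "</xsl:stylesheet>";

    // Pass 2: on the sorted tree, keep a car only if it differs from its immediate predecessor.
    static final String SELECT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select="
        + "\"/cars/car[not(@manufacturer=preceding-sibling::car[1]/@manufacturer)]/@manufacturer\">"
        + "<xsl:if test='position() > 1'>,</xsl:if>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String distinctManufacturers(String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // The DOMResult holds the sorted intermediate tree, standing in for node-set().
            DOMResult sorted = new DOMResult();
            factory.newTransformer(new StreamSource(new StringReader(SORT_XSL)))
                .transform(new StreamSource(new StringReader(xml)), sorted);
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(SELECT_XSL)))
                .transform(new DOMSource(sorted.getNode()), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Transformer errors are checked exceptions; rethrow unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctManufacturers(SAMPLE)); // Daimler,Mazda,Opel,Toyota
    }
}
```

With node-set() the same two steps fit in one stylesheet; the two-pass form is simply a portable way to try the idea on a processor whose extension namespace you are unsure of.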
Grouping by manufacturer using the Muenchian technique
Muenchian grouping is a technique that uses XSLT keys and the generate-id() function to do the grouping. First we declare the key in the XSLT stylesheet:
<xsl:key name="car-key" match="car" use="@manufacturer" />
As a result, the function call:
key('car-key','Toyota')
returns all car elements whose manufacturer attribute has the value 'Toyota'. Therefore:
key('car-key','Toyota')[1]
returns the first car, in document order, that has 'Toyota' as its manufacturer. Consequently, the expression:
generate-id(.)=generate-id(key('car-key',@manufacturer)[1])
is true only for the car that is the first car from its manufacturer. The generate-id() function returns a string that uniquely identifies a given node, so two nodes have the same generated ID only if they are the same node. Because generate-id() uses only the first node (in document order) of the node-set passed to it, the [1] predicate is not actually needed. Unique manufacturers can therefore be selected as follows:
/cars/car[generate-id(.)=generate-id(key('car-key',@manufacturer))]/@manufacturer
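The following minimal sketch combines the key declaration with the selection expression above in one stylesheet and runs it with the JDK's built-in XSLT 1.0 processor rather than .NET. The comma-separated text output and the class and method names are assumptions made for easy checking.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MuenchianDistinct {
    // The article's sample data.
    static final String SAMPLE = "<cars>"
        + "<car manufacturer='Mazda' owner='Jason'/>"
        + "<car manufacturer='Opel' owner='Teemu'/>"
        + "<car manufacturer='Toyota' owner='Thomas'/>"
        + "<car manufacturer='Opel' owner='Henry'/>"
        + "<car manufacturer='Daimler' owner='Mika'/>"
        + "<car manufacturer='Toyota' owner='Colt'/>"
        + "<car manufacturer='Toyota' owner='Teemu'/>"
        + "<car manufacturer='Daimler' owner='Jason'/>"
        + "</cars>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Index every car element by the value of its manufacturer attribute.
        + "<xsl:key name='car-key' match='car' use='@manufacturer'/>"
        + "<xsl:template match='/'>"
        // Keep a car only if it is the first node key() returns for its manufacturer.
        + "<xsl:for-each select=\"/cars/car[generate-id(.)=generate-id(key('car-key',@manufacturer))]/@manufacturer\">"
        + "<xsl:if test='position() > 1'>,</xsl:if>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT 1.0 stylesheet to an XML string and returns the text output.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Transformer errors are checked exceptions; rethrow unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, SAMPLE)); // Mazda,Opel,Toyota,Daimler
    }
}
```

No sorting is applied here, so the manufacturers appear in the document order of their first car.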
Muenchian grouping is usually faster than the approach using the preceding-sibling axis, because XSLT processors typically implement keys with some kind of index or hash table, which makes selecting nodes with the same key value fast. The technique also enables other solutions in XSLT, such as processing all members of each group.
[Download XSLT stylesheet]
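Beyond selecting distinct values, a common use of Muenchian grouping is to process each group: once the first car of a manufacturer has been found, calling key() again with that manufacturer returns every car in the group. The minimal sketch below, under the same assumptions as before (JDK XSLT 1.0 processor, illustrative names, an ad hoc text output format), lists the owners per manufacturer and adds an xsl:sort on the outer loop to order the groups.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MuenchianGroups {
    // The article's sample data.
    static final String SAMPLE = "<cars>"
        + "<car manufacturer='Mazda' owner='Jason'/>"
        + "<car manufacturer='Opel' owner='Teemu'/>"
        + "<car manufacturer='Toyota' owner='Thomas'/>"
        + "<car manufacturer='Opel' owner='Henry'/>"
        + "<car manufacturer='Daimler' owner='Mika'/>"
        + "<car manufacturer='Toyota' owner='Colt'/>"
        + "<car manufacturer='Toyota' owner='Teemu'/>"
        + "<car manufacturer='Daimler' owner='Jason'/>"
        + "</cars>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:key name='car-key' match='car' use='@manufacturer'/>"
        + "<xsl:template match='/'>"
        // Visit the first car of each manufacturer, ordering the groups by manufacturer name.
        + "<xsl:for-each select=\"/cars/car[generate-id(.)=generate-id(key('car-key',@manufacturer))]\">"
        + "<xsl:sort select='@manufacturer'/>"
        + "<xsl:if test='position() > 1'>; </xsl:if>"
        + "<xsl:value-of select='@manufacturer'/>"
        + "<xsl:text>: </xsl:text>"
        // Fetch every car of this manufacturer through the key, in document order.
        + "<xsl:for-each select=\"key('car-key',@manufacturer)\">"
        + "<xsl:if test='position() > 1'>, </xsl:if>"
        + "<xsl:value-of select='@owner'/>"
        + "</xsl:for-each>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT 1.0 stylesheet to an XML string and returns the text output.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Transformer errors are checked exceptions; rethrow unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Daimler: Mika, Jason; Mazda: Jason; Opel: Teemu, Henry; Toyota: Thomas, Colt, Teemu
        System.out.println(transform(XSL, SAMPLE));
    }
}
```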
Conclusion
With a couple of more advanced examples, I have demonstrated how to select distinct nodes and how to group nodes. You should be able to apply these techniques easily in your own stylesheets.
Idea for this article by Kirk Allen Evans, http://www.xmlandasp.net