BizTalk Utilities CV ,   Jobs ,   Code library
 
Home Page


Add/Edit your code items
Search the code library
Browse for the code library


XSLT
Dynamically change the encoding of your XSLT stylesheet
Identifying an attribute or a namespace node
Build a XPath expression for a node
Avoiding an XSLT Processor crash due to deep recursive processing
Conditional generation of a string in a single XPath expression
Re: Re: ABS function implemented as a single XPath expression
Re: ABS function implemented as a single XPath expression
ABS function implemented as a single XPath expression
XML to HTML text editor
Dynamically Selecting Which Element To Sort On Using Parameters
Avoid NaN in XPath sum() function
JScript Super Class To Handle XML Transformations
Test if a bit is on
Web Methodology (Layout Management)
Grouping an XML Node by First Letter
Dependent ComboBox in HTML build with XSLT
Re: How to use XSL function sum() to sum 2 XML fields
Re: How to use XSL function sum() to sum 2 XML fields
How to use XSL function sum() to sum 2 XML fields
Helpful XSLT ""break"" template


 
 

<< XQuery.NET and XML >>


By Dimitre Novatchev
First Posted 03/10/2001
Times viewed 283

Get the newest 5 articles out of hundreds -- a super efficient sort in XSLT


Summary This snippets implements partial sort -- obtaining the biggest k elements out of N. It uses binary search and a one-pass approach to achieve a very good efficiency -- Log2(k)*N

In my last, Binary Search snippet I mentioned that the algorithm has many
and very important applications, and hinted I was going to use it in sorting.
This is precisely what I'm going to do now.

Here's how our web-administrator Trace Wilson formulated the problem:

Hi Daryl,
I've had the same problem for quite a while.

What I do is first is find all items before a date,
then I sort on my date (see snippet

http://www.vbxml.com/snippetcentral/main.asp?view=viewsnippet&id=v20010125215939
)

There are another 2 there on dates.

From there I need to check if the position() is less
than x, which means that my whole selection gets
iterated through.  If you can think of a nicer
solution, I'd love to hear your ideas.  I asked
earlier on this and Dimitre mentioned that I really
had no choice in the matter.  But he has just supplied
a solution for breaking out of a for-each which I'm
going to try next week
..

What I was offering to Trace then was to stop performing full sort (with complexity
of Log2(N)*N and to find k (k=5 in the real case -- anyway, something much
smaller than N) times the maximum dates instead. The latter has a complexity of
O(N) -- this means that the complexity is some linear function of N -- c*N.

In this case it is obvious that c=k.

While this solution is clearly much better, there's still the possibility that another O(N)
solution might exist with complexity c2*N, where c2 is signifficantly smaller than c.

Interesting, isn't it? :o))

OK, let's analyse the problem.

When finding the maximum element of a node-set we have a single work pad where
to keep the current maximum value. We scan through the set one element at a time
and compare it to the value in the workpad. Whenever the current element is greater,
we produce a new workpad and put the new number there (for reasons of simplicity
I'll speak about numbers, not elements, but we'll know that this is the number,
corresponding to the element).

The algorithm ends when we have performed a full scan of the set of elements.
If the set has N elements, there will be N comparisons.
To perform this algorithm k times will require k*N comparisons.

How can we find the k biggest elements even faster?

Hmmm... if only we could scan the set only once... Wait, that's an interesting idea,
why not?

Imagine that we have a bigger workpad that can hold k numbers. At any step through
the elements of the set this workpad will hold the k biggest numbers up to now.
If the current element is bigger than at least one in the workpad, then we construct
a new workpad, in which the smallest (up to now) number has been dropped and
the current number has been added.

In this way we will perform only one pass over the set of elements and at the end
the workpad will contain the k biggest elements.

Wow... isnt that nice!

Yes, but... does this really have smaller complexity? We scan only N elements
but every element's number has to be compared now with the k numbers in the
workpad -- this still means k*N comparisons... :(((

OK, now we're ready for the really big step forward -- what if we keep the numbers
in the workpad always sorted? Will we then really require k comparisons?

Wait... this reminds me of something... didn't I write another snippet just the day before
and said it could be used for very efficient sorting? Binary Search...?

Here's the idea: We modify slightly the Binary Search  snippet so that it will always
return the position a number must have in the sorted workpad if we want to add it there
and still keep the workpad sorted.

What will be the complexity then?
A binary search requires just Log2(k) comparisons -- hence the total complexity will be
Log2(k)*N.

So, compared to the k-times the maximum this algorithm will be k div Log2(k) times
faster.

In practical terms, if we're getting 5 out of N, this will be slightly more than twice faster,
if we're getting 10 out of N this will be more than 3 times faster, if we're getting 100 of N
(think N is thousands) this will be almost 7 times faster.

Notice also, that if we make the workpad hold N elements, then we have (our own!)
super fast sorting algorithm. The complexity is the same as that of xsl:sort,
but we now have huge flexibility to define and use in the sort our own comparison
operators for any (including our own!) data types. Good result, isnt it?

Is this the fastest sort possible?
Well, I know of at least one search algorithm, which is faster than binary search.
May give it a try... :o))

Now, how we code this in XSLT?

Let's have the following source xml document:


  <dates>
      <datesSorted>
          <date value=19271003/>
          <date value=19321004/>
          <date value=19530323/>
          <date value=19640505/>
          <date value=19920714/>
          <date value=19990417/>
          <date value=20001123/>
          <date value=20010406/>
          <date value=20020417/>
          <date value=20100714/>
      </datesSorted>
      <datesUnSorted>
          <date value=20100714/>
          <date value=19321004/>
          <date value=19920714/>
          <date value=19530323/>
          <date value=19990417/>
          <date value=19271003/>
          <date value=20001123/>
          <date value=19640505/>
          <date value=20020417/>
          <date value=20010406/>
      </datesUnSorted>
  </dates> 
 

I preferred to represent the workpad as a string containing 5 8-digit numers, each preceded
by a delimiter. Due to lack of fantasy I chose the space as delimiter. Why use a delimiter?

Just try to debug using these five 8-digit numbers pressed together and you'll know
the answer.

So, to create a new workpad from the current one will require copying everything
(old first number excluded!) from the second number to the merge position (a good possibility
to use the substring() function), copying the new number, then copying everything from
the old workpad that follows the merge position.

Special care was taken to efficiency -- a most important example is how the old workpad is
reused if the current number doesn't make it there. This will save a tremendous amount of
memory in all but the trivial cases.

Also, the modifications to the Binary Search are quite small -- just note, that the merge
position will be calculated differently, depending on whether the last comparison was
> or <.

When we apply the stylesheet below to the source xml document, we'll get the following
correct result:

Result:  19990417 20001123 20010406 20020417 20100714

And here's the complete stylesheet:


  <xsl:stylesheet version=1.0>
      <xsl:output method=text/>
      <xsl:variable name=theBuffer2>
          <xsl:for-each select=/dates/datesUnSorted/date[position() < 6]>
              <xsl:sort select=@value/>
              <xsl:value-of select=concat(@value, ' ')/>
          </xsl:for-each>
      </xsl:variable>
      <xsl:template match=/>
          <xsl:variable name=theResult>
              <xsl:call-template name=mergeSingleNode>
                  <xsl:with-param name=theSiblings select=/dates/datesUnSorted/date/>
                  <xsl:with-param name=siblPosition select=6/>
                  <xsl:with-param name=sortPad select=$theBuffer2/>
              </xsl:call-template>
          </xsl:variable>
          <xsl:value-of select=$theResult/>
      </xsl:template>

      <xsl:template name=mergeSingleNode>
          <xsl:param name=theSiblings select=/../>
          <xsl:param name=siblPosition select=0/>
          <xsl:param name=sortPad/>
          <xsl:choose>
              <xsl:when test=$siblPosition > count($theSiblings)>
                  <xsl:value-of select=concat('Result: ', $sortPad)/>
              </xsl:when>
              <xsl:otherwise>
                  <xsl:variable name=Last select=string-length($sortPad) div 9/>
                  <xsl:variable name=argNumber select=$theSiblings[$siblPosition]/@value/>
                  <xsl:variable name=mergePosition>
                      <xsl:call-template name=binSearch>
                          <xsl:with-param name=argNumber select=$argNumber/>
                          <xsl:with-param name=sortedEls select=$sortPad/>
                          <xsl:with-param name=First select=1/>
                          <xsl:with-param name=Last select=$Last/>
                      </xsl:call-template>
                  </xsl:variable>
                  <xsl:choose>
                      <xsl:when test=$mergePosition > 0>
                          <xsl:variable name=newSortPad>
                              <xsl:if test=$mergePosition > 1>
                                  <xsl:value-of select=substring($sortPad, 10, ($mergePosition - 1) * 9)/>
                              </xsl:if>
                              <xsl:value-of select=concat($argNumber, ' ')/>
                              <xsl:if test=$mergePosition != $Last>
                                  <xsl:value-of select=substring($sortPad, $mergePosition * 9 + 1, 9 * ($Last - $mergePosition)) />
                              </xsl:if>
                          </xsl:variable>
<!-- Now the recursive call -->
                          <xsl:call-template name=mergeSingleNode>
                              <xsl:with-param name=theSiblings select=$theSiblings/>
                              <xsl:with-param name=siblPosition select=$siblPosition + 1/>
                              <xsl:with-param name=sortPad select=$newSortPad/>
                          </xsl:call-template>
                      </xsl:when>
                      <xsl:otherwise>
<!-- A recursive call with the same $sortPad-->
                          <xsl:call-template name=mergeSingleNode>
                              <xsl:with-param name=theSiblings select=$theSiblings/>
                              <xsl:with-param name=siblPosition select=$siblPosition + 1/>
                              <xsl:with-param name=sortPad select=$sortPad/>
                          </xsl:call-template>
                      </xsl:otherwise>
                  </xsl:choose>
              </xsl:otherwise>
          </xsl:choose>
      </xsl:template>

      <xsl:template name=binSearch>
          <xsl:param name=argNumber select=-Infinity/>
          <xsl:param name=sortedEls/>
          <xsl:param name=First select=Infinity/>
          <xsl:param name=Last select=0/>
          <xsl:choose>
              <xsl:when test=$First > $Last>
                  <xsl:value-of select=$Last/>
              </xsl:when>
              <xsl:otherwise>
                  <xsl:variable name=Mid select=floor(($First + $Last) div 2)/>
                  <xsl:variable name=midElement select=substring($sortedEls, 9 * ($Mid - 1) + 1, 8)/>
                  <xsl:choose>
                      <xsl:when test=$argNumber = $midElement>
                          <xsl:value-of select=$Mid/>
                      </xsl:when>
                      <xsl:when test=$argNumber < $midElement>
                          <xsl:choose>
                              <xsl:when test=$First < $Last>
                                  <xsl:call-template name=binSearch>
                                      <xsl:with-param name=argNumber select=$argNumber/>
                                      <xsl:with-param name=sortedEls select=$sortedEls/>
                                      <xsl:with-param name=First select=$First/>
                                      <xsl:with-param name=Last select=$Mid - 1/>
                                  </xsl:call-template>
                              </xsl:when>
                              <xsl:otherwise>
<!-- First = Last and the comparison is lt -->
                                  <xsl:value-of select=$First - 1/>
                              </xsl:otherwise>
                          </xsl:choose>
                      </xsl:when>
                      <xsl:when test=$argNumber > $midElement>
                          <xsl:choose>
                              <xsl:when test=$First < $Last>
                                  <xsl:call-template name=binSearch>
                                      <xsl:with-param name=argNumber select=$argNumber/>
                                      <xsl:with-param name=sortedEls select=$sortedEls/>
                                      <xsl:with-param name=First select=$Mid + 1/>
                                      <xsl:with-param name=Last select=$Last/>
                                  </xsl:call-template>
                              </xsl:when>
                              <xsl:otherwise>
<!-- First = Last and the comparison is gt -->
                                  <xsl:value-of select=$Last/>
                              </xsl:otherwise>
                          </xsl:choose>
                      </xsl:when>
                      <xsl:otherwise>
                          ERRROR
                      </xsl:otherwise>
                  </xsl:choose>
              </xsl:otherwise>
          </xsl:choose>
      </xsl:template>

  </xsl:stylesheet>

Additional information

Further additional information


Rate this article on a scale of 1 to 10 (0 votes, average 0)

Your vote :  

<< XQuery.NET and XML >>





Leave a comment for this article
Your name
Your email (optional)
Your comment
Optional: Upload an attachment
Enter the code shown:

 
 

    Email TopXML