In my last, Binary Search snippet I mentioned that the algorithm has many
and very important applications, and hinted I was going to use it in sorting.
This is precisely what I'm going to do now.
Here's how our web-administrator Trace Wilson formulated the problem:
Hi Daryl,
I've had the same problem for quite a while.
What I do is first is find all items before a date,
then I sort on my date (see snippet
http://www.vbxml.com/snippetcentral/main.asp?view=viewsnippet&id=v20010125215939
)
There are another 2 there on dates.
From there I need to check if the position() is less
than x, which means that my whole selection gets
iterated through. If you can think of a nicer
solution, I'd love to hear your ideas. I asked
earlier on this and Dimitre mentioned that I really
had no choice in the matter. But he has just supplied
a solution for breaking out of a for-each which I'm
going to try next week..
What I was offering to Trace then was to stop performing full sort (with complexity
of Log2(N)*N and to find k (k=5 in the real case -- anyway, something much
smaller than N) times the maximum dates instead. The latter has a complexity of
O(N) -- this means that the complexity is some linear function of N -- c*N.
In this case it is obvious that c=k.
While this solution is clearly much better, there's still the possibility that another O(N)
solution might exist with complexity c2*N, where c2 is signifficantly smaller than c.
Interesting, isn't it? :o))
OK, let's analyse the problem.
When finding the maximum element of a node-set we have a single work pad where
to keep the current maximum value. We scan through the set one element at a time
and compare it to the value in the workpad. Whenever the current element is greater,
we produce a new workpad and put the new number there (for reasons of simplicity
I'll speak about numbers, not elements, but we'll know that this is the number,
corresponding to the element).
The algorithm ends when we have performed a full scan of the set of elements.
If the set has N elements, there will be N comparisons.
To perform this algorithm k times will require k*N comparisons.
How can we find the k biggest elements even faster?
Hmmm... if only we could scan the set only once... Wait, that's an interesting idea,
why not?
Imagine that we have a bigger workpad that can hold k numbers. At any step through
the elements of the set this workpad will hold the k biggest numbers up to now.
If the current element is bigger than at least one in the workpad, then we construct
a new workpad, in which the smallest (up to now) number has been dropped and
the current number has been added.
In this way we will perform only one pass over the set of elements and at the end
the workpad will contain the k biggest elements.
Wow... isnt that nice!
Yes, but... does this really have smaller complexity? We scan only N elements
but every element's number has to be compared now with the k numbers in the
workpad -- this still means k*N comparisons... :(((
OK, now we're ready for the really big step forward -- what if we keep the numbers
in the workpad always sorted? Will we then really require k comparisons?
Wait... this reminds me of something... didn't I write another snippet just the day before
and said it could be used for very efficient sorting? Binary Search...?
Here's the idea: We modify slightly the Binary Search snippet so that it will always
return the position a number must have in the sorted workpad if we want to add it there
and still keep the workpad sorted.
What will be the complexity then?
A binary search requires just Log2(k) comparisons -- hence the total complexity will be
Log2(k)*N.
So, compared to the k-times the maximum this algorithm will be k div Log2(k) times
faster.
In practical terms, if we're getting 5 out of N, this will be slightly more than twice faster,
if we're getting 10 out of N this will be more than 3 times faster, if we're getting 100 of N
(think N is thousands) this will be almost 7 times faster.
Notice also, that if we make the workpad hold N elements, then we have (our own!)
super fast sorting algorithm. The complexity is the same as that of xsl:sort,
but we now have huge flexibility to define and use in the sort our own comparison
operators for any (including our own!) data types. Good result, isnt it?
Is this the fastest sort possible?
Well, I know of at least one search algorithm, which is faster than binary search.
May give it a try... :o))
Now, how we code this in XSLT?
Let's have the following source xml document:
<dates>
<datesSorted>
<date value=19271003/>
<date value=19321004/>
<date value=19530323/>
<date value=19640505/>
<date value=19920714/>
<date value=19990417/>
<date value=20001123/>
<date value=20010406/>
<date value=20020417/>
<date value=20100714/>
</datesSorted>
<datesUnSorted>
<date value=20100714/>
<date value=19321004/>
<date value=19920714/>
<date value=19530323/>
<date value=19990417/>
<date value=19271003/>
<date value=20001123/>
<date value=19640505/>
<date value=20020417/>
<date value=20010406/>
</datesUnSorted>
</dates>
I preferred to represent