BizTalk Utilities CV ,   Jobs ,   Code library
 
Home Page


Add/Edit your code items
Search the code library
Browse for the code library


XSLT
trig functions in xslt
XSLT string Replace function
A Slide-In Menu made with XML/XSLT
What is SVG?
Family Tree maker
Alternative Ordering - ""Women and Children to the lifeboats first""
Restructuring flat delimited XML
Hex and Binary Conversions in XSLT - with Boolean math functions
Using XSLT to produce SVG (1)
Using XSLT to produce SVG (2)
Performance Tuning XSLT - Part 1
Performance Tuning XSLT - Part 2
Performance Tuning XSLT - Part 3
Performance Tuning XSLT - Part 4
Using XSLT to produce SVG (3)
XPath as parameters
Table with more than one column
Create an N-Column Table from XML node-set
Finding ""the corresponding node"" in a parallel subtree
Get the last item from a delimited list


 
 

<< XQuery.NET and XML >>


By Slawomir Tyszko
First Posted 01/16/2002
Times viewed 218

Re: DVC Algorithms in XSLT


This post contains attachments
v20020116202053.zip 

Summary I fully agree with Dimitre that, for the example investigated in my last snippet, his DVC solution is superior to my two-template solution in every respect. It is much faster, simpler and, most of all, it yields much higher reduction of the recursion depth. But this does not mean that my solution is useless. If you examine a much more "memory hungry" example, you may get quite a different picture.

This snippet is a response to Dimitre Novatchev's snippet DVC Algorithms in XSLT (WasRe: Another way of avoiding XSLT processor crash) posted 7 Jan 2002, in which he critically examines my snippet Another way of avoiding XSLT processor crash posted 7 Jan 2002, and points out that the classic DVC algorithm is, in this case, much more effective. Yes, I fully agree with him. For the example I presented in my snippet, his DVC algorithm is better in every respect. But does this mean that my solution is of no use? No, it just means that I have chosen a wrong example.
In my example two cooperating recursive templates are used to carry out a very large number of arithmetical calculations. Although, as in all recursive algorithms, the processor must create a large number of temporary objects in memory, in this case those objects are small in size, as each of them stores only one number, so the memory consumption is relatively low. Dimitre has proved that in such a situation the DVC algorithm is much more effective.
But then the natural question arises: how will the two algorithms compare when executing a memory hungry task? To investigate this I choose the search&replace operations on very long strings. I was lucky to have Dimitre to help me. First of all, I was supplied with his old DVC XSLT stylesheet for making search&replace (I included it in the attached zip as lrReplace.xsl file), so I had something to compare my solution to. Secondly, he, along with me, did a lot of comparisons of the two algorithms.

Before I present the results of those comparisons, a very brief explanation of my solution. It is brief, as it is very similar to the one decribed in Another way of avoiding XSLT processor crash. There are two cooperating recursive templates. The first one (replace-text) is the slightly modified template taken from the Trace Wilson snippet How to do a search and replace in XSLT posted 23 Feb 2001. The modification consists in adding one new parameter how-many, specifying for how many replacements the template is called. If, for instance, this parameter is 10, then the template returns the input string in which only the first 10 occurences of the replace substring are replaced, leaving the rest of the input string unchanged.
The second template (global-replace) calls repeatedly the replace-text template for performing successive chunks of replacements. It has the chunk-size parameter, that the user can set to determine how many replacements the first template will perform in one go, as the global-replace template simply passes this parameter to the first template as the how-many parameter. This number directly determines the reduction of the recursion depth. If, for instance, the input string contains 10000 substrings to-be-replaced, the conventional one-template solution would need recursion depth of 10000, while here, if you set chunk-size to 100, you will have to call the first template 100 times, each time for 100 replacements, so you reduce the recursion depth to 200 (100+100).

To compare the speed of the two algorithms I prepared a test XML file containing one very large text node containing 10000 substrings 'AAA' interspersed with other text (156 KB). In my stylesheet I call the global-replace template with parameter by specifying that substrings 'AAA' should be replaced by 'ZZZ', and chunk-size set to 100. On my PC (500 MHZ, 256 MB RAM) IE 5.0 with MSXML 3.0 installed needed about 5 sec to open this test file linked to the stylesheet implementing my chunk algorithm. In case of Dimitre's DVC algorithm the result of the same test was 105 sec. Then I increased the test file tenfold to 100000 substrings to-be-replaced (1.53 MB), and also changed the chunk-size to 1000. The result was: 115 sec for my algorithm, and 1100 sec for DVC. Note that with DVC algorithm the time increases almost linearly with the input string length, while with my algorithm it grows faster, so at some point DVC should overtake mine, but the initial gap is so wide, that in all practical cases the chunk solution should be markedly faster.
The chunk algorithm is also considerably simpler, as the DVC algorithm has to deal with so-called splitting-replacement-target problem (splitting a substring to-be-replaced when dividing the input string), which makes the code much more complicated.
The last advantage of the chunk solution is that it can be executed also by an XSLT processor without the node-set extension function implemented, while Dimitre's DVC implementation makes use of it.
On the other hand, the DVC is unrivalled as regards reduction of the recursion depth. For instance, in case of that 1.5 MB string, recursion depth, as reduced by DVC, does not exceed 20, while for the chunk solution with chunk-size set to 1000, this number is 1100 (100 chunks of 1000 replacements each).

Conclusion: When you have to reduce the recursion depth, the DVC algorithm is not always the best. Especially, when you have a memory hungry task, consider also the chunk solution.

Acknowledgement: I am very grateful to Dimitre Novatchev. Without his assistance, especially supplying me with his DVC solution, and helping me to understand the real advantages (as well as limitations) of DVC algorithms, I would not have been able to write this snippet.

Here I attach a reduced version with the input file containing 100 strings 'AAA' to be replaced with 'ZZZ' and chunk-size set to 10.

Important: If you would like to check the results presented here, do not forget to set the chunk-size parameter accordingly.

Additional information


Rate this article on a scale of 1 to 10 (0 votes, average 0)

Your vote :  

<< XQuery.NET and XML >>





Leave a comment for this article
Your name
Your email (optional)
Your comment
Optional: Upload an attachment
Enter the code shown:

 
 

    Email TopXML