Mark Wilson I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML published in 2000 called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37 point SEO health-checkup for your website.
First posted :
03/24/2008
Times viewed :
356
.NET SQLXML Performance
We yet have to discuss what performance
impact SQLXML will have on our applications. We get a lot of powerful,
easy-to-use functionality out of XPath queries, updategrams, templates (and the
xsl transformations that we have not discussed) but there is quite some
processing going behind the scenes. Take an XPath query for example. First,
SQLXML validates the query against the schema, which of course requires loading
and parsing the schema. Then it converts the XPath query into a FOR XML
EXPLICIT query which combines several SELECT queries to a UNION. Finally it
transforms the resulting rowset to XML. All this processing has to create more
load on the application than just reading a rowset and converting it to XML –
if it wasn’t for a few performance enhancing features in XML. To reduce the
performance impact of the parsing the mapping schemas SQLXML does by default
cache the parsing results, i.e. the major performance hit is out of the way
once every schema in the system in loaded. Also to reduce total load on the
database , SQLXML does allow to move the load of the XML conversion away from
SQL Server. This section does examine both of these features and then
show some benchmark results for the different transformations from the previous
section.
The first performance-enhancing feature in SQLXML is geared more towards
maintaining scalability rather than increasing overall performance. Instead of
having SQL Server converting query results, SQLXML lets us chose to do the
conversion on the database client, hence the name client-side formatting.
SQLXML allows client-side XML formatting, which we can activate through a
property on the SqlXmlCommand class. When the property is set to true SQLXML
will strip the FOR XML clause from the query and send the plain SQL statement
to retrieve a rowset. The SQLXML engine running on application server instead
of the database server generates the XML. This feature is very important if
your application is set up in the n-tier model, where multiple application
servers access data from a shared database server. If all application servers
query results in XML format, through either FOR XML or XPath queries, SQL
Server has to convert the results for all application servers accessing the
database. This lowers your application’s overall scalability, especially if it
is architected to scale horizontally, by adding more applications servers to
increase capacity. Each application server not only increases the number of
queries against the single database, it also adds an extra the XML processing
liability. If you make use of SQLXML client-side formatting an XPath query only
results in a UNION of one or more simple select statements, i.e. barely any
more load than we would have without SQLXML.
Another application of client-side formatting that comes in very handy is
XML-ifying existing stored procedures. Imagine you have a few hundred existing
stored procedures that you would like to re-use in an XML-driven application.
You could change each stored procedure to return the XML from the server, but
then you would have to also change all the places that call the stored
procedure. That’s not a good solution, next you contemplate to make copies of
all the stored procedures and change these to return XML. That’s not a good
solution either because then you have to remember to modify two procedures when
changes are needed. The best solution might be to not change the stored
procedures at all and let SQLXML do the XML formatting as shown in the example
below:
SqlXmlCommand cmd = new SqlXmlCommand(connString);
cmd.ClientSideXml = true;
cmd.CommandText = "Exec GetAllOrders FOR XML NESTED,
ELEMENTS";
return cmd.ExecuteXmlReader();
The NESTED option on the FOR XML clause is similar(!) to the AUTO option
used on the server, but we must only use it in conjunction with client-side
processing. Specifically, NESTED is the only mode that we can apply to
transform the output of a stored procedure. There are also a few little
differences in the XML format AUTO mode and NESTED mode generate when it comes
to querying views and when you define table aliases. AUTO names the returned
elements after the view, while NESTED names them after the view’s base tables.
NESTED does not recognize table aliases, while AUTO uses the alias for element
names. You need to keep in mind that SQLXML will ignore the ClientSideXml
property if you execute an AUTO mode query and submit the FOR XML AUTO
clause to SQL Server anyway. SQLXML can only apply client-side formatting to
results from queries in RAW, NESTED and EXPLICIT mode.
Another difference between client-side formatting and server-side formatting
is that you can execute statement prohibited by server generated XML. The FOR
XML clause is not allowed in several situations like in a nested SELECT
statement, in conjunction with a COMPUTE BY clause, or with GROUP BY and
aggregate functions. But since SQLXML only sends the “regular” SQL part of the
query to the server, you can avoid these limitations of server generated XML.
The second performance-enhancing feature we examine is schema caching. The
first time we execute an XPath or an Updategram against an annotated schema,
SQLXML will parse the schema and store the parsed schema in an internal cache.
The next time we reference the schema SQLXML can skip loading and parsing the
schema, which makes the execution of the query or the updategram much faster.
The size of the cache is stored in the registry under the following registry
key:
You will find several values to control the sizes of different SQLXML
caches. The default size of the schema cache is 31 schema files. You can
increase the size if your application references more than 31 schemas.
Otherwise SQLXML will purge a schema from the cache when it is full. Next time
you reference the purged schema SQLXML will parse it again and purge another
one. You get the idea right?
Make sure your schema cache can hold all your schemas if you want maximum
performance.
Mapping schemas are not the only item SQLXML caches to improve overall
performance. Templates and XSLT stylesheets are
also kept in dedicated caches. The cache sizes are also stored under registry
key above.
Before we move on, I thought I would show some more concrete numbers to help
you weighing the trade offs between flexibility and maintenance cost on one
side against performance on the other.
I ran two series of tests, one for inserting data from an XML document into
the database and one for querying results in an XML. I used SQL Server’s
Northwind database, because you should all have it when you installed SQL
Server. I modified the database only for the test where I needed to test the
timing for client-side identity generation. I did not modify the database for
any other test run. I ran each test 2000 times and averaged the results to get
a reliable number. Then I normed the results to the fastest one of the tests,
because I think you care more about the relative performance of the tests than
how fast my particular server setup can run these benchmarks.
SQLXML Bencharks
Now for the details. First let’s look at the query benchmark. I queried a
hierarchy of nested elements, not just a simple element to table 1:1 scenario.
I chose a hierarchy of Order and OrderDetail elements with the data from the
tables Orders and Order Details. The five contestants in this benchmark were
A SQL FOR XML AUTO query executed with the SqlCommand
A SQL FOR XML AUTO query executed with the SqlXmlCommand
an XPath query with SQLXML and an annotated schema,
querying data into a DataSet with the SqlCommand and the SqlDataAdapter, setting up the DataRelations and writing the data out in the
DataSet’s default format
querying the data into a DataSet with the SqlCommand and the
SqlDataAdapter and then manually looping over the tables and create the XML
with an XmlTextWriter.
I did not attempt the query the data into the faster DataReader because the
code to create nested XML from several DataReaders can get quite convoluted.
And the winner, ladies and gentlemen, the winner is … actually we have two
winners. XPath and manually looping over the DataSet were equally fast as we
can see in table 15.12. If we also consider that querying with XPath leads to
much more maintainable code, I favor XPath queries over converting by-hand.
1.12 Relative performance of
different techniques to query data from SQL Server in XML format. Performance
is listed relative to the fastest technique.
Query Technique
Relative Performance
FOR XML AUTO (SQLXML)
100%
FOR XML AUTO (SQLClient)
102%
Xpath
150%
DataSet (WriteXml)
157%
DataSet (Explicit)
145%
The second test series examines the insert techniques from section 15.3.2.
This time each test case had to insert an Order with two nested Order Detail
elements. Specifically the lineup for these tests was:
Reading the XML document into an XmlDocument, then composing a SQL
Statement
Reading the XML document into a DataSet, then composing a SQL
statement
Writing the XML to an Updategram, then transforming the updategram
with XSLT to add the attributes required to handle the identity
Reading the Xml with the XmlSerializer, generate a key value
client-side, set the key value and serialize the data into an updategram.
Parsing the XML document on the database server with an OPENXML
statement.
The clear winner was the XmlDocument/SQL combination, followed by
Updategram/XSLT the DataSet and team Updategram/XmlSerializer finishing last.
Note that inserts with updategrams are even faster if we don’t have to modify
the XML document we insert. Without the extra parsing step involved Updategrams
would lead the field of this benchmark.
1.13 Relative performance of
different techniques to insert data from an XML document into a SQL Server
database. Performance is listed relative to the fastest technique.
Insert Technique
Relative Performance
SqlDocumentByName
100%
DataSet
108%
UpdateGram w/ XSLT
107%
ClientSide w/ Serialization
154%
OPENXML
156%
So much for our little performance benchmark. Remember: actual mileage will
vary with the complexity and the size of the XML documents you are processing.
Do not take these results cast in stone! I strongly recommend that you download
the benchmark code from the web site, adapt the code to your own schemas and
see what results you are getting.