Every couple of months this year Microsoft has released another
in a series of technology previews and betas of the latest version
of its XML parser, MSXML. Each new version gets bigger, better,
supports more of the profusion of XML standards, better supports
development tools like Visual Basic, and adds a whole new layer of
complexity while opening up vast new possibilities.
One thing the world doesn't seem to need another of is Yet
Another XML Parser. But Microsoft is working hard to incorporate
XML into virtually every product it ships, and key to the success
of the effort is a robust XML parser. An XML parser lets you merge
data, objects, and programs together by applying context to
content.
XML is a series of hierarchically arranged labels that, viewed
through a parser, contain data and provide context to that data,
functioning as both code and data. What's really happening is that
by letting you write XML files that act as programs, XML parsers
are changing XML from a simple hierarchical text file to a
stand-alone programming environment.
You have two options for installing MSXML 3: side-by-side for
use with earlier versions, and replacement to blow old versions
away and clean up references to them in the registry. The
installation also includes a tool, XMLinst.exe, that lets you switch
back and forth almost at will.
In a move that is surprisingly rare for the company that
instigated DLL Hell, MSXML 3.0 is implemented both in a DLL with a
new name, MSXML3.dll versus the older MSXML.dll, and with a new
library name as part of the ProgID: MSXML2 versus the older MSXML.
No word yet on whether the discrepancy between the 2 and 3 in the
names will be cleaned up before release.
Side-by-Side
MSXML 3.0 is designed to coexist with your current version of
MSXML. After installation, in the registry all the old
MSXML-related entries still point to MSXML.dll, not the newer file,
MSXML3.dll. New entries have been added for the new features. To
use the new features in this side-by-side installation, you will
have to explicitly disambiguate references to the objects that you
want, lest your applications become confused:
var xmlDoc = new ActiveXObject("MSXML2.DOMDocument");
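If a script has to run on machines that may or may not have MSXML 3.0 installed, one way to handle it is to try the new ProgID first and fall back to the old one. The sketch below is only an illustration: the helper name createDomDocument is made up, the fallback ProgID "MSXML.DOMDocument" is assumed from the older library name mentioned above, and the caller supplies the function that actually creates the COM object.

```javascript
// Hypothetical helper, not part of MSXML. Tries the MSXML 3.0 ProgID
// first, then the older one (assumed here to be "MSXML.DOMDocument").
// `create` is a caller-supplied function that builds a COM object from
// a ProgID, for example:
//   function (progId) { return new ActiveXObject(progId); }
function createDomDocument(create) {
  var progIds = ["MSXML2.DOMDocument", "MSXML.DOMDocument"];
  for (var i = 0; i < progIds.length; i++) {
    try {
      return { progId: progIds[i], document: create(progIds[i]) };
    } catch (e) {
      // This ProgID is not registered on this machine; try the next one.
    }
  }
  throw new Error("No MSXML DOM document ProgID is registered");
}
```

Returning the ProgID alongside the document lets the calling code decide whether the MSXML 3.0 features are available before it uses them.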
Retrievably Replaced
Since the new component is backward compatible, with new names
in all the right places, it's a bit more difficult to install the
new version as a replacement. All the detritus will remain in the
registry unless you painstakingly remove it; regsvr32 can help.
But you may also want to fool around with the new version before
committing it. A utility, XMLinst.exe, is installed with MSXML 3.0
in your SYSTEM32 directory. If you run the utility without any
command line parameters, all your old MSXML registry entries are
replaced by references to MSXML 2.6. If you want to remove all
entries created by a particular DLL, use this command line:
xmlinst -u dllname
where dllname is either msxml or msxml3.
If you want to go back to the previous version, this line does
the trick:
xmlinst msxml.dll
All entries created by MSXML are replaced with references to the
DLL named on that command line, which can be a fully-qualified
path.
In addition to the old COM interfaces, some of which are kept
around only for the ever-elusive goal of backward compatibility,
MSXML supports five new interfaces and corresponding COM
objects:
XMLDOMDocument2 Object: Extension of DOMDocument that supports schema caching, validation and XPath features
XMLSchemaCache Object: Used by the schemas and namespaces properties on XMLDOMDocument2; free-threaded, use with multiple documents
XMLDOMSelection Object: List of nodes that match a given XSL Pattern or XPath expression
XSLProcessor Object: Used for transformations with compiled style sheets
XSLTemplate Object: Used for caching compiled XSLT templates
MSXML also includes new objects that support the Simple API for
XML, SAX, which I'll talk about more later.
XSL Transformations are probably the most significant change in
this version of MSXML, and, indeed, in all of the XML universe.
MSXML 3.0 supports the W3C XSL Transformations (XSLT) standard.
XSLT is a superset of XSL with expanded node selection and
transformation operations. Most significantly, it includes strong
data typing and basic data-manipulation operations.
String Operations
XSLT String operations include casting other data types, such as
NaN (not a number) as well as numbers to strings, concatenation,
searches, comparisons, and partial string handling. You may now
select the portion of a string that follows a specified target
substring using the substring-after() function. It works much the
same way as in any other programming language: you provide an input
string and a search string.
<xsl:value-of select="substring-after('Ms. Susan Kessler',
'Ms. ')"/>
returns "Susan Kessler".
XML parsers ignore white space by default, so you may write the
code exactly as shown, or all on a single line.
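For readers who think in script rather than XPath, the behavior of substring-after() can be approximated in plain JScript. This is a sketch only: the helper name substringAfter is made up for the illustration and is not an MSXML call. It follows the XPath rule that the result is an empty string when the search string does not occur in the input.

```javascript
// Hypothetical helper (not an MSXML API): mimics the XPath
// substring-after() function on ordinary strings.
function substringAfter(input, search) {
  var pos = input.indexOf(search);
  // XPath returns an empty string when the search string is not found.
  if (pos === -1) {
    return "";
  }
  // Keep everything that follows the first occurrence.
  return input.substring(pos + search.length);
}

var firstAndLast = substringAfter("Ms. Susan Kessler", "Ms. ");
```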
Number and Boolean Operations
XSLT can perform simple numeric operations such as
string-to-number conversion, and includes sum, floor, ceiling, and
round functions. For example, you can calculate the total of a set
of nodes holding customer order information.
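To make the order-total example concrete, here is a rough script equivalent of what the XPath sum(), floor(), ceiling(), and round() functions do. The order amounts and the helper name sumValues are invented for the illustration; in a real style sheet the values would come from the selected nodes.

```javascript
// Hypothetical order amounts, standing in for the text of the nodes
// that hold customer order information.
var amounts = ["20.5", "5.25", "100"];

// Like XPath sum(): convert each value to a number, then add them up.
function sumValues(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += Number(values[i]);
  }
  return total;
}

var total = sumValues(amounts);        // 125.75
var floorTotal = Math.floor(total);    // counterpart of XPath floor()
var ceilingTotal = Math.ceil(total);   // counterpart of XPath ceiling()
var roundedTotal = Math.round(total);  // counterpart of XPath round()
```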
Improved White Space Handling
XSLT delivers better white space handling. The DOMDocument
object now includes a Boolean preserveWhiteSpace property that
controls the default white space handling for the entire document.
The default value is False, which means the processor ignores extra
white space. When the property is True, the parser preserves white
space.
You often want to preserve white space only for specific tags.
For those tags you can use one of these attribute
values:
xml:space="preserve"
or
xml:space="default"
In combination, the preserveWhiteSpace property and the
xml:space attribute provide one of four white space options:
Preserved: preserves all white space
Trimmed: preserves all but leading and trailing spaces
Half preserved: preserves white space within text, but removes
white space between tags
Combinations such as preserved and trimmed, or half preserved
and trimmed
Additionally, the XSLT function normalize-space normalizes a
string or node value, removing extra spaces and replacing tabs and
newline characters with spaces.
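The effect of normalize-space can be sketched in script as two replacements. The helper name normalizeSpace is made up for this illustration. It assumes the XML definition of white space (space, tab, carriage return, and line feed) and is not a substitute for the real XSLT function.

```javascript
// Hypothetical helper (not an MSXML API): mimics XPath normalize-space().
// XML white space is limited to space, tab, carriage return, and line feed.
function normalizeSpace(text) {
  return text
    .replace(/[ \t\r\n]+/g, " ")  // collapse each run to a single space
    .replace(/^ | $/g, "");       // drop the leading and trailing space
}

var cleaned = normalizeSpace("  Susan\t\n Kessler  ");
```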
Improved Entities
XSL also has better handling of entities such as the left
angle bracket (<), which is reserved in both HTML and XML. But
XML recognizes only 5 entities. For example, most HTML writers use
the &nbsp; entity to insert a hard space. In XSLT, use the
syntax &#160; instead.
Data Type Support
Until now, XML could contain only string data, so strong data
typing is a major leap forward. You could convert data from a
string representation to a specific data type in code, but there
was no standard way to represent different data types within the
XML file itself.
In MSXML 3.0, you can specify a data type for an element or
attribute using a reference to the Microsoft data type schema:
xmlns:dt="urn:schemas-microsoft-com:datatypes"
The schema defines a gajillion data types, including boolean,
every kind of integer, real and float, date, time, datetime, char
and number (any numeric type). You can specify a data type within
the schema by using the attribute
The XPath Recommendation expands the XSL pattern capabilities.
In XPath, a location path consists of an axis establishing a
tree relationship between the context node and the nodes selected
by the query, a node test denoting the type of nodes to
select, and a predicate further refining the selection. In
addition to the old pattern characters, XPath adds the double dot
operator to select the parent of the context node. The full axis
syntax also includes the ability to traverse up the tree.
These are two equivalent XPath queries selecting the parent of
the current node:
Parser.selectSingleNode("..");
Parser.selectSingleNode("parent::node()");
You can also select the entire chain of ancestors, composed of the
current node's parent, its parent, and so forth back to the document
element:
Parser.selectNodes("ancestor::*");
XPath offers a rich set of functions for predicates, although
the syntax has changed. XSL pattern operators were delimited by
dollar signs, but XPath operators are more like those in
conventional programming languages. What was done in XSL:
//Turnip[price $le$ 0.25]
in XPath becomes:
//Turnip[price <= 0.25]
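If you have many old queries to migrate, a small script can do the mechanical part of the rewrite. The sketch below is a naive illustration, not a complete converter: the function name patternToXPath and the operator table are my own, only simple comparison and Boolean operators are covered, and it makes no attempt to skip dollar signs inside quoted strings.

```javascript
// Assumed mapping from XSL pattern operators to XPath operators.
// Only simple comparison and Boolean operators are listed here.
var OPERATOR_MAP = {
  "$eq$": "=", "$ne$": "!=", "$lt$": "<", "$le$": "<=",
  "$gt$": ">", "$ge$": ">=", "$and$": "and", "$or$": "or"
};

// Hypothetical helper: rewrites the operators it knows about.
function patternToXPath(pattern) {
  return pattern.replace(/\$[a-z]+\$/g, function (op) {
    // Leave anything outside the table untouched for manual review.
    return OPERATOR_MAP.hasOwnProperty(op) ? OPERATOR_MAP[op] : op;
  });
}

var converted = patternToXPath("//Turnip[price $le$ 0.25]");
```

Operators that are not in the table pass through unchanged, which makes them easy to spot and fix by hand afterward.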
The new interfaces are supersets of the old, so it is easy to
modify your existing utilities and applications to use MSXML 3.0. If
you were using XSL patterns and wish to migrate to XPath, you must
include the following line in your code:
parser.setProperty("SelectionLanguage", "XPath");
MSXML 3.0 defaults to XSL pattern syntax for backward
compatibility. You can check the status of the parser at any time
with the getProperty method, and reset the parser with a call to
setProperty.
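As a sketch of that check-and-switch step, the helper below reads the current selection language and changes it only when needed. The function name useSelectionLanguage is made up for the illustration. It relies only on the getProperty and setProperty methods and the "SelectionLanguage" property named above, and "XSLPattern" is the value I understand the default to be.

```javascript
// Hypothetical helper: switches a DOM document to the requested
// selection language and returns the one that was active before.
function useSelectionLanguage(parser, language) {
  var previous = parser.getProperty("SelectionLanguage");
  if (previous !== language) {
    parser.setProperty("SelectionLanguage", language);
  }
  return previous;
}

// With MSXML 3.0 installed, usage might look like this:
//   var parser = new ActiveXObject("MSXML2.DOMDocument");
//   useSelectionLanguage(parser, "XPath");       // switch to XPath
//   useSelectionLanguage(parser, "XSLPattern");  // back to the default
```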
Because high-powered performance is more important on the server
than the client (you typically don't process large files on the
client), Microsoft has focused their performance enhancements on
the server. Primarily through caching features, MSXML vastly
improves the performance of XML applications. Your mileage may
vary, of course, since performance enhancements depend heavily on
specific applications and their environment, but it is likely
you'll see dramatic improvements without any work on your part. But
you can boost it even further by taking advantage of the parser's
new features.
Cached Schemas and Templates
One of the most common uses for XML is to pass data between
components and systems. A validated XML document references one or
more schemas. To validate the document, the XML parser must load
and parse both the schema and the XML data in the document. But
since schemas rarely change, you could improve throughput by
caching the schema in memory.
The XMLSchemaCache object provides a way to cache a collection
of schemas. To use cached schemas, you load an XML document that
references one or more schemas, create an XMLSchemaCache object and
tell it to add the schemas using the XMLSchemaCache.addCollection
method.
For example, using VBScript, you can cache the schemas like
this:
Dim colSchemas
Set colSchemas = Server.CreateObject("MSXML2.XMLSchemaCache")
colSchemas.addCollection XMLDoc.namespaces
To apply the schemas, assign the XMLSchemaCache object to a new
DOMDocument object property called schemas. For example:
Dim docXML
Set docXML = Server.CreateObject("MSXML2.DOMDocument")
Set docXML.schemas = colSchemas
Compiled Stylesheets
In older versions of MSXML, you performed an XSL transformation
by calling the transformNode method of the XML
document, passing it an instance of the parser containing your
stylesheet document. Every time you called transformNode, the
stylesheet had to be compiled.
Two of the new interfaces introduced with MSXML 3.0 address this
issue. IXMLDOMXSLTemplate is a free-threaded component that
compiles stylesheets. IDOMXSLProcessor takes care of
transformations with compiled stylesheets.
To use these features, you create a template and save it into
the Application object. Whenever you need to perform a
transformation, you call the template's createProcessor method to
obtain a snapshot of the template in the form of an XSL processor
component. As long as your application is running, you don't have
to load the stylesheet from disk and compile it again.
var xmlDoc = new ActiveXObject("MSXML2.FreeThreadedDOMDocument");
var xslSheet = new ActiveXObject("MSXML2.FreeThreadedDOMDocument");
var xslTemplate = new ActiveXObject("MSXML2.XSLTemplate");
Always use the free-threaded version of the parser. If you
attempt to assign an apartment-threaded component
(MSXML2.DOMDocument) to a template, the call will fail. At this
point, there are two XML parser instances. Next, load the document
and stylesheet, assigning the component holding the stylesheet to
the stylesheet property of the template component:
xmlDoc.load(Server.MapPath("manifesto.xml"));
xslSheet.load(Server.MapPath("papersParam.xsl"));
xslTemplate.stylesheet = xslSheet;
Finally, create an XSL processor, attach the data document to it
as the input to be transformed, and perform the transformation. Any
subsequent changes to xslSheet or xslTemplate will not be reflected
in the processor.
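A minimal sketch of that final step follows. The wrapper function transformWith is my own naming, not part of MSXML. It assumes the template behaves like the MSXML2.XSLTemplate object created earlier and that the processor exposes input, transform, and output members as described in the MSXML 3.0 documentation.

```javascript
// Hypothetical wrapper: runs one transformation using a processor
// created from a cached template. `template` is expected to behave like
// an MSXML2.XSLTemplate and `source` like a loaded
// MSXML2.FreeThreadedDOMDocument.
function transformWith(template, source) {
  var processor = template.createProcessor(); // snapshot of the template
  processor.input = source;                   // document to transform
  processor.transform();                      // run the transformation
  return processor.output;                    // result of the transform
}

// In an ASP page this might be called as:
//   Response.Write(transformWith(xslTemplate, xmlDoc));
```

Because each call asks the template for a fresh processor, the compiled stylesheet is reused while every request gets its own input and output.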
"Traditional" XML parsers parse an entire XML file before you
can do anything with the contents. If you're working with large
files, this can be costly both in resources and time, particularly
when all you want is a single piece of data buried deep within the
file. Often, you just need to parse a portion of a file or process
a file from start to end, extracting information from specific
nodes.
The Simple API for XML, SAX, currently in version 2, takes a
whole different approach. Those who like open source software will
love SAX. It was developed by participants on the XML-DEV mailing
list and placed in the public domain by its primary author.
SAX reads an XML file and raises events as it processes each
node. Unlike the XML parser that uses the DOM, SAX does not create
the individual node objects or construct the DOM hierarchy tree in
memory. It reads an element, raises events, discards that element,
and moves on to the next.
You can monitor the events to extract information from the file.
For example, suppose you had an XML file that contained demographic
information for 100,000 people, and you need to find one particular
piece of information. Parsing the entire file into in-memory
objects is overkill and would take forever. With SAX, you would
read through the file, monitor the events, and stop processing when
you found the required information.
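The event-driven idea can be shown with a toy scanner written in script. To be clear, this is not SAX and not the MSXML SAX implementation: the function names scan and findFirstCity, the handler method names, and the sample element names are all invented for the illustration, and the scanner ignores attributes, comments, CDATA sections, and entities. It only demonstrates raising events as markup is read and stopping as soon as the wanted data turns up.

```javascript
// Toy event-driven scanner, for illustration only (not a real XML
// parser). A handler method returns false to stop the scan early.
function scan(xml, handler) {
  var token = /<\/([^\s>]+)\s*>|<([^\s\/>]+)[^>]*>|([^<]+)/g;
  var match;
  var events = 0;
  token.lastIndex = 0; // always start from the beginning of the input
  while ((match = token.exec(xml)) !== null) {
    events++;
    var keepGoing;
    if (match[1]) {
      keepGoing = handler.endElement(match[1]);
    } else if (match[2]) {
      keepGoing = handler.startElement(match[2]);
    } else {
      keepGoing = handler.characters(match[3]);
    }
    if (keepGoing === false) {
      break; // the handler has what it needs
    }
  }
  return events; // number of events raised before stopping
}

// Example handler: capture the text of the first <city> element.
function findFirstCity(xml) {
  var inCity = false;
  var city = null;
  var events = scan(xml, {
    startElement: function (name) { inCity = (name === "city"); return true; },
    characters: function (text) {
      if (inCity) { city = text; return false; } // found it: stop scanning
      return true;
    },
    endElement: function (name) { inCity = false; return true; }
  });
  return { city: city, events: events };
}
```

Nothing is kept in memory except the one value the handler chooses to keep, which is the property that makes this style attractive for very large files.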
Microsoft has placed significant resources behind XML, and
any time that's the case they make sure developers have the tools we
need to get our work done. Here are a few that are particularly
useful, or will be when MSXML 3.0 hits the streets. You can find
links to them at http://msdn.microsoft.com/xml/.
XSL to XSLT Converter 1.0
The xsl-xslt-converter.xslt style sheet updates Microsoft
Internet Explorer 5 XSL style sheets to XSLT-compliant style
sheets. Since stylesheets are XML, you can use XSLT to convert your
old stylesheets. The transformation support is incomplete, so your
old stylesheets will need some tweaking.
XSL ISAPI Filter 2.0
The XSL ISAPI Filter enables server-side XSL formatting for
multiple device-types. It features automatic execution of XSL style
sheets on the server, choosing alternate style sheets based on
browser type, style-sheet caching for improved server performance,
the capability to specify output encodings, and customizable error
messages.
XML Validation Tool
This tool is an updated version of the XMLINT command line tool
that shipped in the Internet Explorer 4 SDK. It checks that an XML
file is well formed. It also uses the new XML DOM to check that the
document is valid according to the DTD or XML-Data Schema.
Internet Explorer Tools for Validating XML and Viewing XSLT
Output
Currently, when browsing XML files using Internet Explorer, the
XML documents are not validated, and when viewing the source of the
document, only the XML is returned and there is no way to view the
output from the XSL or XSLT style sheet that may have been used to
transform that XML document.
The Internet Explorer tools for validating XML and viewing XSLT
output enable a shell option when viewing XML files to see the
processed XSL output. You can also validate XML against an embedded
schema when loading XML via the Internet Explorer MIME viewer. This
can be a useful tool when you are trying to debug XSL formatting
problems in Internet Explorer or are doing quick schema
validation.
Don Kiely is Software Technologist for Third Sector Technologies
in Fairbanks, Alaska, which develops custom solutions for
government agencies and non-profit organizations. He's written and
co-written several books about VB and VC++, including "Visual Basic
Programmer's Guide to the Windows Registry". He also teaches VB,
XML, and SQL Server courses for AppDev and is a regular speaker at
software conferences. Reach him at donkiely@computer.org.