By :Mark Wilson
First posted :01/19/2004
MSXML 3.0 and the Next Generation

By Don Kiely
Copyright 2000 Don Kiely


Every couple of months this year Microsoft has released another in a series of technology previews and betas of the latest version of its XML parser, MSXML. Each new version gets bigger, better, supports more of the profusion of XML standards, better supports development tools like Visual Basic, and adds a whole new layer of complexity while opening up vast new possibilities.

One thing the world doesn't seem to need another of is Yet Another XML Parser. But Microsoft is working hard to incorporate XML into virtually every product it ships, and key to the success of the effort is a robust XML parser. An XML parser lets you merge data, objects, and programs together by applying context to content.

XML is a series of hierarchically arranged labels that, viewed through a parser, contain data and provide context to that data, functioning as both code and data. What's really happening is that by letting you write XML files that act as programs, XML parsers are changing XML from a simple hierarchical text file into a stand-alone programming environment.

Installing MSXML

You have two options for installing MSXML 3: side-by-side, for use alongside earlier versions, and replacement, to blow old versions away and clean up references to them in the registry. The installation also includes a tool, XMLinst.exe, that lets you switch back and forth almost at will.

In a move that is surprisingly rare for the company that instigated DLL Hell, MSXML 3.0 is implemented in a DLL with a new name, MSXML3.dll versus the older MSXML.dll, and uses a new library name in its ProgIDs: MSXML2 versus the older MSXML. No word yet whether the discrepancy between the 2 and 3 in the names will be cleaned up before release.

Side-by-Side

MSXML 3.0 is designed to coexist with your current version of MSXML. After installation, in the registry all the old MSXML-related entries still point to MSXML.dll, not the newer file, MSXML3.dll. New entries have been added for the new features. To use the new features in this side-by-side installation, you will have to explicitly disambiguate references to the objects that you want, lest your applications become confused:

 

var xmlDoc = new ActiveXObject("MSXML2.DOMDocument");

 

Retrievably Replaced

Since the new component is backward compatible, with new names in all the right places, it's a bit more difficult to install the new version as a replacement. All the detritus will remain in the registry unless you painstakingly remove it; regsvr32 can help.

But you may also want to fool around with the new version before committing it. A utility, XMLinst.exe, is installed with MSXML 3.0 in your SYSTEM32 directory. If you run the utility without any command line parameters, all your old MSXML registry entries are replaced by references to MSXML 2.6. If you want to remove all entries created by a particular DLL, use this command line:

 

xmlinst -u dllname

where dllname is either msxml or msxml3.

 

If you want to go back to the previous version, this line does the trick:

 

xmlinst msxml.dll

 

All registry entries created by MSXML are replaced with references to the DLL named on that command line, which can be given as a fully-qualified path.

Better Objects

In addition to the old COM interfaces, some of which are kept around only for the ever-elusive goal of backward compatibility, MSXML supports five new interfaces and corresponding COM objects:

  • XMLDOMDocument2 Object: Extension of DOMDocument that supports schema caching, validation and XPath features
  • XMLSchemaCache Object: Used by the schemas and namespaces properties on XMLDOMDocument2; free-threaded, use with multiple documents
  • XMLDOMSelection Object: List of nodes that match a given XSL Pattern or XPath expression
  • XSLProcessor Object: Used for transformations with compiled style sheets
  • XSLTemplate Object: Used for caching compiled XSLT templates

MSXML also includes new objects that support the Simple API for XML, SAX, which I'll talk about more later.

Better Transformation Operations

XSL Transformations are probably the most significant change in this version of MSXML, and, indeed, in all of the XML universe. MSXML 3.0 supports the W3C XSL Transformations (XSLT) standard. XSLT is a superset of XSL with expanded node selection and transformation operations. Most significantly, it includes strong data typing and basic data-manipulation operations.

String Operations

XSLT string operations include casting other data types to strings (numbers, as well as special values such as NaN, not a number), concatenation, searches, comparisons, and partial string handling. You may now select the portion of a string that follows a specified target substring using the substring-after() function. It works much the same way as in any other programming language: you provide an input string and a search string.

 

<xsl:value-of select="substring-after('Ms. Susan Kessler', 'Ms. ')"/>

 

returns "Susan Kessler".

White space between the tokens of the select expression is not significant, so you may break the expression across lines or write it all on a single line.
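For readers who want to check the behavior outside a stylesheet, here is a minimal sketch of what substring-after() computes, written in plain JavaScript rather than run through MSXML. The function name substringAfter is an assumption made for this illustration; only the XPath function itself comes from the standard.

```javascript
// Plain JavaScript stand-in for XPath's substring-after(): returns the part
// of `text` that follows the first occurrence of `marker`, or the empty
// string when `marker` does not occur. (substringAfter is a hypothetical name.)
function substringAfter(text, marker) {
  const at = text.indexOf(marker);
  return at === -1 ? "" : text.slice(at + marker.length);
}

console.log(substringAfter("Ms. Susan Kessler", "Ms. ")); // Susan Kessler
```

Only the first occurrence of the search string counts, so a date such as "1999/04/01" split on "/" yields "04/01", not "01".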

Number and Boolean Operations

XSLT can perform simple numeric operations such as string-to-number conversion, and includes sum, floor, ceiling, and round functions. For example, you can calculate the total of a set of nodes holding customer order information.
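The arithmetic involved can be sketched in plain JavaScript. This is an illustration of the semantics only, not MSXML code: the helper names toNumber and sumOf and the sample order amounts are assumptions made for the example.

```javascript
// Convert text to a number the way XPath's number() does: optional white
// space, an optional minus sign, then digits with an optional fraction.
// Anything else becomes NaN. (toNumber is a hypothetical helper name.)
function toNumber(text) {
  const trimmed = text.trim();
  return /^-?(\d+(\.\d*)?|\.\d+)$/.test(trimmed) ? Number(trimmed) : NaN;
}

// Add up the numeric values of a list of text values, like sum() over a
// set of nodes. (sumOf is a hypothetical helper name.)
function sumOf(values) {
  return values.map(toNumber).reduce((total, n) => total + n, 0);
}

// Hypothetical order amounts, as they would appear as element text.
const amounts = ["12.5", " 7.25 ", "3"];
const total = sumOf(amounts);

// prints: 22.75 22 23 23
console.log(total, Math.floor(total), Math.ceil(total), Math.round(total));
```

As in XPath, a single value that is not a number turns the whole sum into NaN, which is worth remembering when order data may contain blanks.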

Improved White Space Handling

XSLT delivers better white space handling. The DOMDocument object now includes a Boolean preserveWhiteSpace property that controls the default white space handling for the entire document. The default value is False, which means the processor ignores extra white space. When the property is True, the parser preserves white space.

You often want to preserve white space only for specific tags. For those tags you can use the standard XML attribute xml:space with one of these values:

xml:space="preserve"

or

xml:space="default"

In combination, the preserveWhiteSpace property and the xml:space attribute provide one of four white space options:

  • Preserved: preserves all white space
  • Trimmed: preserves all but leading and trailing spaces
  • Half preserved: preserves white space within text, but removes white space between tags
  • Combinations such as preserved and trimmed, or half preserved and trimmed

Additionally, the XSLT function normalize-space() normalizes a string or node value: it strips leading and trailing white space and collapses each run of spaces, tabs, and newline characters into a single space.
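A minimal sketch of that normalization in plain JavaScript follows. It mirrors the XPath definition rather than calling MSXML, and the function name normalizeSpace is an assumption made for this example.

```javascript
// Plain JavaScript stand-in for XPath's normalize-space(): collapse every
// run of XML white space (space, tab, carriage return, line feed) into one
// space, then drop the single space left at either end.
// (normalizeSpace is a hypothetical name.)
function normalizeSpace(text) {
  return text.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

console.log("[" + normalizeSpace("  Susan \t\n Kessler  ") + "]"); // [Susan Kessler]
```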

Improved Entities

XSLT also handles entities better, such as the left angle bracket (<), which is reserved in both HTML and XML. But XML predefines only five entities (&lt;, &gt;, &amp;, &apos;, and &quot;), so the named entities familiar from HTML are not available. For example, most HTML writers use the entity &nbsp; to insert a hard space. In XSLT, use the numeric character reference &#160; instead.

Data Type Support

Until now, XML could contain only string data, so strong data typing is a major leap forward. You could convert data from a string representation to a specific data type in code, but there was no standard way to represent different data types within the XML file itself.

In MSXML 3.0, you can specify a data type for an element or attribute using a reference to the Microsoft data type schema:

 

xmlns:dt="urn:schemas-microsoft-com:datatypes"

 

The schema defines a gajillion data types, including boolean, every kind of integer, real and float, date, time, datetime, char and number (any numeric type). You can specify a data type within the schema by using the attribute

 

dt:type="datatype"

 

or within an XML document by using the attribute

 

dt:dt="datatype"

 

Location Paths with XPath

The XPath Recommendation expands the XSL pattern capabilities. In XPath, a location path consists of an axis establishing a tree relationship between the context node and the nodes selected by the query, a node test denoting the type of nodes to select, and a predicate further refining the selection. In addition to the old pattern characters, XPath adds the double dot operator to select the parent of the context node. The full axis syntax also includes the ability to traverse up the tree.

These are two equivalent XPath queries selecting the parent of the current node:

Parser.selectSingleNode("..");

Parser.selectSingleNode("parent::node()");

You can also select the full chain of ancestors, composed of the current node's parent, its parent, and so forth back to the document element:

Parser.selectNodes("ancestor::*");

 

XPath offers a rich set of functions for predicates, although the syntax has changed. XSL pattern operators were delimited by dollar signs, but XPath operators are more like those in conventional programming languages. What was done in XSL:

 

//Turnip[price $le$ 0.25]

 

in XPath becomes:

 

//Turnip[price <= 0.25]
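If you have many old queries to migrate, the comparison operators can be rewritten mechanically. The sketch below is plain JavaScript and deliberately naive: the function name patternToXPath is an assumption, it covers only the six basic comparison operators of the XSL pattern syntax, and it does not notice operators that appear inside quoted string literals, so review its output by hand.

```javascript
// Map the dollar-delimited XSL pattern comparison operators to their XPath
// equivalents. Only the six basic comparisons are covered in this sketch.
const OPERATORS = {
  "$eq$": "=",
  "$ne$": "!=",
  "$lt$": "<",
  "$le$": "<=",
  "$gt$": ">",
  "$ge$": ">="
};

// Rewrite every recognized operator in a query string.
// (patternToXPath is a hypothetical helper name, not part of MSXML.)
function patternToXPath(pattern) {
  return pattern.replace(/\$(eq|ne|lt|le|gt|ge)\$/g, (op) => OPERATORS[op]);
}

console.log(patternToXPath("//Turnip[price $le$ 0.25]")); // //Turnip[price <= 0.25]
```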

 

The new interfaces are supersets of the old, so it is easy to modify your existing utilities and applications to use MSXML 3.0. If you were using XSL patterns and wish to migrate to XPath, you must include the following line in your code:

 

parser.setProperty("SelectionLanguage", "XPath");

 

MSXML 3.0 defaults to XSL pattern syntax for backward compatibility. You can check the status of the parser at any time with the getProperty method, and reset the parser with a call to setProperty like this:

 

parser.setProperty("SelectionLanguage", "XSLPattern");

Boosting Performance

Because high-powered performance is more important on the server than the client (you typically don't process large files on the client), Microsoft has focused their performance enhancements on the server. Primarily through caching features, MSXML vastly improves the performance of XML applications. Your mileage may vary, of course, since performance enhancements depend heavily on specific applications and their environment, but it is likely you'll see dramatic improvements without any work on your part. But you can boost it even further by taking advantage of the parser's new features.

Cached Schemas and Templates

One of the most common uses for XML is to pass data between components and systems. A validated XML document references one or more schemas. To validate the document, the XML parser must load and parse both the schema and the XML data in the document. But since schemas rarely change, you could improve throughput by caching the schema in memory.

The XMLSchemaCache object provides a way to cache a collection of schemas. To use cached schemas, you load an XML document that references one or more schemas, create an XMLSchemaCache object and tell it to add the schemas using the XMLSchemaCache.addCollection method.

For example, using VBScript, you can cache the schemas like this:

 

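Dim colSchemas

' The ProgID uses the MSXML2 library name, not the DLL name
Set colSchemas = Server.CreateObject("MSXML2.XMLSchemaCache")

' Copy every schema referenced by the loaded document into the cache
colSchemas.addCollection XMLDoc.namespaces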

 

To apply the schemas, assign the XMLSchemaCache object to a new DOMDocument object property called schemas. For example:

 

Dim docXML

Set docXML = Server.CreateObject("MSXML2.DOMDocument")

' Validate against the cached schemas instead of reloading them
Set docXML.schemas = colSchemas

 

Compiled Stylesheets

In older versions of MSXML, you performed an XSL transformation by calling the transformNode method of the XML document, passing it a DOM document containing your stylesheet. Every time you called transformNode, the stylesheet had to be compiled again.

Two of the new interfaces introduced with MSXML 3.0 address this issue. IXSLTemplate is implemented by a free-threaded component that compiles and caches stylesheets. IXSLProcessor takes care of transformations with compiled stylesheets.

To use these features, you create a template and save it into the Application object. Whenever you need to perform a transformation, you call the template's createProcessor method to obtain a snapshot of the template in the form of an XSL processor component. As long as your application is running, you don't have to load the stylesheet from disk and compile it again.

 

var xmlDoc = new ActiveXObject("MSXML2.FreeThreadedDOMDocument");

var xslSheet = new ActiveXObject("MSXML2.FreeThreadedDOMDocument");

var xslTemplate = new ActiveXObject("MSXML2.XSLTemplate");

 

Always use the free-threaded version of the parser. If you attempt to assign an apartment-threaded component (MSXML2.DOMDocument) to a template, the call will fail. At this point you have two parser instances, one for the data and one for the stylesheet. Next, load the document and stylesheet, and assign the component holding the stylesheet to the stylesheet property of the template component:

 

xmlDoc.load(Server.MapPath("manifesto.xml"));

xslSheet.load(Server.MapPath("papersParam.xsl"));

xslTemplate.stylesheet = xslSheet;

 

Finally, create an XSL processor, attach the data document to it as the input to be transformed, and perform the transformation. Any subsequent changes to xslSheet or xslTemplate will not be reflected in the processor:

 

var xslProcessor = xslTemplate.createProcessor();

xslProcessor.input = xmlDoc;

xslProcessor.transform();

Response.Write(xslProcessor.output);
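The compile-once, snapshot-per-request pattern behind the template and processor can be sketched without MSXML. In this plain JavaScript illustration everything is hypothetical: the class SketchTemplate, its method names, and the use of a simple string prefix as a stand-in for a compiled stylesheet are all assumptions made to show the idea, not the behavior of the real components.

```javascript
// Hypothetical stand-in for a template that holds a compiled stylesheet.
// "Compiling" here just builds a function once so later work can reuse it.
class SketchTemplate {
  constructor() {
    this.compiled = null;
    this.compileCount = 0; // how many times a stylesheet was compiled
  }

  // Assigning a stylesheet compiles it once, up front.
  setStylesheet(prefix) {
    this.compileCount++;
    this.compiled = (input) => prefix + input;
  }

  // A processor captures the compiled form that exists right now, so later
  // changes to the template do not affect processors already handed out.
  createProcessor() {
    const snapshot = this.compiled;
    return { transform: (input) => snapshot(input) };
  }
}

const template = new SketchTemplate();
template.setStylesheet("<p>");
const first = template.createProcessor();
const second = template.createProcessor(); // reuses the same compiled form

template.setStylesheet("<b>"); // a later change to the template
console.log(first.transform("x"), second.transform("x"), template.compileCount);
```

Both processors keep producing output from the first stylesheet, and the compile counter shows that creating processors never triggers a recompile.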

 

Simple API for XML: SAX2

"Traditional" XML parsers parse an entire XML file before you can do anything with the contents. If you're working with large files, this can be costly both in resources and time, particularly when all you want is a single piece of data buried deep within the file. Often, you just need to parse a portion of a file or process a file from start to end, extracting information from specific nodes.

The Simple API for XML, SAX, currently in version 2, takes a whole different approach. Those who like open source software will love SAX. It was developed by participants on the XML-DEV mailing list and placed in the public domain by its primary author.

SAX reads an XML file and raises events as it processes each node. Unlike a parser that uses the DOM, SAX neither creates individual node objects nor constructs the DOM tree in memory. It reads an element, raises events, discards that element, and moves on to the next.

You can monitor the events to extract information from the file. For example, suppose you had an XML file that contained demographic information for 100,000 people, and you need to find one particular piece of information. Parsing the entire file into in-memory objects is overkill and would take forever. With SAX, you would read through the file, monitor the events, and stop processing when you found the required information.
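To make the event model concrete, here is a toy sketch in plain JavaScript. It is not the MSXML SAX2 implementation and not a conforming XML parser: the scan function, the handler method names, the findCity helper, and the sample data are all assumptions made for this illustration. The scanner understands only start tags, end tags, and text, which is enough to show how a handler can stop the read as soon as it has its answer.

```javascript
// Toy event-driven scanner, NOT a conforming XML parser: it understands only
// start tags, end tags, and text. A handler callback returns false to stop
// the scan early. Returns the number of tokens actually processed.
function scan(xml, handler) {
  const token = /<\/([^\s>]+)\s*>|<([^\s\/>]+)[^>]*?(\/?)>|([^<]+)/g;
  let events = 0;
  let match;
  while ((match = token.exec(xml)) !== null) {
    const [, endName, startName, selfClosing, text] = match;
    let result;
    if (endName !== undefined) {
      result = handler.endElement(endName);
    } else if (startName !== undefined) {
      result = handler.startElement(startName);
      if (result !== false && selfClosing) result = handler.endElement(startName);
    } else {
      result = handler.characters(text);
    }
    events++;
    if (result === false) break; // the handler found what it needed
  }
  return events;
}

// Find the city of one person without building any tree in memory.
function findCity(xml, wantedName) {
  let current = null;  // element whose text we are currently inside
  let matched = false; // true once the wanted <name> has been seen
  let city = null;
  const events = scan(xml, {
    startElement(name) { current = name; },
    characters(text) {
      if (current === "name" && text === wantedName) matched = true;
      else if (current === "city" && matched) { city = text; return false; }
    },
    endElement(name) {
      current = null;
      if (name === "person") matched = false; // do not leak into the next person
    }
  });
  return { city, events };
}

// Hypothetical sample data standing in for the large demographic file.
const people =
  "<people>" +
  "<person><name>Ann</name><city>Oslo</city></person>" +
  "<person><name>Kim</name><city>Lima</city></person>" +
  "<person><name>Lee</name><city>Rome</city></person>" +
  "</people>";

const found = findCity(people, "Kim");
console.log(found.city, found.events);
```

The search for Kim stops partway through the data, while a search for a name that is not present has to read every token to the end.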

Other Goodies

Microsoft has placed significant resources behind XML, and anytime that's the case they make sure developers have the tools we need to get our work done. Here are a few that are particularly useful, or will be when MSXML 3.0 hits the streets. You can find links to them at http://msdn.microsoft.com/xml/.

XSL to XSLT Converter 1.0

The xsl-xslt-converter.xslt style sheet updates Microsoft Internet Explorer 5 XSL style sheets to XSLT-compliant style sheets. Since stylesheets are XML, you can use XSLT to convert your old stylesheets. The transformation support is incomplete, so your old stylesheets will need some tweaking.

XSL ISAPI Filter 2.0

The XSL ISAPI Filter enables server-side XSL formatting for multiple device-types. It features automatic execution of XSL style sheets on the server, choosing alternate style sheets based on browser type, style-sheet caching for improved server performance, the capability to specify output encodings, and customizable error messages.

XML Validation Tool

This tool is an updated version of the XMLINT command line tool that shipped in the Internet Explorer 4 SDK. It checks that an XML file is well formed. It also uses the new XML DOM to check that the document is valid according to the DTD or XML-Data Schema.

Internet Explorer Tools for Validating XML and Viewing XSLT Output

Currently, Internet Explorer does not validate XML documents when you browse them. When you view the source of a document, only the XML is shown, and there is no way to see the output of an XSL or XSLT style sheet that may have transformed it.

The Internet Explorer tools for validating XML and viewing XSLT output enable a shell option when viewing XML files to see the processed XSL output. You can also validate XML against an embedded schema when loading XML via the Internet Explorer MIME viewer. This can be a useful tool when you are trying to debug XSL formatting problems in Internet Explorer or are doing quick schema validation.

Don Kiely is Software Technologist for Third Sector Technologies in Fairbanks, Alaska, which develops custom solutions for government agencies and non-profit organizations. He's written and co-written several books about VB and VC++, including "Visual Basic Programmer's Guide to the Windows Registry". He also teaches VB, XML, and SQL Server courses for AppDev and is a regular speaker at software conferences. Reach him at donkiely@computer.org.

 

  
