Mark Wilson
I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML, published in 2000, called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37-point SEO health-checkup for your website.
First posted: 07/01/2003
XML Parsers
In the XML Parsers Zone you will learn about XML Parsers and how to make use of
them in your applications. Please do take a look at the other Learning
Zones we have available.
This page is meant to help you get a grasp of the many
options you have when choosing a parser.
It is a collection of XML toolsets, parsers and XSLT processors.
Most of them are free and come with source code under a public license.
Some of them work under Windows and many of them are Java based.
This little utility is a self-extracting
archive that will automatically download and install MSXML3, the
MSXML3 SDK and Documents, the XSLT Test Tool, Saxon, Xalan, Oracle XSL,
Sablotron, Xt, Unicorn, Napa, 4XSLT and Instant Saxon. It was built to
get people up and running quickly with various XSL processors and the
XSLT Test Tool.
The SAXON package is a collection of tools
for processing XML documents. You can use SAXON by writing XSL
stylesheets, by writing Java applications, or by any combination of
the two. The output format may be XML, or HTML, or some other
format such as comma-separated values, EDI messages, or data in a
relational database. SAXON is maintained by Michael Kay and can
be relied upon to support the latest standards.
Instant Saxon contains identical
functionality to the full product, but packaged as a Windows
executable for ease of installation and running. This package includes
only basic documentation, and no source code or sample applications.
Xalan is an XSLT processor for transforming
XML documents into HTML, text, or other XML document types. Xalan-Java
version 1.2.2 is a complete and robust implementation of the W3C
Recommendations for XSL Transformations (XSLT) and the XML Path
Language (XPath). Xalan can be used from the command line, in an
applet or a servlet, or as a module in another program. By default, it
uses the Xerces XML parser, but it can interface to any XML parser that
conforms to the DOM level 2 or SAX level 1 specification.
Xerces (named after the Xerces Blue
butterfly) provides world-class XML parsing and generation.
Fully-validating parsers are available for both Java and C++,
implementing the W3C XML and DOM (Level 1 and 2) standards, as well as
the de facto SAX (version 2) standard. The parsers are highly modular
and configurable. Initial support for XML Schema (draft W3C standard)
is also provided. A Perl wrapper is provided for the C++ version
of Xerces, which allows access to a fully validating DOM XML parser
from Perl. It also provides for full access to Unicode strings, since
Unicode is a key part of the XML standard. A COM wrapper (also
for Xerces-C) provides compatibility with the Microsoft MSXML parser.
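The difference between the DOM and SAX interfaces mentioned above can be sketched in a few lines. This is only an illustration using Python's standard library parsers (xml.dom.minidom and xml.sax), not Xerces itself; the sample document and the function names are made up for the example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this illustration.
SAMPLE = "<catalog><parser name='Xerces'/><parser name='XP'/></catalog>"


def names_via_dom(xml_text):
    """DOM style: build the whole tree in memory, then query it."""
    doc = minidom.parseString(xml_text)
    return [el.getAttribute("name") for el in doc.getElementsByTagName("parser")]


class NameCollector(xml.sax.ContentHandler):
    """SAX style: the parser calls back for each element as it streams past."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, tag, attrs):
        if tag == "parser":
            self.names.append(attrs.get("name"))


def names_via_sax(xml_text):
    handler = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names
```

Both functions return the same list of names. The DOM version is usually easier to write against, while the SAX version never holds the whole document in memory, which matters for large inputs.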
Oracle provides a set of XML parsers for
Java, C, C++, and PL/SQL. Each of these parsers is a stand-alone XML
component that parses an XML document (or a standalone DTD) so that it
can be processed by an application. The parsers support the DOM
(Document Object Model) and SAX (Simple API for XML) interfaces, XML
Namespaces, validating and non-validating modes, and XSL
transformations. The parsers are available on all Oracle platforms.
Sablotron is a fast, compact and portable
XSLT processor. Sablotron is an open project; other users and
developers are encouraged to use it or to help us test or improve
it. The goal of this project is to create a reliable and fast XSLT
processor conforming to the W3C specification, which is available to the
public and can be used as a base for multi-platform XML applications.
XP is an XML 1.0 parser written in Java. It is fully conforming: it
detects all documents that are not well-formed. It is currently not a
validating XML processor. However, it can parse all external entities:
external DTD subsets, external parameter entities and external general
entities.
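To make the distinction between a conforming non-validating parser and a validating one concrete, here is a minimal sketch. It uses Python's bundled Expat parser rather than XP, and the helper name is_well_formed is an assumption made for this example. Like XP, Expat rejects anything that is not well-formed but does not check the document against its DTD.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Well-formed, but invalid against its own DTD (element a is declared EMPTY).
INVALID_BUT_WELL_FORMED = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>"
```

A mismatched tag such as `<a><b></a>` is rejected, whereas INVALID_BUT_WELL_FORMED is accepted, because only a validating parser would compare the content of `a` with its declaration.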
Unicorn XML Toolkit is a developer product
implementing various XML-enabling technologies. The Toolkit
implements two sets of APIs: one for C++ and one for ECMAScript.
4Suite is a collection of Python tools for
XML processing and object database management. It provides support for
XML parsing, several transient and persistent DOM implementations,
XPath expressions, XPointer, XSLT transforms, XLink, RDF and ODMG
object databases. The quickest path to trying 4Suite out, especially
for non-Python users, is to follow the 4Suite Installation HOW TO,
which is available for UNIX and Windows users.
Napa is a high-performance, progressive,
C++ XSLT processor. There are now three distributions available:
Windows, FreeBSD and Linux. All of them provide only a command-line
interface at the moment.
XML Pull Parser 1.1 was designed for, and should be optimal for,
applications that require a fast and small XML parser: the jar file
with compiled classes is around 20KB. Its pull parsing model is
especially well suited for unmarshalling complex data structures from
XML (such as SOAP).
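The pull model can be illustrated with a short sketch. It uses XMLPullParser from Python's standard library, not the Java XML Pull Parser described above, and the point document and the unmarshal_point function are hypothetical. The idea is the same: the application feeds data and asks for events when it is ready, instead of being called back by the parser.

```python
from xml.etree.ElementTree import XMLPullParser


def unmarshal_point(chunks):
    """Build a dict from <point><x>..</x><y>..</y></point> delivered in chunks."""
    parser = XMLPullParser(events=("end",))
    point = {}

    def drain():
        # Pull whatever events are ready; the application drives the loop.
        for _event, elem in parser.read_events():
            if elem.tag in ("x", "y"):
                point[elem.tag] = int(elem.text)

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # events still queued at close time can be read afterwards
    return point
```

Calling unmarshal_point with the document split at an arbitrary position, for example `["<point><x>1</", "x><y>2</y></point>"]`, yields the same dictionary as passing it in one piece, which is what makes this style convenient for data arriving over a network.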
We'd like to extend the Sniffer
to sniff out all of those parsers. If anyone has the time and
JavaScript skills to extend it, we'd really appreciate it!