What Is XML and Why Should I Care?
By Tony Stewart
Senior Consultant, Flash Creative Management, Inc.

A lot has been written about XML, the Extensible Markup
Language, but much of it contains misinformation and hype, and
relatively few developers actually understand the core concepts of
this family of standards.
In this presentation I walk through the key components,
explaining each one and showing how they are relevant to us as
developers and information application designers. By the end of the
session, participants will have a basic understanding of the key
standards - XML, XSL, XML-Schema, DOM, SAX, Namespaces, XLink - and
their related technologies, and will know how these standards are
used in the different categories of XML applications that are being
developed today.
To begin, let's take a look at the meaning of the words that
make up XML: "Extensible", "Markup" and "Language", though not in
that order.
Markup is a term for metadata, that is, information about
information. It originated long before computers, in the field of
publishing, where publishing markup referred to the tags
inserted into an edited text to tell a processor (human or machine)
what to do with the information.
In this sense HTML is a classic markup language. In the phrase
"This word is <b>bold</b>", for
example, the HTML <b> tags tell the browser how to
display the information between them.
Language refers to agreement within a community on the
meaning and use of a set of words. For example: English or
Spanish.
Putting these two definitions together, we find that a
markup language refers to agreement within a community
on the meaning and use of a set of tags. Again, HTML is a perfect
example of this. The community of people writing and using HTML
knows pretty much what to expect when they insert an
<EM> or <H1> element into their text.
An extensible language is a language that includes
a mechanism to add words in such a way that others understand them.
We don't really have a parallel to this in the real world - adding
words to English is an evolutionary process, not something any one
of us can do on our own.
Following the logic of these definitions, an extensible
markup language should be a markup language
that provides a mechanism for adding more words to it.
Unfortunately, that's not what the phrase means, and this has
confused a lot of people.
In practice, "Extensible Markup Language" actually means: A
system for defining entire markup languages, including the ability
to extend existing ones.
SGML, HTML and XML
SGML (Standard Generalized Markup Language) was the original
extensible markup language. It was standardized in 1986 (as ISO 8879).
HTML (HyperText Markup Language) is a language that was defined
using SGML. HTML is a "markup language" as you would expect: a set
of tags to control formatting and behavior in on-line
documents.
So here we have two acronyms, both containing "Markup Language"
in their names, which actually are different things. One is a
language; the other is a system for defining languages. What does
this imply about XML?
XML is not a markup language like HTML. Instead, it is a subset
of SGML: a mechanism for defining markup languages that provides
80% of SGML's power for approximately 20% of the pain. So, like
SGML it is an "extensible markup language", but unlike SGML it is
optimized for use over the Web. This has turned out to be an
incredibly powerful combination.
The Core Concept
To repeat the key point, XML is not a markup language.
Instead, it's a mechanism for creating your own markup languages,
and a set of standards to ensure their interoperability, stability
and longevity.
But if XML is not a markup language, neither is it a programming
language. We use XML to represent data, but having done so, we
still have to write applications to process that data. As Jon Bosak
of Sun Microsystems, one of the inventors of XML, said early on:
"XML gives Java something to do."
The primary uses of XML are:
- Exchanging information between heterogeneous
applications, enterprises, databases, etc.
- Enabling styling and presentation of the same
information on multiple output devices and/or for different
purposes and audiences
- As a storage format for long-lived or structurally
rigorous document-centric information, such as aircraft manuals or
enterprise information models.
While the name "XML" applies to a specific international
standard, it is also commonly used to refer to the entire family of
XML-related standards.
Most of us don't deal directly with the XML standards. They are
primarily used by developers who build software tools that generate
or process XML documents, in order to ensure that the tools work
properly and will be interoperable. But it's important to know
which standards are available to us as resources, and where to turn
when we have questions.
One way to organize the standards is according to the goals they
help accomplish. Here are the primary XML standards that have
already been finished (or nearly so) as of this writing:
Goal                                  | Primary Applicable Standards
--------------------------------------|------------------------------------------
Define an XML language                | XML
                                      | Namespaces
                                      | XML-Schemas
Format and display XML documents      | CSS (Cascading Style Sheets)
                                      | XSL (Extensible Style Language)
                                      | XSLT (XSL Transformations)
Develop processing applications       | DOM (Document Object Model)
                                      | SAX (Simple API for XML)
                                      | XSLT
Exchange information between systems  | Purpose-specific standardized XML
                                      | languages such as:
                                      | - SOAP (Simple Object Access Protocol)
                                      | - SVG (Scalable Vector Graphics)
                                      | - WML (Wireless Markup Language)
                                      | - XCBL (XML Common Business Library)
In the rest of this presentation I'll take a closer look at how
these standards fit together.
Document Types and Instances
(NOTE: In this section I use the term "document" in the
XML sense: a series of characters conforming to the XML rules. This
can be confusing, since later we will talk about "documents" of the
usual sort: information formatted for presentation to human
beings.)
XML lets you define the markup to be used in a set of
documents that share similar characteristics: for example,
a set of software manuals or a set of e-commerce messages.
These are called document types. (An XML document type is
roughly analogous to a class in object-oriented design.)
A document instance is a particular document: a specific
software manual or a specific invoice. (This is roughly analogous
to an instantiated object in the OO world.)
In a typical XML project we first design document types for the
information we want to work with, then we write software to create
and process instances of those documents.
Markup Building Blocks
XML provides a powerful set of low-level building blocks that we
can use when designing our document types. Let's take a look at a
sample instance, in this case a fragment of data from a personnel
database.
<personnel-data>
  <person ID="PE1">
    <name>
      <first-name>Tony</first-name>
      <last-name>Stewart</last-name>
    </name>
    <working-location office-id="OF1"/>
    <title>Senior Consultant</title>
  </person>
  <office ID="OF1">
    <name>Head Office</name>
    <address>433 Hackensack Avenue</address>
  </office>
</personnel-data>
This fragment illustrates several of the key building blocks of
XML:
- The document consists of elements (the <tags> in
brackets), which are roughly analogous to objects in an OO system,
or to fields in a relational database. An element begins with the
opening bracket of its start tag, ends with the closing bracket of
its end tag, and includes everything in between.
- An element can have content, which is the text between
the opening and closing tags. For example, "Tony" and "Stewart" are
both element contents.
- Some elements contain attributes, which are additional
information stored inside the opening tag of the element in the
form of name="value" pairs. ID and office-id in this
example are both attributes, and their contents ("PE1" and "OF1")
are referred to as attribute values.
- An element can be empty. In this example,
<working-location/> is an empty element. Usually, empty
elements serve as placeholders for attributes.
- Elements can contain other elements. This is referred to as
nesting or containership. Containership can be used to
represent serialized collections of objects or rows of data, or for
any other appropriate information.
- Attributes, however, cannot contain other attributes or
elements.
- Finally, element content or attribute values can serve as
pointers to other items in the document. XML offers a number of
ways to do this, one of which is shown in this example: the
office-id attribute "OF1" inside
<working-location/> matches (points to) the ID
attribute of the <office> element below it. This pair
of pointers indicates that person "PE1"'s office location is office
"OF1", the Head Office.
Select the Appropriate Modeling Style
These building blocks are simple, yet flexible and powerful
enough to support many different information-modeling styles:
object, networked, hierarchical, relational, etc. You can select
the style that best suits the type of information you want to
represent as XML.
For example, here is a fragment of an XML document containing
similar personnel-related information. This time the data has been
modeled using a different style, one that relies much more heavily
on pointers than containership:
<document>
  <people>
    <person ID="PE1"/>
    <person ID="TS23"/>
    <person ID="SS9"/>
  </people>
  <offices>
    <office ID="OF1"/>
    <office ID="OF2"/>
  </offices>
  <notes>You can
  mix <glref term="structured">structured</glref> and
  <glref term="unstructured">unstructured</glref> text
  in the same document.
  </notes>
</document>
The point here is that there is no "right" way to design an XML
document: it all depends on the information you are dealing with
and the processing you want to apply to it.
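Whichever style you choose, processing code works with the same building blocks. As a minimal sketch (in Python, using the ElementTree module from its standard library, which is my choice for illustration and not something the standards require), the following parses the personnel fragment from the previous section, reads element content and attribute values, and follows the office-id pointer to the matching office. The helper name describe is made up for this example.

```python
import xml.etree.ElementTree as ET

PERSONNEL_XML = """
<personnel-data>
  <person ID="PE1">
    <name>
      <first-name>Tony</first-name>
      <last-name>Stewart</last-name>
    </name>
    <working-location office-id="OF1"/>
    <title>Senior Consultant</title>
  </person>
  <office ID="OF1">
    <name>Head Office</name>
    <address>433 Hackensack Avenue</address>
  </office>
</personnel-data>
"""

root = ET.fromstring(PERSONNEL_XML)

# Index every <office> element by its ID attribute so pointers can be resolved.
offices_by_id = {office.get("ID"): office for office in root.findall("office")}


def describe(person):
    """Return 'First Last (Title), Office name' for one <person> element."""
    first = person.findtext("name/first-name")   # element content
    last = person.findtext("name/last-name")
    title = person.findtext("title")
    # <working-location/> is empty: all of its information is in the attribute.
    office_id = person.find("working-location").get("office-id")
    office = offices_by_id[office_id]             # follow the pointer
    return f"{first} {last} ({title}), {office.findtext('name')}"


summary = describe(root.find("person"))
print(summary)
```

Containership is handled by the path expressions ("name/first-name"), while the pointer has to be resolved by the application itself, here with a simple lookup table.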
Well-Formed and Valid
XML allows you to work formally or informally. For small
projects or when prototyping, you can quickly develop
well-formed documents. On larger projects or projects involving
multiple systems, you will usually go further and create
valid documents.
Well-formed XML conforms to a set of built-in structural
rules, including:
- One unique "root" element
- Every non-empty element has matching start and end tags
- All elements neatly nested, with no overlaps
- Various character and name restrictions
Valid XML is well-formed and:
- References or includes a schema or DTD (Document Type
Definition)
- Conforms to the rules in that schema
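To make the well-formedness rules concrete, here is a small sketch that hands a few strings to a parser and reports which ones it accepts. It uses Python's standard ElementTree module, which checks well-formedness only and does not validate against a schema or DTD; the sample strings and the function name is_well_formed are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = {
    "one root, neatly nested": "<book><title>Mandolin</title><note/></book>",
    "two root elements": "<book/><book/>",
    "missing end tag": "<book><title>Mandolin</book>",
    "overlapping elements": "<p><b><i>text</b></i></p>",
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first sample passes. The other three each break one of the rules listed above.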
Schemas Provide Validity
The word "schema" refers to the rules applying to a set of
similarly structured documents. This is not an XML concept: we use
the same word in many other disciplines.
In the case of XML, these rules include:
- What elements and attributes may occur?
- In what sequences and nestings?
- What kind of data can they contain (for example, datatypes,
ranges, character masks, etc.)?
XML provides two schema languages: DTD and XML-Schema.
DTD (or Document Type Definition) is the schema mechanism
invented originally for SGML and inherited by XML. DTDs are
relatively document-centric, so they don't include a lot of useful
features such as datatyping, ranges and picture masks. Also, they
are written in a syntax all their own, and there are relatively few
tools that can process them.
XML-Schema is a new schema standard that has been designed
specifically for XML. It uses XML syntax, it addresses most of the
shortcomings of the DTD format, and the major tools vendors are
already shipping technology to support it. As a result, people just
arriving in the XML world are advised to ignore the DTD syntax if
possible and adopt the XML-Schema standard for their work.
When Is Validity Necessary?
Well-formed documents are quick to prototype and easy to use.
I've seen whole applications built entirely around well-formed XML
documents. So when is it worth the trouble to go further and set up
a formal schema?
The short answer is that schemas empower data-driven
processing. They provide the missing information that, when
processed by a schema processor, can be converted into
automated:
- Content validation
- Default values for elements
- Assistance when editing an XML document
- Translation from one XML format to another
Of course you can write code that will do all these things, but
your code will usually be specific to a particular document type.
The information in a schema allows you to purchase or write a
generic schema processor (or a tool containing a schema processor)
that will run against many different document types with no further
programming required from you.
And, of course, schemas document your information design in a
standardized format that is easily shared.
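The idea of data-driven processing can be sketched in a few lines. The example below is not XML-Schema and not a real schema processor: it assumes a made-up rule table (RULES) that records, for each element, its expected children and required attributes, and a generic validate function that knows nothing about personnel data. Supporting a different document type would mean swapping the table, not rewriting the code.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, standing in for what a real schema would declare:
# for each element, its child elements (in order) and required attributes.
RULES = {
    "person": {"children": ["name", "title"], "attributes": ["ID"]},
    "name": {"children": ["first-name", "last-name"], "attributes": []},
    "first-name": {"children": [], "attributes": []},
    "last-name": {"children": [], "attributes": []},
    "title": {"children": [], "attributes": []},
}


def validate(element, rules):
    """Return a list of rule violations found in element and its descendants."""
    rule = rules.get(element.tag)
    if rule is None:
        return [f"<{element.tag}> is not declared"]
    errors = []
    for attribute in rule["attributes"]:
        if attribute not in element.attrib:
            errors.append(f"<{element.tag}> is missing attribute {attribute}")
    found = [child.tag for child in element]
    if found != rule["children"]:
        errors.append(
            f"<{element.tag}> contains {found}, expected {rule['children']}"
        )
    for child in element:
        errors.extend(validate(child, rules))
    return errors


GOOD = ('<person ID="PE1"><name><first-name>Tony</first-name>'
        '<last-name>Stewart</last-name></name>'
        '<title>Senior Consultant</title></person>')
# Same data, but the ID attribute is missing and the children are out of order.
BAD = ('<person><title>Senior Consultant</title>'
       '<name><first-name>Tony</first-name>'
       '<last-name>Stewart</last-name></name></person>')

good_errors = validate(ET.fromstring(GOOD), RULES)
bad_errors = validate(ET.fromstring(BAD), RULES)
print(good_errors)
for error in bad_errors:
    print(error)
```

A real schema language covers far more (datatypes, ranges, optional and repeated elements), but the division of labor is the same: the rules live in data, and the processor is generic.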
Namespaces Resolve Conflicts
As you work with XML documents and applications, you frequently
find yourself combining portions of what were originally separate
documents into one document. As soon as you do this you start
running into name conflicts: elements with the same tag names that
actually have different meanings.
For example, you might want to merge a document in which
<title> refers to a person's job title with another
document where <title> refers to movie titles. This
doesn't matter when they're in separate documents, and it doesn't
even matter if they're in the same document but you don't intend
to process these tags automatically. The problem occurs when you
want to write code that will automatically do some kind of
processing on all the <title> elements, and it needs
to know which ones are which. (In fact this is a more common
requirement than this trivial example suggests.)
The solution is to use XML Namespaces. A namespace is a
mechanism by which you can declare within a document that a set of
elements are conceptually related, usually by identifying the
"space" from which they originally came. You can declare a short
namespace identifier for each namespace, and then you use that
identifier as a prefix on the element names in your document.
For example, once you have declared a namespace such as
xmlns:flash="www.flashcreative.com/widgets", then each time you
reference <flash:widget> in the document, you (and
your processing software) will know that it comes from the
www.flashcreative.com/widgets namespace, and therefore is not the same as
<ibm:widget> or even plain-vanilla
<widget>, should either of these appear in the
document.
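Here is a rough sketch of how processing software tells the three kinds of widget apart. The flash namespace name is the one declared above; the ibm namespace name and the <catalog> wrapper element are placeholders I made up so that the example is a complete document. The code uses Python's standard ElementTree module.

```python
import xml.etree.ElementTree as ET

FLASH_NS = "www.flashcreative.com/widgets"   # namespace name from the text
IBM_NS = "www.example.com/ibm/widgets"       # hypothetical placeholder

DOC = f"""
<catalog xmlns:flash="{FLASH_NS}" xmlns:ibm="{IBM_NS}">
  <flash:widget>Flash widget</flash:widget>
  <ibm:widget>IBM widget</ibm:widget>
  <widget>Plain widget</widget>
</catalog>
"""

root = ET.fromstring(DOC)

# Map the prefixes used in our queries to the namespace names they stand for.
prefixes = {"flash": FLASH_NS, "ibm": IBM_NS}

flash_widgets = [w.text for w in root.findall("flash:widget", prefixes)]
ibm_widgets = [w.text for w in root.findall("ibm:widget", prefixes)]
plain_widgets = [w.text for w in root.findall("widget")]

print(flash_widgets, ibm_widgets, plain_widgets)
```

Note that the software matches on the namespace name, not on the prefix: the prefix is only a local shorthand, and a different document could bind a different prefix to the same namespace.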
Summary: Defining XML languages
The three standards that you rely on when defining XML document
types are XML itself, Namespaces and XML-Schema (or the DTD
language in some cases).
The normal procedure is:
- Model your information
- Prototype using well-formed XML
- Optionally define a formal schema
- Use namespaces to combine information from different
sources
NOTE: In this section of the presentation I use the term
"publish" to refer to the process of formatting XML for display to
human beings, and "document" to refer to the output of the
publishing process.
The Problem with HTML
I find it easiest to explore the concepts involved in XML
publishing by looking at the problem it was originally intended to
solve: implementing a better alternative to HTML. To illustrate the
issues, here is some representative HTML, in this case the title
page of the book Captain Corelli's Mandolin:
<H1 ALIGN="CENTER">Captain Corelli’s
Mandolin</H1>
<P ALIGN="CENTER"><I>Louis de
Bernières</I></P>
<P ALIGN="RIGHT"><FONT SIZE="-1">© 1994 by
<A HREF=mailto:ldb@lb.com>Louis de
Bernières</A> </FONT></P>
<HR><P ALIGN="CENTER">To my mother and
father</P><HR>
<H2>Dr Iannis Commences His History and is Frustrated
</H2>
<P>Dr Iannis had enjoyed a satisfactory day...
If we want to understand how a browser sees this information, we
have to remove the text and look at the tags (which are the only
parts of the document that the browser pays any attention to):
<H1 ALIGN="...">...</H1>
<P ALIGN="..."><I>...</I></P>
<P ALIGN="..."><FONT SIZE="...">...<A HREF="...">...</A></FONT></P>
<HR><P ALIGN="...">...</P><HR>
<H2>...</H2>
<P>...
As you can see, HTML documents like this contain tags such as
<H1> and <P> to convey structure,
<I> and <Font> to convey formatting, but
there are no tags at all to indicate what information this document
contains. A computer looking at these tags can't easily tell the
difference (for example) between the tags containing the name of
the author, the name of the book, or the dedication.
The result is that a computer can interpret these tags well
enough to display the information, but it can't
process it in any other, more interesting way.
Enter XML
Here is the same information converted into an XML document:
<book title="Captain Corelli's Mandolin" author="Louis de Bernières"
crdate="1994" crby="author">
<dedication
ID="c0p1">To my mother and father</dedication>
<chapter ID="c1"
title="Dr Iannis Commences His History and is Frustrated">
<para ID="c1p1">Dr Iannis had enjoyed a satisfactory
day…</para>
...
</chapter>
</book>
This time, if we look just at the tags, we get a much richer
understanding of the document:
<book title="..." author="..." crdate="..." crby="...">
<dedication ID="...">...</dedication>
<chapter ID="..." title="...">
<para ID="...">...</para>
</chapter>
</book>
These tags clearly convey both the structure and contents of the
information. We can find the book's title, the author's name, and
the dedication with no trouble. We can break it down into chapters
and paragraphs. As a result, we can easily write software to
manipulate this information any way we want.
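As a quick illustration of how little code that takes, the sketch below pulls the title, author, dedication and chapter titles out of the book document. It uses Python's standard ElementTree module; the variable names are mine.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book title="Captain Corelli's Mandolin" author="Louis de Bernières"
      crdate="1994" crby="author">
  <dedication ID="c0p1">To my mother and father</dedication>
  <chapter ID="c1"
           title="Dr Iannis Commences His History and is Frustrated">
    <para ID="c1p1">Dr Iannis had enjoyed a satisfactory day…</para>
  </chapter>
</book>
"""

book = ET.fromstring(BOOK_XML)

# The tags and attribute names say what each piece of information is.
title = book.get("title")
author = book.get("author")
dedication = book.findtext("dedication")
chapter_titles = [chapter.get("title") for chapter in book.iter("chapter")]
paragraph_ids = [para.get("ID") for para in book.iter("para")]

print(f"{title}, by {author}")
print(f"Dedication: {dedication}")
print(f"Chapters: {chapter_titles}")
```

Getting the same answers out of the HTML version would mean guessing from formatting: the first centered heading is probably the title, the italic paragraph is probably the author, and so on.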
There's just one problem: nothing in these tags tells the
computer how to format the book so people can read it. We
have lost exactly the information that HTML provided. We need
something more.
XML Style Languages
The solution is to store the formatting in a separate document
called a stylesheet, then to use a stylesheet
processor to merge the XML information with the formatting in
order to publish a new, human-readable document.
Stylesheets are written in style languages. You could develop
your own style language and style processor, but XML comes with two
standardized style languages, which are supported by a growing
number of free or inexpensive tools.
CSS (Cascading Style Sheets)
CSS is a style language invented for HTML that works very well
with XML. CSS is an excellent mechanism for displaying an XML
document in a browser. Most of the major Web design tools allow you
to draw the page and then generate a CSS script to achieve that
design, so it's easy to use, too.
However, CSS has two significant drawbacks. First, it cannot
produce good-quality printed output. Second, it can only
"decorate" your document; it cannot change the sequence of the
information in the document. Often the contents of an XML document
are in a different order from the way you want them to appear.
Before you can use CSS to publish such a document, you need to
transform it into a sequence that matches the desired output. This
adds another step to the process.
XSL (Extensible Style Language)
XSL is an XML-specific style language that overcomes the
limitations of CSS and provides a lot more. In the long run this is
going to be great. But as I write this (August 2000), it has a
major drawback: there are no tools available that will generate the
XSL stylesheets for you. So unlike CSS, you actually have to write
your XSL stylesheets by hand. The result is a cost-benefit
tradeoff: CSS is easier to use but less flexible; XSL is much more
powerful, but for now, a lot harder to use.
XSL provides three main features, each of which is associated
with a separate standard. Two of these are ready now, and one is in
the standardization home stretch:
- Transformation (XSLT). This feature lets us transform an XML
document into another format, either another XML document or a
DHTML or XHTML document (well-formed HTML that will work in current
browsers, complete with embedded JavaScript or VBScript, if
appropriate).
- Pointing (XPath) allows us to unambiguously identify any
location or type of location in an XML document. This is a core
requirement of style sheet processing, since it provides the
mechanism by which style rules can be applied to the information in
the XML document without having to actually embed style tags in the
document.
- Formatting (XSL) is the process of applying formatting to
information without having to write instructions that are specific
to a particular output device. For example, using XSL designers
will be able to create rules like "all titles should be formatted
in Arial 24 boldface and centered in a double-line box that is n
units wide", and have the rule apply with appropriate
adjustments whether printing on paper or displaying in a browser.
This will be a great improvement over current mechanisms, which
require that you write different rules (in different style
languages, such as HTML) for each device you want to support.
Unfortunately, the XSL standard is still being developed, and as a
result vendors have not yet provided tools for it.
How Stylesheets Work
The main principle of all XML style languages is that the
designer creates rules that bind formatting and processing
instructions to types of information in the document. The
rules are placed in a stylesheet in the form of templates that
reference (or point to) elements or patterns of information
that can be found in the document.
At runtime, a piece of software called a stylesheet processor
receives the XML document and an associated stylesheet as input.
(Stylesheet processors can be found inside any application that
applies stylesheets to XML, such as browsers and Web page design
tools.) The processor acts on the instructions contained in the
stylesheet, applies the templates to the types of information they
point to, and generates a new output document as a result.
This technique, which is called "declarative" rather than
"procedural" programming (because you declare style rules rather
than writing procedural instructions in source code), is powerful,
efficient, and scales well across large information sets.
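The mechanism is easier to see in miniature. The following is a toy imitation written in Python, not CSS or XSL and not how any real stylesheet processor is implemented: the "stylesheet" is a plain table that binds element names to output templates, and a small generic function walks the document and applies whichever template matches. All names (HTML_STYLESHEET, apply_templates, the sample document) are invented for the sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# The "stylesheet": declarative rules binding element types to formatting.
HTML_STYLESHEET = {
    "book": "<HTML><BODY>{content}</BODY></HTML>",
    "title": "<H1>{content}</H1>",
    "para": "<P>{content}</P>",
}


def apply_templates(element, stylesheet):
    """Render an element with the template bound to its tag, recursively."""
    parts = [escape(element.text or "", quote=False)]
    for child in element:
        parts.append(apply_templates(child, stylesheet))
        parts.append(escape(child.tail or "", quote=False))
    content = "".join(parts)
    # Elements with no matching rule pass their content through unchanged.
    template = stylesheet.get(element.tag, "{content}")
    return template.format(content=content)


SOURCE = ("<book><title>Captain Corelli's Mandolin</title>"
          "<para>Dr Iannis had enjoyed a <em>satisfactory</em> day.</para>"
          "</book>")

html_output = apply_templates(ET.fromstring(SOURCE), HTML_STYLESHEET)
print(html_output)
```

The source document contains no formatting at all, and the processor contains no knowledge of books. Everything specific to this output lives in the rule table, which is what makes it possible to swap in a different table for a different device or audience.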
XSLT Example
Here is a simple example: an XSLT template to render
<title> elements as top-level headings when generating HTML:
<xsl:template match="title">
<H1>
<xsl:apply-templates/>
</H1>
</xsl:template>
In this template, the match attribute value "title"
indicates that the rule should be applied to any
<title> element that the stylesheet processor encounters
in the source XML document. The template rule here indicates that
the contents of the <title> element in the XML
document should be included in the HTML file that is being created,
but preceded by an <H1> tag and followed by an
</H1> tag. Assuming that the source XML looks like
this:
<title>Captain Corelli's Mandolin</title>
then the resulting HTML will look like this:
<H1>Captain Corelli's Mandolin</H1>
This is an extremely simple example, which could easily have
been accomplished using a CSS stylesheet. In practice XSLT pattern
matching can be much more complicated and powerful than this, and
you can do more with XSLT templates than merely apply
formatting.
Stylesheets Enable Flexibility
By separating your formatting instructions from the information
contents and then using the stylesheet mechanism to merge them back
together, you open up a world of possibilities:
- You can create different stylesheets for different devices
(browsers, PDAs, telephones, etc.), different media (on-line,
print, CD, etc.) or different purposes (manager's view, technical
view, etc.).
- You can create one stylesheet for many documents.
- You can edit the information and the stylesheet separately from
each other.
- You can republish the entire information set at the touch of a
button.
Summary: Publishing XML
The core standards that you rely on when publishing XML
documents are CSS and XSL/XSLT.
The key concepts are:
- Separate content from formatting
- Bind styles to structure
- Use CSS to "decorate" your information
- Use XSLT when transformations or more powerful pattern matching
is required
The previous sections have described ways to use the XML family
of standards and tools to perform data-driven activities such as
validating XML (against a schema) or publishing it (using a
stylesheet and off-the-shelf tools). This section describes the
standards you would use when writing custom code or scripts to
process XML information in other ways.
Developers tend to work with XML at one of two levels. Low-level
processing involves reading the raw XML document, breaking it into
pieces and doing interesting things with the pieces. In high-level
processing you rely on off-the-shelf tools to do the low-level work
for you, but you still have to write code or script to customize
those low-level tasks and chain them together. For low-level
development you need to focus on the XML parsing standards,
while for high-level development you should focus on the XML
transformation standard.
Parsers
Every XML processor (including browsers, schema processors,
editors, stylesheet processors, etc.) contains a parser.
The parser reads the XML document and breaks it into pieces in
memory, usually corresponding to the elements and attributes of the
original document. Once this is done, the processor can manipulate
the pieces directly, just like data in a database:
- Transforming it to/from other formats
- Reassembling it in a different sequence
- Applying formatting for printing or display
Only a small subset of XML developers needs to worry about
parsers and parsing standards. Most of us rely on the parsers that
are built into our favorite development environments, and leave the
low-level details to the tool vendors.
The XML parsing standards help all of us, however, because they
ensure that the available parsers contain the same core features
and language bindings, and are therefore relatively
interchangeable. This means that a developer can switch parsers
without breaking his application, or can move to another
development environment or programming language without losing
functionality. Since different parsers and development environments
have different strengths and weaknesses, this allows us to select
the best tools for the job at hand.
Two Parsing Standards
XML comes with two standardized APIs (Application Programming
Interfaces) that parsers expose to application code: DOM and SAX.
Despite its name, the DOM (Document Object Model) is not really
a "model" at all. Rather, it is an API for writing code to
manipulate the pieces of a document after that document has
been loaded into memory. The DOM standard treats the information as
if it has been stored in a tree structure, with commands to walk
the tree, retrieve collections of information, etc. These
commands certainly imply the existence of an underlying object
model, but for practical purposes the benefit of DOM-compliant
parsers is that they provide standardized API calls to manipulate
the information.
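For a feel of what DOM-style code looks like, here is a short sketch using xml.dom.minidom, a DOM implementation in Python's standard library. The <inventory> document is made up for the example. The calls themselves (getElementsByTagName, getAttribute, createElement, appendChild and so on) are the standard DOM names, which is the point: much the same calls are available from other DOM-compliant parsers.

```python
from xml.dom.minidom import parseString

SOURCE = ('<inventory><widget id="w1">Sprocket</widget><gadget/>'
          '<widget id="w2">Flange</widget></inventory>')

# The whole document is parsed into an in-memory tree before we touch it.
document = parseString(SOURCE)
root = document.documentElement

# Walk the tree: the root's children, in document order.
child_names = [node.nodeName for node in root.childNodes]

# Retrieve a collection of elements by name and read their pieces.
widgets = document.getElementsByTagName("widget")
widget_names = {w.getAttribute("id"): w.firstChild.data for w in widgets}

# Manipulate the tree: build a new element and attach it to the root.
new_widget = document.createElement("widget")
new_widget.setAttribute("id", "w3")
new_widget.appendChild(document.createTextNode("Cog"))
root.appendChild(new_widget)

widget_count = len(document.getElementsByTagName("widget"))
print(child_names, widget_names, widget_count)
```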
SAX (Simple API for XML) does not pretend to be an object model.
Instead, it's a set of events that can be fired while a document is
being read by a SAX-compliant parser. SAX was created because DOM
commands really can't be used until the entire document has been
loaded into memory. But what if the document is very large, and all
you care about are the contents of a <widget> element
somewhere in the middle? You don't want to waste time and memory
loading the whole document just to access one element.
Instead, you can quickly stream the document through a
SAX-compliant parser to find the elements you care about and grab
their contents.
In a sense, the SAX functionality is a prerequisite for a
DOM-compliant parser, because a DOM parser must first read the
document and use SAX-like events to identify the elements and build
its in-memory representation of the document. As a result, many
parsers are both DOM and SAX compliant, so that you can use the
same parser in either processing mode.
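The event-driven style looks quite different in code. The sketch below uses the xml.sax module from Python's standard library to pick out the contents of <widget> elements as the document streams past, without building a tree. The handler class name and the sample document are invented for the example; startElement, characters and endElement are the standard SAX callback names.

```python
import xml.sax


class WidgetHandler(xml.sax.ContentHandler):
    """Collect the text of every <widget> element; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.widgets = []
        self._buffer = None  # None means "not currently inside a <widget>"

    def startElement(self, name, attrs):
        if name == "widget":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "widget" and self._buffer is not None:
            self.widgets.append("".join(self._buffer))
            self._buffer = None


SOURCE = (b"<inventory><gadget>ignored</gadget>"
          b"<widget>Sprocket</widget><gadget/></inventory>")

handler = WidgetHandler()
xml.sax.parseString(SOURCE, handler)
print(handler.widgets)
```

The trade-off is visible here: the application has to keep track of where it is in the document (the _buffer flag), which a DOM tree would have done for it, but memory use does not grow with the size of the document.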
Transformations (XSLT)
A transformation is the process of using XSLT to convert an XML
document to another format, either XML or non-XML.
Transformations are the building blocks of XML applications,
because they allow us to work with information in an optimized
form. As Simon Phipps of IBM said at a conference last year:
"The lifeblood of all business to business in XML is going to be
transformation."
This has certainly been true in my own work. In a project I
recently worked on, the heart of the application was a series of
XSLT transformations between three formats: an "external" format
that was used for interchange between organizations, an "internal"
format that was used to support processing, and an HTML format that
was used to display the information. Once these formats and
transformation rules had been worked out, the rest of the
development process consisted of creating generic, reusable
routines to route the information and chain the
transformations.
XSLT Concepts
An XSLT stylesheet consists of named templates and
patterns. Templates define the string/structure transformations
that need to occur, while patterns bind the templates to structures
in the document.
The XSLT processing model is declarative and recursive. The
stylesheet processor walks through the parsed XML document, and at
each node it applies the templates whose patterns match the current
location. The instructions in the templates can rearrange or sort
the information, wrap other character strings around it, skip it
entirely, or do whatever else is necessary to transform it into the
desired output.
Here is a simple example of a template to output an HTML
document with "last names first". Assume that the source XML
contains two <author> elements like this:
<author>
<firstname>Alan</firstname><lastname>Griver</lastname>
</author>
<author>
<firstname>Tony</firstname><lastname>Stewart</lastname>
</author>
And assume that the stylesheet contains a template with a
pattern match for <author> elements, like this:
<xsl:template match="author">
<xsl:value-of
select="lastname"/>, <xsl:value-of
select="firstname"/>
<BR/>
</xsl:template>
As the stylesheet processor walks through the document, each
time it encounters a new element it will check whether there is
a template whose pattern matches the point it has reached in the
document. At each <author> element the processor will
see that there is a match, so it will evaluate the appropriate
style template and add the results of that evaluation to the output
that it is building. In this case the combination of the
expressions in the template (the two <xsl:value-of>
elements) and the literals (the comma, the space and the
<BR/>) will generate a text string that looks like this:
Griver, Alan<BR/>Stewart, Tony<BR/>.
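If you don't have an XSLT processor to hand, the effect of this template can be imitated in ordinary code. The sketch below is plain Python, not XSLT: the function author_template plays the role of the template, and the loop plays the role of the processor finding each match. The <authors> wrapper element is my addition, since a well-formed document needs a single root.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<authors>
  <author><firstname>Alan</firstname><lastname>Griver</lastname></author>
  <author><firstname>Tony</firstname><lastname>Stewart</lastname></author>
</authors>
"""


def author_template(author):
    """Imitates the template: last name, comma, space, first name, <BR/>."""
    lastname = author.findtext("lastname")     # like select="lastname"
    firstname = author.findtext("firstname")   # like select="firstname"
    return f"{lastname}, {firstname}<BR/>"


root = ET.fromstring(SOURCE)
# Visit each <author> in document order and append the template's result.
output = "".join(author_template(author) for author in root.iter("author"))
print(output)
```

The difference in practice is that the XSLT version is data: it can be swapped, generated or chained without touching the program that runs it.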
A Glimpse of the Future
The XSLT processing model is incredibly powerful and scales very
well. Many XML B2B applications are based on multiple
transformations from one XML format to another, and XSLT is seen as
the obvious way to accomplish this.
The problem up until now is that we've had to design and write
the XSLT instructions by hand. But a new generation of schema
processing tools that automate the process of designing
schema-to-schema transformations is about to be released.
Summary: Processing XML
The core standards for parsing and processing XML documents are
DOM, SAX and XSLT.
The key concepts are:
- Standardized parser APIs enable interchangeable XML
processors
- Transformations are the building blocks of XML
applications
- Schema mapping tools will generate the necessary XSLT
Hopefully by now you understand what XML is and how the main XML
standards fit together. In this section I want to suggest four
reasons why XML is achieving such widespread adoption, and why I
believe it will change the way we design and develop information
architectures.
Loose Coupling
Loose coupling is the process of communicating between systems
or components using predefined message formats, without knowing the
details of each other's systems. The basic concept is: "If you send
a message like this to my address, I'll send a message like
that to your address."
Loose coupling addresses many of the difficulties we have
experienced over the years when we tried to integrate systems using
data exchange formats that were tightly bound to the systems they
connected.
XML adoption enables loose coupling on a large scale. Because
the XML message format is independent of the processing software,
you can add more systems to a network, and change the nature of
those systems, without changing the message mechanism. And because
XML allows you to "extend" the message format without breaking
existing processing routines, you can improve your format
incrementally over time without disrupting existing participants'
systems.
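Here is a small sketch of what "extending without breaking" means in practice. The <order> message format, its element names and the read_order function are all hypothetical. The receiving code looks up only the elements it knows about by name, so a newer version of the message with an extra element passes through it untouched.

```python
import xml.etree.ElementTree as ET


def read_order(message):
    """An existing consumer: it reads the parts it knows and ignores the rest."""
    order = ET.fromstring(message)
    return {
        "id": order.get("id"),
        "sku": order.findtext("sku"),
        "quantity": int(order.findtext("quantity")),
    }


# Version 1 of the (hypothetical) message format.
V1_MESSAGE = '<order id="A17"><sku>W-100</sku><quantity>3</quantity></order>'

# Version 2 adds an element that newer participants understand.
V2_MESSAGE = ('<order id="A17"><sku>W-100</sku><quantity>3</quantity>'
              '<gift-wrap>yes</gift-wrap></order>')

old_result = read_order(V1_MESSAGE)
new_result = read_order(V2_MESSAGE)
print(old_result == new_result)
```

Compare this with a fixed-width or comma-delimited format, where adding a field in the wrong place shifts everything after it. This only works, of course, if the extension adds information rather than changing the meaning of what was already there.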
The result is that XML is being widely and rapidly adopted as
the medium of choice for both business-to-business (B2B) and
object-to-object messaging. Around the world, organizations that
need to exchange data with each other are joining together to
create business communities, defining XML languages that represent
the data they need to exchange, and publishing schemas for those
languages at central Web sites.
You can learn more about these concepts and the communities that
are adopting them at two Web sites that are focal points for this
activity: www.biztalk.org and
www.xml.org.
Self-Describing Data
An XML document always contains its element/column/object names,
and optionally includes or references its entire schema. This means
it is "self-describing" in a way that no widely-used data format
has been before.
This in turn enables client-side processing of the XML.
When you send XML to a device, software on that device can both
display and process the information.
This combination of self-describing data and client-side
processing is already leading to incredible browser applications
and a new generation of "smart" documents, while requiring
significantly less programming effort than before. (Yes, that's
hype, but it is also my direct experience.)
Standards-Based
The fact that XML is based on international standards makes it
stable, interoperable and extensible. You can:
- Easily exchange information between heterogeneous systems
- Choose between competing tools
- Build and reuse generic processing routines
- Add features without creating legacy compatibility
problems
And because the standards groups are protecting
backwards-compatibility, we can expect these benefits to remain
stable for years to come.
A Lingua Franca for Information
The final and most compelling reason for the widespread adoption
of XML is that it works for so many different kinds of information
and purposes.
Originally invented to provide the benefits of SGML on the Web -
and (its founders hoped) to replace HTML in the process - XML has
turned out to be equally useful for both Web and non-Web
applications in a variety of domains including:
- Information exchange
- System integration
- Distributed computing
- Local metadata storage
All This, and Unicode Too!
XML uses the Unicode character set, which has room for more than a
million characters and accommodates most of the world's languages
and character sets.
What is XML?
- A set of standards and tools to define, exchange and publish
information
Why should I care?
- Loose coupling
- Separation of format from content
- Self-describing data
- Client-side processing
- Standards-based
- Lingua Franca