Schemas
The tools for defining XML vocabularies that you've seen so far in
this book - the basic rules of well-formed XML as well as DTDs - are
the ones provided in the W3C XML 1.0 Recommendation. They are the
fundamental parts of the XML world and define the core power of XML
for application developers, allowing us (and others) to create markup
vocabularies that describe the problem domain we are working in. But
as we get to grips with these technologies, especially in large
real-world applications, we often start to wish that we had a bit
more functionality at our disposal.
Before we get stuck into the chapter, let's look at some of the
problems with what we have available. Don't let these worry you,
however: once we have seen what we are missing in the first couple
of pages, we will spend the rest of the chapter addressing solutions
to all of the problems we outline. You may already have come across
some of these problems, and identifying the others now will save you
from spending fruitless hours chasing dead ends.
Firstly, in Chapter 3 we identified some drawbacks with DTDs.
These mainly concerned the fact that DTDs are written in a syntax
other than XML (namely Extended Backus-Naur Form) and that they are
not expressive enough. So, the first item on our list of problems is
these shortcomings of DTDs; we will revisit them and look at some
attempts to solve them. This will also help us focus on some of the
other problems associated with defining your own markup.
Another problem arises because anyone can create their own tags.
You can easily imagine people using the same element name to mean
different things. Consider an element such as monitor, which could
have several meanings in different circumstances. In a DTD for
computer peripherals, monitor may refer to the screen, while in a
music studio speakers are often called monitors. In a school DTD, the
monitor element might refer to a student who is given special
responsibilities, whereas in a nuclear power plant monitors may be in
place to report faults. Even if the meanings were the same, the
permitted content of the elements might differ between definitions.
With all of these potential uses for an element, we need a way to
distinguish which particular use is intended, especially when we mix
different vocabularies in one XML document. To help solve this
problem, the W3C has produced a specification called XML Namespaces,
which allows you to qualify an element name with a namespace that
identifies the vocabulary it belongs to.
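To make the monitor example concrete, here is a minimal sketch, using Python's standard xml.etree.ElementTree module, of a document that mixes two vocabularies, each identified by a namespace. The namespace URIs and the order, size and watts names are hypothetical, invented for illustration; the xmlns:prefix declaration syntax is the one defined by the Namespaces in XML Recommendation. A namespace-aware parser expands each element name to its namespace URI plus local name, so the two monitor elements no longer collide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies. The namespace URIs are
# made up for illustration; any unique URI would serve.
DOC = """<order xmlns:hw="http://example.com/peripherals"
       xmlns:audio="http://example.com/studio">
  <hw:monitor size="19in"/>
  <audio:monitor watts="50"/>
</order>"""

root = ET.fromstring(DOC)

# ElementTree reports each qualified name as "{namespace-uri}local-name",
# so the two <monitor> elements are distinct.
tags = [child.tag for child in root]
print(tags)

# Prefixes are only local shorthand: to search by prefix, we supply our
# own prefix-to-URI mapping.
ns = {"hw": "http://example.com/peripherals"}
screen = root.find("hw:monitor", ns)
print(screen.get("size"))
```

Note that the element is identified by its URI, not by the prefix written in the document: the document could have used any prefix for the peripherals vocabulary and the find call above would still match it.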
Furthermore, there are likely to be situations where we need to
combine XML documents from different sources that conform to
different DTDs. This may happen when we are describing a large body
of information, for which a single DTD would be unwieldy and hard for
human readers to understand, or in an e-commerce application where
we need to combine a business partner's data with our own.
Unfortunately, the XML 1.0 Recommendation provides no way to mix DTDs
in a single document without modifying an existing DTD or creating a
new one (using external references).
Taking this line further, as more and more industry standard DTDs
are created, there is an increasing chance that someone will already
have created one that will relate to a problem domain you come to
work in. If the existing DTD is not perfect for you to use
immediately, then rather than creating a completely new one, it
would be helpful to add your own customizations in a separate DTD,
which would still allow you to exchange a subset of the information
in the standard format. As we have just suggested, though, we cannot
easily do this with DTDs.
These issues are becoming increasingly important, especially
given the promise that XML offers in the realm of electronic
commerce, where different companies and users will want to exchange
data in formats that make sense to each other. While it is possible
to read a DTD from code and reconcile the documents, it is not a
simple matter. We need some means of discovering the differences and
similarities between competing vocabularies so that we can establish
a connection between them. To this end, the W3C are working on an
alternative type of schema to DTDs, called XML Schemas, which will
be written in XML.
This alternative schema language will address these problems, and
a number of other shortcomings of DTDs, which we shall look at later
in the chapter. We will start, however, with a look at some of the
problems with creating single XML documents from multiple
sources.
The problems of creating a single document from sources written
according to different DTDs, and of different schemas using the same
element names, both concern data about vocabularies - how they are
built and where their rules come from. The XML community and its
supporters have been working on these problems, and the results are
coming to fruition just in time to enable an emerging generation of
XML-based electronic commerce. If you are interested in using XML to
connect heterogeneous systems developed by disparate teams, you need
to understand these new extensions to the XML world.
This chapter will cover the results of some of the XML community's
efforts to solve these problems. It introduces two tools:
namespaces and XML Schemas. Namespaces help
XML vocabulary designers to break complex problems into smaller
pieces and mix multiple vocabularies as needed to fully describe a
problem in a single XML document. Schemas permit vocabulary designers
to create a more precise definition of the vocabulary than is
possible with DTDs, and do so using XML syntax.
Together, these two tools address some of the difficulties that can
arise when XML is used to tackle ambitious problems. In particular,
namespaces and schemas allow XML designers and programmers to do the
following:
Better organize the vocabularies surrounding a complex problem
Provide a way to retain strong typing of data when converting it to and from XML
Describe vocabularies with more precision and flexibility than DTDs permit
"Read" vocabulary rules in XML, permitting access to vocabulary definitions without increasing parser complexity
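As a rough illustration of the last point, the sketch below reads a small schema with the same ordinary XML parser used for instance documents, again Python's standard xml.etree.ElementTree. It is written in the syntax of the final W3C XML Schema Recommendation (published in 2001, after the draft discussed in this chapter), so the element names and the namespace URI http://www.w3.org/2001/XMLSchema may differ from the draft; the monitor and price declarations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the final W3C XML Schema 1.0 Recommendation (2001);
# the draft discussed in this chapter may use a different one.
XS = "http://www.w3.org/2001/XMLSchema"

# A tiny hypothetical schema: one element with an attribute, one typed element.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="monitor">
    <xs:complexType>
      <xs:attribute name="size" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="price" type="xs:decimal"/>
</xs:schema>"""

# The schema is itself an XML document, so a plain XML parser loads it.
schema = ET.fromstring(SCHEMA)

# Collect top-level element declarations: name -> declared type (or None).
decls = {
    el.get("name"): el.get("type")
    for el in schema.findall(f"{{{XS}}}element")
}
print(decls)  # {'monitor': None, 'price': 'xs:decimal'}
```

The type value comes back as the raw string xs:decimal; resolving that prefix and enforcing the type against instance documents is the job of a schema-aware validator, not of the parser. The point is only that the vocabulary rules themselves are ordinary XML that any program can inspect.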
XML Namespaces reached W3C Recommendation status on January 14,
1999. Schemas are working their way through the standards process,
but a Recommendation is expected soon. The demand for schemas within
the application development community is so great, however, that
technology previews of schema support are making their way into
shipping parsers. This being the case, the schema draft is well worth
studying in order to be ready for the rapid transition to schemas
expected when a Recommendation is issued.
©1999 Wrox Press Limited,
US and UK.