Recall the Book Catalog DTD that we met in Chapter 3. After
building a site that exposes your book catalogs as XML written to the
PubCatalog.dtd vocabulary, you might decide to sell the books
on-line. This means that you need to be able to take orders for the
books in the catalog. So, you need a DTD that covers ordering of
books.
If you continue in the vein of the DTD chapter, you might start
adding to the PubCatalog.dtd file, because the two areas address
different parts of the same problem domain: sharing data about books.
They can, however, also be seen as different problem domains, because
one addresses the whole of the catalog's details while the other
addresses the sale of items from the catalog. While there is some
overlap in the information the two convey, a single DTD that tried to
cover both areas would probably end up very large, confusing, or
both.
DTDs that are large or that contain information about different
topics are hard for programmers to read and understand. More
importantly, if you're already using the catalog DTD in production,
making changes to it now might affect that application. There is a
better solution, however: keep separate catalog and order DTDs, and
use namespaces to merge data that conforms to each of them in a single
document. We will explore this possibility shortly. But first, let's
take a closer look at the problems you face.
Segmenting a Problem
To begin with, why should order details be kept out of the catalog
DTD? You are dealing with at least two areas: catalogs of all books,
and the sales of individual titles. When you write a large program,
you break the overall problem into smaller pieces. Modules, classes,
components, packages and functions are some of the constructs
programming languages offer for this purpose. Designing a vocabulary
is a similar problem: you need ways to segment a large problem into
multiple vocabularies. However, the problem we have to overcome isn't
really one of writing individual DTDs to describe multiple
vocabularies; we have already seen how to do that in Chapter 3. The
real problem, once we segment the definitions into catalog and order
DTDs, lies in combining data from both of them in the body of one
document.
Reuse
In our PubCatalog.dtd we made use of the Book element. This makes
perfect sense, given what we have said about marking up data with
names that describe the contents of each element. However, as we are
considering taking on-line orders for books, we are likely to want to
use the same element name again when referring to the book that a
customer wants to order. Indeed, it is likely that the two would be
declared differently in the two DTDs. After all, the Book element in
the order might appear inside an Order element, whereas it appears
inside the catalog in PubCatalog.dtd.
As we have already suggested, this is a problem that will occur
time and again as we create XML vocabularies. When describing real
world concepts, we will continually find that common constructs keep
appearing. After all, complex creations are built from simple
building blocks - color, shape, price, and dimensions, for example -
and simple things don't go undefined for long, so there will be many
instances of element names that already have definitions and content
models.
If you, or someone else, have already created a DTD that uses these
elements, your task will be made easier by borrowing from proven DTDs
(indeed, code to handle the constructs they define may already be
available). This is the concept of reuse.
If you are programming for a corporation, you may be confronted
with an existing body of DTDs. Borrowing from them can, in fact, make
your life easier, while ignoring them makes everyone else's job
harder: the DTDs represent an intellectual investment in a particular
set of definitions by the programmers involved. These DTDs describe
the business problem as others know it. In our example, building on
the existing DTDs related to books means that your task is to extend
them in a way that flows naturally from the concepts that are already
known and defined.
Indeed, if you are programming an application that must connect to
an external partner's programs, you have little choice but to reuse
existing concepts. The DTDs already in use form a common language you
need to speak in order to be understood. Whenever concepts already
exist, you should work to be understood in terms of those concepts.
The users of pre-existing definitions have made an effort to develop
and internalize them. Convincing them to adapt to your view of the
problem may be insurmountably difficult. Even if you can accomplish
this feat, additional cost is incurred in terms of building new
definitions and code, or mapping from an existing DTD to your new
one. Reuse saves time, effort, and money.
Ambiguity and Name Collision
Whether you're reusing useful definitions from another designer's
DTD or combining segmented DTDs to create a document describing a
composite problem, you risk the problems of ambiguity and name
collision if the documents you are using feature elements of the same
name. For example, books are a pretty common concept. You can be sure
there are several DTDs that declare a Book element, at least for
publishers and printers, retailers and libraries. A single usage of
the name Book in a document needs resolution to match it with the
proper Book element declaration. In our example Book is a name common
to both catalogs and orders.
A document marked up using the PubCatalog.dtd may include the
following use of the element <Book>:
<Book>
<Title>Professional XML</Title>
<Abstract>Compendium book containing
everything you need to learn to use
XML in your programming solutions today.</Abstract>
<RecSubjCategories>
<Category>XML</Category>
<Category>Programming</Category>
<Category>Internet</Category>
</RecSubjCategories>
</Book>
Whereas an order for a book may require the following use of a
<Book> element:
<Order>
...
payment and shipping information
...
<Item>
<Book>
<Title>Professional XML</Title>
<ISBN>1-861003-11-0</ISBN>
</Book>
<Quantity>3</Quantity>
<Price USD="49.99"/>
<Discount USD="10.00"/>
<SubTotal USD="119.97"/>
</Item>
</Order>
If I'm reading an XML document that includes data from both of the
vocabularies, how do I know which definition it refers to?
The problem becomes acute when you use instances of a name drawn
from multiple DTDs. Assume we have an application for civil engineers
involved in town planning. When talking about lighting, we want to
draw on pre-existing DTDs for traffic lights and street lights.
Working in isolation, the respective vocabulary designers each chose
the word <Light> as an element name. Had they known of the
eventual use of their DTDs, they might have chosen
<TrafficSignal> and <StreetLamp>, but this future use was
not known at the time the DTDs were written. Now we are faced with
the specter of documents that have ambiguous Light elements.
The declarations for the two uses of <Light> are very
different. The first declaration covers traffic signals and has an
enumeration for its color attribute. This enumeration is very
important, as there are only three valid colors for our traffic
signals. An application can be expected to do some error checking
based on the value of this attribute:
<!ELEMENT Light EMPTY>
<!ATTLIST Light color (red | yellow | green) #REQUIRED>
The second declaration has no such restriction on its color
attribute's value. Indeed, lamps are often chosen on the basis of
cost, not color, although the color is still specified:
<!ELEMENT Light EMPTY>
<!ATTLIST Light color CDATA #REQUIRED>
Now consider the following XML document written by an application
that mixes the two DTDs:
<Inventory>
<Light color="red"/>
. . .
<Light color="white"/>
...
</Inventory>
From this, we cannot tell whether the Light elements refer to
traffic lights or street lamps (without checking the constraints on
colors implied in the DTD). So, how would a receiving application
know whether the color attribute's values are acceptable? We don't
know which element refers to which DTD, and the value of the second
Light element's color attribute would not be valid for the purposes
of traffic lights. For well-formed documents, this problem is known as
ambiguity. Furthermore, if the document had to be validated, the two
conflicting declarations of Light and color could make a very big
mess of our application. This is referred to as the problem of name
collisions.
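To see the ambiguity from a program's point of view, here is a minimal sketch. Python and its standard xml.etree.ElementTree module are our choice for illustration only, and the TRAFFIC_COLORS set is a hypothetical stand-in for the enumeration in the traffic-signal declaration. The sketch parses the mixed inventory and applies the traffic-signal rule to every Light, because nothing in the document says which rule belongs to which element.

```python
import xml.etree.ElementTree as ET

# The mixed inventory from the text, written without namespaces.
INVENTORY = """
<Inventory>
  <Light color="red"/>
  <Light color="white"/>
</Inventory>
"""

# Hypothetical stand-in for the enumeration in the traffic-signal DTD.
TRAFFIC_COLORS = {"red", "yellow", "green"}

root = ET.fromstring(INVENTORY)
lights = list(root.iter("Light"))

# Both elements report exactly the same name, so nothing tells them apart.
tags = [light.tag for light in lights]

# Applying the traffic-signal rule to every Light rejects "white",
# even though it may describe a perfectly good street lamp.
rejected = [light.get("color") for light in lights
            if light.get("color") not in TRAFFIC_COLORS]

print(tags, rejected)
```

The receiving program has only the bare name Light to go on, so it must either guess or apply one vocabulary's rules to everything.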
XML namespaces are the solution to the problems of ambiguity and
name collisions. According to the W3C's Recommendation 'Namespaces in
XML' (14 January 1999), a namespace is
...a collection of names, identified by a URI reference, which are
used in XML documents as element types and attribute names.
A collection of names that has structure; this sounds like a DTD
and indeed, a DTD can be a namespace. In this case the URI could be
the address of the DTD on your server, for example:
http://www.wrox.com/xmldtds/PubCatalog.dtd
The URI need not be a URL, though. (If you are unsure of the
differences between the two, we describe them shortly.) In this case
the namespace refers to the names used in the PubCatalog.dtd. So, if
we were to link the use of the Book element with this namespace in
some way, we would know that any reference to Book in a document that
was linked with this namespace would refer to the usage as laid out
in our PubCatalog.dtd.
Where a DTD dictates the entire structure of a document (and does
so exclusively), a namespace is no more and no less than a resource
from which we can draw just what definitions we need. Having said
this, a namespace need not be a formal structural definition like a
DTD, and the limited scope of this definition makes namespaces
broadly applicable in XML. If the namespace is a DTD or schema, the
definitions we use must remain consistent with the structure and
syntax specified therein. We are free, however, to use just those
names that we need or desire, and use a namespace as a way of
distinguishing between the uses of an element.
So, in order to use namespaces effectively in a document that
combines elements from different sources, we need two parts:
A reference to the URI that identifies the namespace from which the
element is drawn.
An alias that we can use to identify which namespace our element
is taken from. This takes the form of a prefix on the element name
(for example <catalog:Book>, where catalog is the alias for the
namespace that qualifies the otherwise ambiguous Book element).
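The two parts can be sketched in a few lines of Python (again our choice of language, using the standard xml.etree.ElementTree module). The URIs urn:example-traffic-signals and urn:example-street-lamps, the prefixes traffic and street, and the helper split_name are all hypothetical names invented for this illustration. ElementTree happens to report each name as {URI}name, which makes the pairing of URI and name easy to see.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and prefixes for the two Light vocabularies.
DOC = """
<Inventory xmlns:traffic="urn:example-traffic-signals"
           xmlns:street="urn:example-street-lamps">
  <traffic:Light color="red"/>
  <street:Light color="white"/>
</Inventory>
"""

def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

root = ET.fromstring(DOC)

# The prefix has been replaced by the URI it stands for.
names = [child.tag for child in root]
print(names)
print([split_name(name) for name in names])
```

The two Light elements now carry different full names, so a program can pick the right set of rules for each.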
Having seen the advantages that Namespaces offer us in XML, we
need to look in more detail at how we actually use them. We will
start by looking at how we declare the namespace in a document, and
then look at how we can use the namespace within the document, ending
up with some examples.
Generally speaking, simple descriptive properties are often
modeled as attributes and that is in fact how namespaces are declared
in XML. There are a few twists and turns, however, so we'll proceed
step by step to learn about what we can specify when we declare a
namespace in an XML document.
Declaring a Namespace
If everyone is going to recognize a namespace declaration when
they see one, we'll need a reserved word for them. The Namespaces
Recommendation gives us xmlns. The value of the attribute is the URI
that uniquely defines the namespace in use. This URI is often a URL
pointing to a DTD, but it doesn't have to be. A URI, managed in such
a way as to uniquely differentiate the namespace, is sufficient. Here
are some simple namespace declarations:
xmlns="http://www.wrox.com/bookdefs/PubCatalog.dtd"
xmlns="urn:wrox-publishing-sales-orderdefs"
The nomenclature surrounding Web resources can be confusing. A
Uniform Resource Identifier (URI) is a unique name for some resource.
A Uniform Resource Locator (URL) is a URI that locates the resource
in terms of an access protocol and network location. The first
example is a URL because it allows a browser to retrieve a resource
from a particular location using HTTP. The second example names the
resource but provides no location. The literal urn marks it as a
Uniform Resource Name (URN), a scheme that derives from an effort to
develop permanent, location-independent URIs.
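The difference between the two forms can be made concrete with a small sketch. It uses urlsplit from Python's standard urllib.parse module purely as an illustration; the helper name describe is our own, and the two URIs are the catalog URL and order URN used in this chapter.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Return (scheme, network location, path) for a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

# A URL: an access protocol plus a network location and a path.
url_parts = describe("http://www.wrox.com/bookdefs/PubCatalog.dtd")

# A URN: a name only, with no network location to retrieve it from.
urn_parts = describe("urn:wrox-publishing-sales-orderdefs")

print(url_parts)
print(urn_parts)
```

Either form is acceptable as a namespace name, because the name only has to be unique; nothing requires a program to retrieve it.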
Since one of our prime motivations for using namespaces was to be
able to mix names from different sources, it might be useful for you
to be able to provide an alias you could use throughout a document
that would refer to the declaration. You do this by appending a colon
and your alias to the xmlns attribute. Thus, the examples above
become:
xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"
xmlns:order="urn:wrox-publishing-sales-orderdefs"
Here the prefix catalog will refer to elements from the
PubCatalog.dtd, while order will refer to elements declared in the
order.dtd. After these declarations appear, we can just use catalog
to refer to the first namespace declaration, and order to refer to
the other one (without the URI). How we use these declarations and
their aliases lets us provide even more information.
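It is worth stressing that the alias is only shorthand for the URI. As a minimal sketch (in Python with the standard xml.etree.ElementTree module, our choice for illustration), the two documents below use different prefixes for the same URI; the second prefix, cat, is hypothetical.

```python
import xml.etree.ElementTree as ET

URI = "http://www.wrox.com/bookdefs/PubCatalog.dtd"

# Two documents that differ only in the prefix chosen for the namespace.
doc_a = f'<catalog:Book xmlns:catalog="{URI}"/>'
doc_b = f'<cat:Book xmlns:cat="{URI}"/>'  # "cat" is a hypothetical alias

tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

# Both prefixes resolve to the same URI, so the names are identical.
print(tag_a == tag_b, tag_a)
```

A namespace-aware program therefore compares URIs, not prefixes, and document authors are free to choose whatever alias is convenient.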
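A namespace declaration is therefore made up of three parts: the
reserved attribute name xmlns, an optional colon followed by the
prefix, and the namespace URI as the attribute's value.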
Qualified Names
It does us no good to declare a namespace if we can't tie it to a
specific name we want to use. This is done through the use of
qualified names. A qualified name is just what you might suppose it
to be - a name qualified by the namespace from which it is drawn. You
create a qualified name by taking the alias, known properly as a
namespace prefix, and tacking it on to the beginning of the name,
separated by a colon. Going back to the question of including a Book
element in both catalog and ordering DTDs, assume that we declare a
catalog namespace with the prefix catalog like so:
xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"
We can now use the prefix catalog to make it clear which namespace
the element came from. So,
<catalog:Book>
would tell us that the name Book comes from the catalog namespace
declaration. There could be a name Book in the order namespace as
well, yet this qualified name avoids the possibility of ambiguity or
collision. The name Book is unambiguously qualified as coming from a
particular namespace. The namespace prefix is often referred to
simply as the prefix, and the name itself is the base name (the
Namespaces Recommendation calls it the local part).
Qualified names can apply to both element and attribute names.
Here's an example that mixes some namespaces:
<catalog:Book order:ISBN="1-861003-11-0">
The element <Book> is drawn from the first namespace we saw
above, while the attribute ISBN is drawn from the order
namespace.
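The following sketch shows how a namespace-aware parser might report this mixed element. It is written in Python with the standard xml.etree.ElementTree module, which is our choice for illustration; the fragment above is completed with the two declarations from earlier in the chapter so that it can stand alone as a document.

```python
import xml.etree.ElementTree as ET

# The mixed element from the text, with both prefixes declared on it.
DOC = """
<catalog:Book xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"
              xmlns:order="urn:wrox-publishing-sales-orderdefs"
              order:ISBN="1-861003-11-0"/>
"""

book = ET.fromstring(DOC)

# The element name is qualified by the catalog namespace...
element_name = book.tag

# ...while the attribute name is qualified by the order namespace.
# ElementTree does not list the xmlns declarations among the attributes.
attributes = dict(book.attrib)

print(element_name)
print(attributes)
```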
Scope
Namespace declarations have scope in the same way that variable
declarations do in programming languages. This is important because
namespaces are not always declared at the beginning of an XML
document; a declaration can appear on any element later in the
document. A namespace declaration applies to the element in which it
appears and to all of that element's descendants, even though the
declaration is not repeated on them. A name can refer to a namespace
only if it is used within the scope of the namespace declaration.
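Scope can be checked with a short sketch. It is written in Python with the standard xml.etree.ElementTree module (our choice for illustration), and the helper name parses and the two small order documents are hypothetical. In the first document the catalog prefix is used inside the element that declares it; in the second it is used on a sibling, outside the declaration's scope.

```python
import xml.etree.ElementTree as ET

# The prefix is used inside the element that declares it.
IN_SCOPE = """
<Order>
  <Item xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd">
    <catalog:Book/>
  </Item>
</Order>
"""

# The prefix is used on a sibling of the declaring element,
# where the declaration is no longer in scope.
OUT_OF_SCOPE = """
<Order>
  <Item xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"/>
  <catalog:Book/>
</Order>
"""

def parses(text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(IN_SCOPE), parses(OUT_OF_SCOPE))
```

A namespace-aware parser such as this one rejects the second document, because the prefix is not bound to any URI at the point where it is used.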
However, we will also need to mix namespaces where elements would
otherwise inherit the scope of a namespace, so there are two ways in
which a name can be tied to a namespace: by default and by
qualification.
Default
As you might suspect, it could quickly get tiresome to have to add
a prefix to every name in a document. In fact, by introducing the
concept of name scope to our tool set, we can dispense with a lot of
prefixes. If we define a default namespace, all unqualified names
within the scope of the declaration are presumed to belong to that
default. So, if you declare a default namespace in the root element,
it is treated as the default namespace for the whole document, and
can only be overridden by a more specific namespace declared within
the document.
We declare a namespace to be the default for some scope by
omitting the prefix declaration.
Here's how you might use this to embed some HTML within an XML
document marked up according to a DTD designed for book content,
called BookContent.dtd:
<Chapter xmlns="http://www.wrox.com/bookdefs/BookContent.dtd">
  <Title number="7">Namespaces and Schemas</Title>
  <Author>I. M. Named</Author>
  <Content>
    <Paragraph>
      Let's have a table:
      <table xmlns="http://www.w3.org/TR/REC/REC-html40">
        <tr>
          <td>A tisket</td><td>A tasket</td>
        </tr>
        <tr>
          <td>One fish</td><td>Two fish</td>
        </tr>
      </table>
    </Paragraph>
    <Paragraph>This is a very short paragraph</Paragraph>
  </Content>
</Chapter>
The elements <Chapter>, <Title>, <Author>, <Content>, and
<Paragraph> come from the default namespace declared in the
<Chapter> element. The unprefixed attribute number is a different
matter: under the Namespaces Recommendation a default namespace does
not apply directly to attributes, so number is simply interpreted in
the context of the <Title> element on which it appears. Within the
Chapter element, however, you can see the table element and its
descendants - tr and td. These belong to the HTML namespace declared
in the table element. Note that the scope of the HTML namespace
declaration in this example ends when the table element closes. The
second occurrence of Paragraph does not come from the HTML namespace.
When a prefix is declared and then used with a name, the namespace
is explicitly stated. For an unqualified element name (one without a
prefix) to be reconciled to a namespace, a default namespace must
have been declared with a scope that includes that name.
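A short sketch can confirm how the default declarations in the chapter example are applied. It is written in Python with the standard xml.etree.ElementTree module (our choice for illustration), it uses a shortened copy of the document, and the helper namespace_of and the dictionary by_local are hypothetical names of our own.

```python
import xml.etree.ElementTree as ET

BOOK = "http://www.wrox.com/bookdefs/BookContent.dtd"
HTML = "http://www.w3.org/TR/REC/REC-html40"

# A shortened copy of the chapter document from the text.
DOC = f"""
<Chapter xmlns="{BOOK}">
  <Title number="7">Namespaces and Schemas</Title>
  <Content>
    <Paragraph>Let's have a table:
      <table xmlns="{HTML}">
        <tr><td>A tisket</td><td>A tasket</td></tr>
      </table>
    </Paragraph>
    <Paragraph>This is a very short paragraph</Paragraph>
  </Content>
</Chapter>
"""

def namespace_of(tag):
    """Return the URI part of ElementTree's '{uri}local' name, or None."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None

root = ET.fromstring(DOC)

# Map each local name to the set of namespaces it was found in.
by_local = {}
for elem in root.iter():
    local = elem.tag.split("}", 1)[-1]
    by_local.setdefault(local, set()).add(namespace_of(elem.tag))

# The unprefixed attribute is reported under its bare name.
title = root.find(f"{{{BOOK}}}Title")
print(by_local)
print(title.attrib)
```

Both Paragraph elements resolve to the book content namespace, including the one that follows the table, while table, tr and td resolve to the HTML namespace.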
Qualified
All this is well and good if you can clearly separate your
namespaces. But sometimes, you'll want to sprinkle names from foreign
namespaces throughout a document. You need a finer degree of
granularity. Rather than declaring namespaces all over the place, you
can make use of qualified names. Declare the namespaces you will
need, together with their prefixes, at the beginning of the document
and then qualify the names at the point of use.
<Measurements xmlns="urn:mydecs-science-measurements"
              xmlns:units="urn:mydecs-science-unitsofmeasure"
              xmlns:prop="urn:mydecs-science-thingsmeasured">
  <OutsideAir units:units="Fahrenheit">86</OutsideAir>
  <FuelTank>
    <prop:Volume units:units="liters">120</prop:Volume>
    <prop:Temperature units:units="Celsius">20</prop:Temperature>
  </FuelTank>
</Measurements>
In the root element, Measurements, I've declared three namespaces.
The default takes care of the elements <OutsideAir>,
<FuelTank>, and <Measurements>. However, I need to
qualify some readings with units of measure, which I've done with the
units namespace and the attribute units:units drawn from that
namespace. Being able to qualify that name is very useful as this
attribute pops up throughout the document. Finally, I needed to
differentiate between some types of measurements, prop:Volume and
prop:Temperature. Although I could have declared the prop namespace
in the <FuelTank> element, I am free to use this namespace
repeatedly (perhaps in a longer document) by declaring the namespace
at the beginning and using qualified names.
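A program reading this document might pick out one of the qualified readings as in the following sketch. It is written in Python with the standard xml.etree.ElementTree module (our choice for illustration). The prefixes m and p in the NS mapping are hypothetical and deliberately differ from those in the document, since only the URIs have to match.

```python
import xml.etree.ElementTree as ET

DOC = """
<Measurements xmlns="urn:mydecs-science-measurements"
              xmlns:units="urn:mydecs-science-unitsofmeasure"
              xmlns:prop="urn:mydecs-science-thingsmeasured">
  <OutsideAir units:units="Fahrenheit">86</OutsideAir>
  <FuelTank>
    <prop:Volume units:units="liters">120</prop:Volume>
    <prop:Temperature units:units="Celsius">20</prop:Temperature>
  </FuelTank>
</Measurements>
"""

# Prefixes chosen by the reading program (hypothetical); only the
# URIs need to agree with the document's declarations.
NS = {
    "m": "urn:mydecs-science-measurements",
    "p": "urn:mydecs-science-thingsmeasured",
}

# Full name of the qualified units attribute, in ElementTree's form.
UNITS = "{urn:mydecs-science-unitsofmeasure}units"

root = ET.fromstring(DOC)

# FuelTank is in the default namespace; Volume is in the prop namespace.
volume = root.find("m:FuelTank/p:Volume", NS)
reading = (volume.text, volume.get(UNITS))

print(reading)
```

Note that FuelTank has to be qualified in the search path even though it carries no prefix in the document, because the default namespace gives it a URI all the same.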
Take a closer look at the namespace declarations and compare them to
the namespace declaration in the <Chapter> element of the
preceding section. That declaration was tied to a DTD, potentially
making it possible to validate the names used against the DTD. In
this example, we have unique names, but no DTD URL. Namespaces exist
primarily to organize names into distinct sets and avoid name
collisions. The W3C Namespace Recommendation says nothing about their
use in validation. Indeed, the XML 1.0 Recommendation says nothing
whatsoever about namespaces. The XML Schema effort (which we meet
later) does more, but any current use of namespaces for validation
will strictly remain an artifact of an individual parser's
implementation until XML Schemas are an official W3C
recommendation.
©1999 Wrox Press Limited,
US and UK.