Let's try to mark up the content of this book and see whether we can
use our new tool, namespaces, in a useful way. Assume a content DTD
has been created, as in Chapter 3. We'll borrow names from the
existing catalog DTD, and rather than recreate markup that already
exists in HTML, we'll borrow from that namespace as well. We'll leave
aside the issue of validation for now and assume this document need
only be well-formed. Pay close attention to how each namespace
declaration is scoped. Here's a start on marking up this book, showing
the beginning of this chapter:
<Book xmlns="urn:wrox-pubdecs-content"
      xmlns:cat="urn:wrox-pubdecs-catalog"
      cat:ISBN="1-861003-11-0"
      cat:level="Professional"
      cat:pubdate="1999-11-01"
      cat:thread="WebDev"
      cat:pagecount="450">
  <cat:Title>Professional XML</cat:Title>
  <cat:Abstract>The W3C positions on namespaces and schemas are
    presented, together with a review of commercial support.</cat:Abstract>
  <Author>
    <FirstName>Iye</FirstName>
    <MI>M</MI>
    <LastName>Named</LastName>
    <Biographical>
      Iye M. Named is a researcher with the Adaptive Content division
      of Wrox Press. He has many good ideas, which he is too shy to
      mention.
    </Biographical>
    <Portrait piclink="inamed.jpg"/>
  </Author>
  <Chapter>
    <Title>Namespaces and Schemas</Title>
    <Section SectionAuthor="inamed">
      <Paragraph>The tools for defining XML vocabularies that you've
        seen so far in this book - the basic rules of well-formed XML
        as well as DTDs - are the ones provided in the W3C XML 1.0
        Recommendation...
      </Paragraph>
      <Paragraph>Both problems ...</Paragraph>
      <Paragraph>This chapter ...</Paragraph>
      <Paragraph>The two ...
        <UL xmlns="http://www.w3.org/TR/REC/REC-html40">
          <LI>Better organize...</LI>
          <LI>Provide...</LI>
          <LI>Describe vocabularies...</LI>
          <LI>"Read" vocabulary rules...</LI>
        </UL>
      </Paragraph>
      <Paragraph>XML Namespaces...</Paragraph>
    </Section>
    <Section SectionAuthor="inamed">
      <Title>Mixing Vocabularies</Title>
      <Paragraph>Recall the Book Catalog DTD...</Paragraph>
      ...
    </Section>
    ...
  </Chapter>
  ...
</Book>
I declared two namespaces in the root element. The content
namespace is the default, as I expect to rely heavily on it and want
to qualify as few names as possible. I found it useful to borrow a few
names from the catalog namespace, so I declared that namespace with
the prefix cat. This allowed me to bring in the Title and Abstract
elements, as well as several attributes, from the catalog namespace;
the attributes sit on the root element, which is itself drawn from
the default content namespace.
Later, I needed to include a bulleted list. This is well-established
in HTML, so I declared another namespace:
<UL xmlns="http://www.w3.org/TR/REC/REC-html40">
I haven't provided a prefix, so HTML becomes the default
namespace, but only for the UL element and its children, the list
items (LI). As soon as we emerge from that scope, with the closing
tag for the UL element, we revert to the content namespace as our
default.
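To see this scoping concretely, here is a minimal Python sketch (the choice of Python is ours; any namespace-aware parser would do) that parses a trimmed copy of the Book markup with the standard library's xml.etree.ElementTree and reports each element's expanded name, that is, its namespace URI plus local name. The trimmed document and the helper names split_name and expanded_names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the Book example; element content is abbreviated.
BOOK = """<Book xmlns="urn:wrox-pubdecs-content"
      xmlns:cat="urn:wrox-pubdecs-catalog"
      cat:ISBN="1-861003-11-0">
  <cat:Title>Professional XML</cat:Title>
  <Chapter>
    <Section>
      <Paragraph>The two ...
        <UL xmlns="http://www.w3.org/TR/REC/REC-html40">
          <LI>Better organize...</LI>
        </UL>
      </Paragraph>
      <Paragraph>XML Namespaces...</Paragraph>
    </Section>
  </Chapter>
</Book>"""


def split_name(clark_name):
    """Split ElementTree's '{uri}local' form into (uri, local); uri is '' if absent."""
    if clark_name.startswith("{"):
        uri, _, local = clark_name[1:].partition("}")
        return uri, local
    return "", clark_name


def expanded_names(xml_text):
    """Return (namespace URI, local name) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [split_name(elem.tag) for elem in root.iter()]


if __name__ == "__main__":
    for uri, local in expanded_names(BOOK):
        print(f"{local:10} {uri}")
```

Book, Chapter, Section, and both Paragraph elements resolve to urn:wrox-pubdecs-content, Title to the catalog namespace, and UL and LI to the HTML namespace; the second Paragraph is back in the content namespace because the UL scope has closed. ElementTree does not report the xmlns declarations themselves as attributes, so the root's only attribute is {urn:wrox-pubdecs-catalog}ISBN.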
I began this example by saying that the document need only be
well-formed. Suppose instead that the namespace declarations pointed
to DTDs and you ran the document through a validating parser: you'd
be in for something of a shock. The XML 1.0 Recommendation makes no
provision for more than one DTD per document. A namespace URL is used
only as a unique name; the parser does not read it for validation.
The document's own DTD has no concept of the names from the HTML
namespace, so as soon as you bring in a foreign name, the parser will
report an error, because the element or attribute you have brought in
is not permitted under that DTD.
I hope I've shown you that namespaces are useful. Validation is
useful, too. Reconciling the two is just one of the benefits of XML
schemas.
The first thing to make clear is that a DTD is actually a type of
schema. However, when people in the XML community refer to schemas,
they often mean a replacement for DTDs written in XML syntax, and
that is the sense in which we use the term in this chapter. There
have been a number of proposals for alternatives to DTDs, and the W3C
is currently working on a standard alternative that draws inspiration
from these efforts. In a sense, we can think of schemas as a
constraint mechanism: by declaring the allowed elements, attributes,
and so on, they constrain the user's choice of tags and their content
models.
Generically, we can refer to schemas as metadata, or data about
data. As we shall see, some of the schema efforts are not just
concerned with defining a vocabulary; they go further, attempting to
explain the relationships between certain types of data.
If you want to replace DTDs, you need to offer at least the same
abilities as they provide. You need to specify the nature and
structure of XML documents. Like a DTD, a schema is a description of
the components and rules of an XML vocabulary. Schemas refine DTDs by
permitting more precision in expressing some concepts in the
vocabulary. In addition, schemas make some radical changes. They use
a wholly different syntax from DTDs: XML itself. They permit us to
borrow from other schemas, thereby solving the validation problem you
encountered in the namespaces example above. They offer datatyping of
elements and attributes. Overall, schemas really are a better answer
to the problem of specifying vocabularies.
XML has done well with DTDs. At the same time, there has been
considerable interest in improving on them. This interest has taken
many forms with many proposals having been suggested (several of
which are available from the W3C site as notes). While this has made
for a richer body of work, it has also delayed the adoption of a
Recommendation that covers the most common features desired of
schemas. In particular, many developers have wanted strong typing,
the ability to validate across multiple namespaces, and the use of
XML syntax for some time. Fortunately, that situation is now being
resolved. As of this writing (January 2000), the W3C Working Group on
Schemas is well on the way to reconciling the many contributing
proposals for a schema language into a single, useful specification.
The improvements schemas bring, as we will see shortly, are of
enormous value in enabling the automated exchange of XML
documents.
You may have invested a lot in learning the syntax and rules of
DTDs, and the lack of a schema specification shouldn't prevent you
from exploring the many avenues of XML and working with some
interesting examples. So you might wonder what's so wrong with DTDs
that you have to learn a new method. First, note that it is still well
worth learning DTDs because (at the time of writing) they provide the only
standard for describing your own markup. In addition, there are many
markup languages that have already been defined using DTDs, and the
ability to read them is very helpful for adopting the markup.
However, as we suggested in Chapter 3, DTDs have a few
shortcomings that become apparent as we try to do more with XML:
they are difficult to write and understand
programmatic processing of their metadata is difficult
they are not extensible
they do not provide support for namespaces
there is no support for datatypes
there is no support for inheritance
Let's take a look at each of these problems in turn.
DTDs are Difficult to Write and Understand
DTDs use a syntax other than XML (the XML 1.0 Recommendation
specifies it in Extended Backus-Naur Form, or EBNF), and many people
find it difficult to read and use. The proposed XML schemas, however,
use XML itself to describe the languages they define, so you no
longer need to learn a separate syntax before you can read and write
them.
Programmatic Processing Of Metadata Is Difficult
The use of EBNF also makes the automated processing of metadata in
DTDs difficult. There are, of course, parsers for DTDs. You probably
already have one; it's your favorite validating parser. Validating
parsers have to load and read a DTD before they can validate a
conforming document. However, it is not possible to inquire into the
DTD from a program using the DOM. The DOM makes no provision for
gaining access to a vocabulary's metadata written in EBNF. Your
validating parser reads the DTD and keeps its information to itself.
Wouldn't it be nice if DTDs were written in XML so we could explore
them as easily as we explore the documents written according to their
rules? That feature would allow us to use the DOM to investigate the
structure of newly encountered vocabularies or even modify a
vocabulary's rules for validation depending on runtime
conditions.
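Because a schema written in XML is just another XML document, the DOM can answer exactly the questions a validating parser keeps to itself. The following sketch uses Python's xml.dom.minidom on a small hypothetical schema whose elementType/element vocabulary is loosely modelled on the XML Data fragment shown later in this chapter. That vocabulary, and the helpers content_models and add_child, are illustrative assumptions rather than any proposal's actual syntax. The sketch lists the children each element type allows, then adds a ReleaseDate child to Book at run time.

```python
from xml.dom import minidom

# Hypothetical schema written in XML; element and attribute names are assumptions.
SCHEMA = """<schema>
  <elementType id="Book">
    <element type="#Title"/>
    <element type="#Abstract"/>
    <element type="#Price"/>
  </elementType>
  <elementType id="Title"/>
  <elementType id="Abstract"/>
  <elementType id="Price"/>
</schema>"""


def content_models(schema_text):
    """Map each declared element type to the child element types it allows."""
    dom = minidom.parseString(schema_text)
    models = {}
    for decl in dom.getElementsByTagName("elementType"):
        children = [ref.getAttribute("type").lstrip("#")
                    for ref in decl.getElementsByTagName("element")]
        models[decl.getAttribute("id")] = children
    return models


def add_child(schema_text, parent_id, child_type):
    """Return a copy of the schema that allows one more child in parent_id."""
    dom = minidom.parseString(schema_text)
    for decl in dom.getElementsByTagName("elementType"):
        if decl.getAttribute("id") == parent_id:
            ref = dom.createElement("element")
            ref.setAttribute("type", "#" + child_type)
            decl.appendChild(ref)
    return dom.documentElement.toxml()


if __name__ == "__main__":
    print(content_models(SCHEMA)["Book"])
    extended = add_child(SCHEMA, "Book", "ReleaseDate")
    print(content_models(extended)["Book"])
```

The same DOM calls that walk an ordinary document are enough here; no special schema API is involved.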
DTDs Are Not Extensible And Do Not Provide Support For Namespaces
As we've seen in our examination of namespaces, a DTD is
self-contained: all the rules in a vocabulary must exist in the DTD.
You put everything you need into the DTD and you live with it. You
can't borrow from other sources without using external parameter
entities. If, having written our catalog.dtd, you wanted to add a
declaration for a new <releaseDate> element, the whole DTD would
have to be rewritten. Even if you copied and pasted the majority of
it, you would have to be careful to make sure that your existing
documents were still valid.
Furthermore, creating and maintaining your own subsets of markup
declarations isn't as flexible as simply referring to an existing
definition. You can't permit document authors to include something
interesting later that isn't found in the DTD. Of course, we don't
always want to give document authors this much freedom, but it would
be nice to have the option of using parts of an existing schema when
designing a new vocabulary.
Again, because all the rules in a vocabulary must exist in the DTD,
you cannot mix namespaces, as we have seen. While you can use a
namespace to introduce an element type into a document, you cannot
use a namespace to refer to an element declaration in a DTD. If a
namespace is used, every element drawn from it must be declared in
the DTD, prefix and all (for example, <!ELEMENT cat:Title
(#PCDATA)>).
DTDs Do Not Support Datatypes
One of the greatest strengths of XML is the fact that documents
are completely written with a single, common data type - text. When
we have our programming hats on, however, we often need to talk about
types other than text. DTDs offer few datatypes other than text,
which is a serious shortcoming when using XML in certain kinds of
applications.
Because DTDs provide no standard mechanism for recording the
non-textual type of the data we mark up, we have to share
information about data types implicitly, performing the conversion
for ourselves as we parse documents. For example, if we wanted to
perform a calculation on some numeric element content, we would have
to transfer the text into the appropriate datatype before the
application could be expected to work with the data.
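A minimal Python sketch of that hand conversion follows. It assumes a hypothetical order document with Item, Price, and Quantity elements (none of these names come from the catalog DTD). ElementTree returns element content as str, so the program itself has to decide that Price is a decimal amount and Quantity an integer.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order document; the element names are illustrative only.
ORDER = """<Order>
  <Item><Price>19.99</Price><Quantity>3</Quantity></Item>
  <Item><Price>5.50</Price><Quantity>2</Quantity></Item>
</Order>"""


def order_total(xml_text):
    """Sum price * quantity, converting the text content by hand."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.iter("Item"):
        # The parser hands back strings; nothing in a DTD says these are numbers.
        price = Decimal(item.findtext("Price"))
        quantity = int(item.findtext("Quantity"))
        total += price * quantity
    return total


if __name__ == "__main__":
    raw = ET.fromstring(ORDER).findtext("Item/Price")
    print(type(raw).__name__, raw * 2)  # string repetition, not arithmetic
    print(order_total(ORDER))
```

Multiplying the raw string by 2 repeats it rather than doubling the price, which is exactly the kind of silent mistake that declared datatypes are meant to prevent.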
DTDs Do Not Support Inheritance
With DTDs there is no way of expressing inheritance, so if you
imagine that we have a class called books, there is no way that we
can say that books is a subclass of, say, publications, and have
books inherit from publications.
In addition, if we divide our books into three types (Professional
level, Programmer's Reference, and Beginner's Guides), we cannot say
that they are subclasses of books and have them inherit the
properties of the books class.
In summary, DTDs are fine for defining document structures, and it
is easy to understand the choice of DTDs in the XML 1.0 specification
when we consider that XML was born out of SGML, which also uses DTDs.
However, as we see XML being used in more programmatic situations,
rather than just document markup, these limitations become
increasingly important.
These, then, are the principal objections that schemas seek to
address. Before looking at the current state of the XML Schemas
draft, we should review some of the other metadata efforts in the XML
community so that we can appreciate the direction in which they are
going.
The academic world wasn't sitting around waiting for the invention
of XML before taking on the topic of metadata. Metadata - data about
data - is about describing information. This may be as simple as
establishing a database schema or as ambitious as discussing the
meaning behind the definitions in such a schema.
The academic community - and some of the XML-related metadata
proposals - tends toward the ambitious end of this spectrum. One
example is the Resource Description Framework (RDF), a W3C-backed effort
for describing resources so that they may be discovered
automatically. Other proposals have been aimed more at replacing DTDs
or representing data in the manner of relational database
schemas.
Because of the desire for an XML-based schema language to replace
and extend DTDs, a number of proposals were put forward. These
include:
XML-Data
Document Content Description (DCD)
Schema for Object-Oriented XML (SOX)
Document Definition Markup Language (DDML, previously known as XSchema)
None of these has been taken up directly as formal W3C work;
however, each has been considered in the W3C work on XML Schemas.
Our needs fall somewhere between RDF and a simple XML
version of DTDs. We need a way to express structure and content in a
simple yet expressive form. While we would certainly appreciate as
much expressive power as we might be offered, we are mindful of the
fact that simplicity is also a strong factor in getting a proposal
implemented in software and accepted by the community. XML itself,
after all, is a simplified version of SGML. By reducing the feature
set to a core of powerful yet simple features, XML's authors created
a simple standard that quickly won wide acceptance.
So, in this section about XML Schemas, we will look at some
XML-based metadata proposals. First we will look at the ambitious RDF
effort, and then at two of the other schema proposals, namely XML-Data
and DCD. This will give us the background to the W3C's work on
schemas. While looking at these, we will point out some of the major
themes in XML-based schemas. The W3C schema group has looked at each
of these proposals, which are intriguing in their range; the XML
Schemas effort builds on them, drawing inspiration and useful
concepts into the latest generation of metadata definition for XML.
After looking at these areas, we will see how the W3C work in
progress on XML Schemas is shaping up, and will end the chapter with
a look at using the early namespaces and schema support in MSXML.
The three proposals we review in this chapter are by no means the
only influences on the current W3C XML Schema effort, nor the only
metadata efforts progressing in the XML community. You are encouraged
to review the efforts on http://www.w3.org/Metadata/ and
http://www.w3.org/TR/. Some other efforts outside the W3C are
referenced on Robin Cover's XML site whose index is found at
http://www.oasis-open.org/cover/siteIndex.html. The three proposals I
cover in the limited space below are in the mainstream of the XML
Schema effort and are sufficient to suggest some of the contributions
to XML Schemas. Others of note include Schema for Object-Oriented XML
(SOX) and Document Definition Markup Language (DDML, previously known
as XSchema).
Note that we are not trying to teach each of these proposals,
rather we are introducing some of the key concepts that are addressed
in some of these metadata proposals. Because the W3C XML Schema effort
has not yet been ratified, no applications support it yet, so we
cannot use it for examples. However, we will look at a specific syntax
that Microsoft has implemented as a technology preview in its MSXML
parser (which ships with IE5 and is also available as a standalone
component). MSXML uses a subset of the XML Data proposal called XML
Data - Reduced. These examples will come
nearer the end of the chapter. So let's get on and look at the first
of the proposals we will be introducing.
Resource Description Framework
The Resource Description Framework (RDF) is at the more ambitious
end of the spectrum in the metadata efforts. It allows a designer to
describe objects, add properties to define and describe them, and
also to make complicated statements about the objects, such as
statements about relationships between resources. Its proposed uses
include sitemaps, content ratings, stream channel definitions, search
engine data collection (web crawling), digital library collections,
and distributed authoring. The specifications come in two
sections:
Model and Syntax
RDF Schemas
The basic RDF model is a full Recommendation (22nd
February 1999). It covers the descriptive data model that can be
expressed in XML, as well as other syntaxes. RDF Schemas are a
Proposed Recommendation (3rd March 1999) covering an XML
vocabulary for expressing RDF data models. RDF draws on the
experience of developing the Platform for Internet Content Selection
(PICS), a scheme for defining Web content and implementing rating
systems, and also draws on earlier academic work in metadata.
Schemas developed with RDF can define not only names and
structure, but can also make assertions, such as relationships, about
the things under discussion. RDF can be complicated, but that
complexity is the price of its tremendous expressive power and depth.
RDF is oriented around three concepts: resources, properties, and
statements.
Resources
Resources can be almost anything - any tangible entity in a
conceptual domain that can be referred to by a URI, from an entire
web site to a single element in an HTML or XML page. It could even
include something that is not available on the web, such as a printed
book.
Resources are typed; a class system is used to define categories
from which specific resource instances are drawn. Class inheritance
is supported, so a designer can specify levels of definition ranging
from highly general to narrowly specific. Here are two simple class
definitions, the first defines a general Rocket class, and the second
refines that class through inheritance into a ChemicalRocket class.
The rdfs and rdf namespaces are part of the RDF Recommendation:
<rdfs:Class rdf:ID="Rocket">
<rdfs:subClassOf
rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
</rdfs:Class>
<rdfs:Class rdf:ID="ChemicalRocket">
<rdfs:subClassOf rdf:resource="#Rocket" />
</rdfs:Class>
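Because the RDF schema is itself XML, an application can follow these subClassOf links mechanically. Here is a minimal Python sketch, using xml.etree.ElementTree, that walks up the class hierarchy from ChemicalRocket. Two assumptions apply: the class definitions are wrapped in an rdf:RDF element that declares both prefixes, and the rdfs prefix is bound to http://www.w3.org/2000/01/rdf-schema#, the namespace URI the W3C later settled on. The helper name superclass_chain is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the RDFS namespace URI as later finalized by the W3C.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

ROCKETS = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="Rocket">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="ChemicalRocket">
    <rdfs:subClassOf rdf:resource="#Rocket"/>
  </rdfs:Class>
</rdf:RDF>"""


def superclass_chain(schema_text, class_id):
    """Follow rdfs:subClassOf links upward from class_id."""
    root = ET.fromstring(schema_text)
    parent_of = {}
    for cls in root.iter(f"{{{RDFS}}}Class"):
        sub = cls.find(f"{{{RDFS}}}subClassOf")
        if sub is not None:
            parent_of[cls.get(f"{{{RDF}}}ID")] = sub.get(f"{{{RDF}}}resource")
    chain = []
    seen = {class_id}
    current = class_id
    while current in parent_of:
        ref = parent_of[current]
        # A "#Name" reference points at a class declared in this same document.
        current = ref[1:] if ref.startswith("#") else ref
        if current in seen:  # stop on a cyclic declaration
            break
        seen.add(current)
        chain.append(current)
    return chain


if __name__ == "__main__":
    print(superclass_chain(ROCKETS, "ChemicalRocket"))
```

For ChemicalRocket the chain is Rocket followed by the generic Resource class, mirroring the two levels of definition in the fragment above.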
Properties
Resources are said to have properties that define and describe
them. Constraints are placed on properties to give them shape. These
constraints limit the types of values that can be assigned to a
property and the range of literal values from the type that can be
chosen. Let's give our chemical rocket some fuel:
<rdfs:Class rdf:ID="Fuels">
<rdfs:subClassOf
rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
</rdfs:Class>
<rdf:Property rdf:ID="fuel">
<rdfs:range rdf:resource="#Fuels" />
<rdfs:domain rdf:resource="#ChemicalRocket" />
</rdf:Property>
Our fuel property is typed as being of the Fuels class: that class is
its range, and the property can take on values from it. To define
those values, we would make declarations similar to the ones above
elsewhere in our schema, perhaps providing literal values for the
rocket fuels we wish to discuss. The fuel property applies to the
ChemicalRocket class, which is its domain.
Statements
Once names and structure are defined through resources and
properties, statements about the conceptual domain can be made. This
is done by composing triples of subject resources, property
predicates, and value objects. The values can be literals for
specific statements, or resources for powerful and sweeping
statements about entire classes. Let's make a simple statement about
a particular rocket in a document conforming to our RDF schema.
First, you begin by declaring an instance of the ChemicalRocket class
and giving it a name:
<?xml version="1.0" ?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="urn:my-rdf-rocket-schema">
<ChemicalRocket rdf:ID="Moonship"/>
<rdf:Description rdf:about="#Moonship">
<fuel>hydrogen</fuel>
</rdf:Description>
</rdf:RDF>
Once we've declared our rocket instance, Moonship, using the ID
attribute of the resource, we proceed to an RDF Description element,
in which I've assigned a particular value, hydrogen, to the fuel
property (remember, Moonship is an instance of ChemicalRocket, and
that class is the domain of the fuel property). This may seem like a
lot of work to do
something very simple, but we can use this same syntax to make
statements about classes, as well. As you make more statements within
this schema, you will develop a rich body of explicit knowledge about
the problem domain.
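The Description element amounts to a single subject-predicate-object triple. As a minimal sketch, the Python below pulls literal-valued triples out of a copy of the Moonship document. It assumes the prefixes are declared so that the document is namespace well-formed (rdf bound to the RDF syntax namespace, the rocket schema as the default namespace) and that the qualified rdf:ID and rdf:about attributes are used. literal_triples is a hypothetical helper that handles only this simple pattern, not the full RDF syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ROCKET = "urn:my-rdf-rocket-schema"

# The Moonship document with namespaces declared so that every prefix resolves.
MOONSHIP = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{ROCKET}">
  <ChemicalRocket rdf:ID="Moonship"/>
  <rdf:Description rdf:about="#Moonship">
    <fuel>hydrogen</fuel>
  </rdf:Description>
</rdf:RDF>"""


def literal_triples(rdf_text):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_text)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:  # each child element is one property
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


if __name__ == "__main__":
    for triple in literal_triples(MOONSHIP):
        print(triple)
```

The predicate comes back as the expanded name {urn:my-rdf-rocket-schema}fuel, which is what ties the statement to the fuel property declared in our RDF schema.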
RDF is powerful, permitting tremendously expressive and sweeping
statements. It answers the strong typing limitation of DTDs; indeed,
strong typing is central to an RDF schema. Unfortunately, designing
an RDF schema is a laborious process involving the declaration of
many classes and properties. The ability to make meaningful
statements, while appreciated, is probably a more powerful feature
than we need for the purposes of defining XML vocabularies.
This is not to say that they are not useful in other situations.
RDF statements let us formally describe facts in a machine-readable
format. Normally, the XML vocabularies we write rely implicitly on
commonsense understanding of the underlying real-world concepts. With
RDF statements we could, at least in theory, provide enough
information that an application could discover additional facts about
a vocabulary. This would enable it to make better use of a new
vocabulary and decide when it is applicable to a problem at hand. RDF
will let an application drill down to the basic facts of a domain, at
least to the point where we must engage in metaphysical discussions
of whether a machine can understand the way people do. In short,
then, RDF gives a tool for providing a description of the environment
surrounding a vocabulary, one which tools can use to place a
vocabulary in its proper context.
For a designer laboring over the task of defining names,
structure, and relationships, though, this might be one burden too
many. Our next metadata proposal takes a few steps down on the scale
of expressiveness and generality.
Further information regarding RDF may be found at
http://www.w3.org/TR/REC-rdf-syntax (basic model) and
http://www.w3.org/TR/PR-rdf-schema (RDF Schemas).
XML Data
XML Data aims for a more modest scope than RDF. This proposal was
submitted to the W3C by ArborText, DataChannel, Inso, and Microsoft,
and is clearly focused on automated document processing, but
is still more ambitious than DTDs.
XML-Data makes a distinction between syntactic and conceptual
schemas. While both use the same language, they provide different
ways for us to think about the data we are marking up.
A syntactic model is a set of rules describing how to write
documents using markup; DTDs are an example of syntactic schemas. In
an XML document marked up according to our catalog DTD, a
<Book> element can legally contain <Title>,
<Abstract>, <RecSubjCategories>, and <Price>
elements. A syntactic XML Data schema of this would represent similar
constraints on the structure of the vocabulary.
Conceptual models, however, describe relationships between
concepts or objects, and as such they are ideal for modeling
relational databases. We could use an XML Data schema to suggest the
relationships that books have titles and prices, in a manner separate
from the syntax of any XML document. In this sense XML Data was
intended to broaden XML's reach to readily encompass information from
relational databases. The principal relationships captured by the
keys in a relational database can be captured formally in an XML Data
schema. With namespaces, we can capture ad hoc relationships such as
those in an ad hoc join by declaring namespaces for the joined tables
and qualifying the columns in the query according to the table from
which they come. We discuss the use of schemas with databases in
Chapter 10.
XML Data provides some interesting tools that make it more
powerful than DTDs. These tools address several of the problems we
found in DTDs, so let's take a look at some of them and how they can
be used:
Written in XML
XML Data uses an XML vocabulary for the construction of schemas,
which allows users to read and write schemas without having to learn
a new syntax first. It also means that we can use the DOM and
existing parsers to peruse a schema or create a new one
dynamically.
Continuing the conceptual schema metaphor, we could dynamically
create a schema for an ad hoc SQL query based on the query
itself. The recipient of the data would have both data and
formal structure and would never know that this was dynamically
generated.
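As a rough illustration of that idea, here is a Python sketch that runs a query against an in-memory SQLite database and emits a schema describing its result rows. It borrows the elementType/element vocabulary from the XML Data fragment shown later in this chapter. The schema_for_query helper, the books table, and the Row element name are assumptions. Python's sqlite3 module reports only column names through cursor.description, so the sketch emits no datatypes.

```python
import sqlite3
import xml.etree.ElementTree as ET


def schema_for_query(conn, sql, row_name="Row"):
    """Build a hypothetical XML Data-style schema describing a query's result rows."""
    cursor = conn.execute(sql)
    # cursor.description holds one 7-tuple per result column; item 0 is its name.
    columns = [col[0] for col in cursor.description]
    schema = ET.Element("schema")
    row = ET.SubElement(schema, "elementType", id=row_name, content="closed")
    for name in columns:
        ET.SubElement(row, "element", type="#" + name)
    for name in columns:
        ET.SubElement(schema, "elementType", id=name)
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (isbn TEXT, title TEXT, price REAL)")
    print(schema_for_query(conn, "SELECT title, price FROM books"))
```

The schema is derived from the query, not from the table, so a different SELECT list yields a different schema with no hand editing.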
Data Typing
XML Data adds strong typing of elements and attributes, thereby
answering one of our prime objections to DTDs. These may be basic
types defined by the datatypes namespace, or complex, user-defined
types declared by the designer in a schema. There is no
longer a need for applications to implicitly understand the datatype
of some element or attribute and convert the strings of text into the
appropriate format before using the data. This information can
be explicitly specified in the schema, and parsers can perform the
conversion on behalf of the application.
Constraints on Allowed Values
XML Data allows constraints on the range of values for elements
and attributes to be defined, such as minimum and maximum. This can
be extremely helpful in a lot of situations where you are validating
XML documents. Imagine an ordering scenario in which you accept only
orders worth more than one hundred dollars and no more than one
thousand dollars: you could impose these constraints in an order
schema written in XML Data. Similarly, you could use constraints to
prevent people from spending any money if their account has no funds,
or to prevent them from entering invalid values.
Inheritance of Types
An interesting reuse mechanism is XML Data's support for
inheritance of types. This lets us evolve and extend elements as we
describe the entities in the problem we are trying to solve with XML.
We can write some generally expressive supertypes, then refine them
into more specific classes of elements by adding members to or
replacing members of the supertype declaration. Entities may be used
this way in DTDs, but type inheritance formalizes the process.
Without a formal set of semantics, entities can be misused to the
point where they confuse rather than enlighten the user. A formal
inheritance mechanism gives us a tool for promoting reuse while
keeping some control of how the tool is used.
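To show what formalized type inheritance buys, here is a small Python sketch of the resolution step: a derived type starts from its supertype's members, adds its own, and replaces any it redeclares. The type table (Publication, Book, ProfessionalBook, echoing the books example earlier in this chapter) and the datatype names are hypothetical; this is not XML Data's syntax, only the rule it describes.

```python
# Hypothetical type table: each type names its base (supertype) and its own members.
TYPES = {
    "Publication": {"base": None,
                    "members": {"Title": "string", "PubDate": "date"}},
    "Book": {"base": "Publication",
             "members": {"ISBN": "string", "Price": "number"}},
    "ProfessionalBook": {"base": "Book",
                         "members": {"Level": "string", "Price": "fixed.14.4"}},
}


def resolve_members(type_name, types=TYPES):
    """Collect a type's members, inheriting from its base and letting the
    derived declaration add new members or replace same-named ones."""
    decl = types[type_name]
    members = dict(resolve_members(decl["base"], types)) if decl["base"] else {}
    members.update(decl["members"])
    return members


if __name__ == "__main__":
    print(resolve_members("ProfessionalBook"))
```

ProfessionalBook ends up with Title and PubDate from Publication, ISBN from Book, its own Level, and a Price whose type it has replaced.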
Open and Closed Content Models
Another powerful feature of XML Data is the notion of open and
closed content models. A classical DTD is a closed model. Documents
conforming to it must adhere to the rules and may not include
anything that does not follow the rules, because all rules in a
vocabulary must exist in the DTD.
If a schema is open, documents conforming to it may include other
information not declared in the schema. The parts that conform to the
schema must obey the rules laid down in the schema, but we can insert
other items without restriction from the current schema. These items
may be defined in another schema or may be completely unconstrained.
We might insert ad hoc values. More importantly from the standpoint
of our current discussion, open model documents are the way we can
mix namespaces. We can embed a chunk of information conforming to one
schema right in the middle of a document conforming to another. More
formally, individual elements may be explicitly declared to have open
or closed content models. This is done through the content attribute.
The default value for this attribute is open. Here is an example:
<elementType id="Person" content="closed">
<element type="#Name"/>
<element type="#Address"/>
</elementType>
<!-- This document fragment is invalid due to the added
Telephone element -->
<Person>
<Name>John Doe</Name>
<Address>123 Anywhere Street Blasted Rock,
NV</Address>
<Telephone>555-1212</Telephone>
</Person>
Had the content attribute in the above example been given the
value open, the fragment would have been valid.
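The open/closed rule itself is easy to express in code. Here is a minimal Python sketch, assuming the Person declaration has been read into a simple list of allowed child names matching the element names used in the document fragment. undeclared_children is a hypothetical helper, and it deliberately ignores child order and how many times each child appears.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the Person declaration.
PERSON_CHILDREN = ["Name", "Address"]

PERSON = """<Person>
  <Name>John Doe</Name>
  <Address>123 Anywhere Street Blasted Rock, NV</Address>
  <Telephone>555-1212</Telephone>
</Person>"""


def undeclared_children(element, declared, content="open"):
    """Return the child tags that break the content model.

    A closed model rejects any child not declared for the element; an open
    model tolerates them. (Order and cardinality are not checked here.)
    """
    if content == "open":
        return []
    return [child.tag for child in element if child.tag not in declared]


if __name__ == "__main__":
    person = ET.fromstring(PERSON)
    print("closed:", undeclared_children(person, PERSON_CHILDREN, "closed"))
    print("open:", undeclared_children(person, PERSON_CHILDREN, "open"))
```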
Expanded ID and IDREF constructs
XML Data extends ID and IDREF constructs with relations. In a
relation, one element acts as a key or index into another element's
content. This is directly applicable to the primary and foreign keys
of relational databases. It is also particularly useful in bilingual
documents. Two constructs are of particular interest: aliases and
correlatives.
An alias is used to define an equivalent element name. In our
example, we may have <Book> in an English document and want to use
the tag <Livre> for the equivalent element in a French document.
At other times we will want to indicate that two tags describe
identical things; this is done using a correlative.
To think about this in a different way, we may have a shopping
document in which a <Purchaser> element refers to a <Customer>
element elsewhere. The correlative for <Purchaser> is <Customer>,
indicating that <Purchaser> is an alias for <Customer>. This will be
familiar to database designers from their work with entity
relationship diagrams.
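An alias table lets one application read both language versions of a document. The sketch below, in Python with xml.etree.ElementTree, renames aliased elements to their English equivalents before processing. Only the Book/Livre pair comes from the text; the Titre/Title and Prix/Price entries and the apply_aliases helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical alias table mapping French tag names onto English equivalents.
ALIASES = {"Livre": "Book", "Titre": "Title", "Prix": "Price"}


def apply_aliases(xml_text, aliases=ALIASES):
    """Rename every aliased element so one application can read both versions."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = aliases.get(elem.tag, elem.tag)
    return root


if __name__ == "__main__":
    french = "<Livre><Titre>XML professionnel</Titre><Prix>49.99</Prix></Livre>"
    print(ET.tostring(apply_aliases(french), encoding="unicode"))
```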
As you can see, XML Data directly answers all our objections to
DTDs. We will not go further with practical information on XML Data
quite yet, as a reduced form of the proposal appears in the schema
support provided by the XML parser that comes with Microsoft Internet
Explorer 5.0. We will study that support in depth later in this
chapter.
More information on XML Data may be found at
http://www.w3.org/TR/1998/NOTE-XML-data/.
Document Content Description
The Document Content Description (DCD) proposal followed on the
heels of the XML Data proposal. It was submitted by IBM, Microsoft,
and Textuality. It is an RDF vocabulary expressly designed for the
purpose of declaring XML vocabularies. Its backers used the
expressive power of one metadata standard - RDF - to create a
proposed standard with more modest scope. This is in the same spirit
as XML's creation as a simplified subset of SGML.
DCD is syntactically similar to XML Data, although some of the
more advanced features of XML Data are gone. DCD makes no mention of
relations and correlatives. It is strictly focused on defining XML
vocabularies. It does, however, retain the strong data type support
of XML Data, as well as element inheritance. Like XML Data, DCD
permits a vocabulary designer to declare a schema model either open
or closed. Unlike XML Data, DCD uses the same mechanism for declaring
schemas open or closed as it uses for element definitions. Like XML
Data, DCD permits the specification of constraints on the value of
element content. For example, an element named
<SmallInvestment> may be declared to be a fixed numeric type
with constraints on its permissible values, say greater than zero and
less than or equal to ten thousand.
<ElementDef Type="SmallInvestment" Datatype="fixed14.4"
MinExclusive="0.00" Max="10000.00"/>
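As a sketch of what a DCD-aware parser might check for this declaration, here is a Python function built on the standard decimal module. The interpretation is an assumption: we read fixed14.4 as at most 14 digits before the decimal point and 4 after, MinExclusive as an exclusive lower bound, and Max as an inclusive upper bound. check_small_investment is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation

# Assumed reading of the declaration: 0 < value <= 10000, fixed14.4 digits.
MIN_EXCLUSIVE = Decimal("0.00")
MAX_INCLUSIVE = Decimal("10000.00")


def check_small_investment(text):
    """Return a list of problems with a SmallInvestment value (empty if valid)."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return ["not a number"]
    if not value.is_finite():  # rejects NaN and Infinity
        return ["not a number"]
    problems = []
    _, digits, exponent = value.as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    if integer_digits > 14 or fraction_digits > 4:
        problems.append("too many digits for fixed14.4")
    if not value > MIN_EXCLUSIVE:
        problems.append("must be greater than 0.00")
    if value > MAX_INCLUSIVE:
        problems.append("must not exceed 10000.00")
    return problems


if __name__ == "__main__":
    for text in ("250.50", "0.00", "12000", "1.23456"):
        print(text, check_small_investment(text))
```

Using Decimal rather than float keeps values such as 10000.00 exact, so the boundary comparisons behave as the declaration intends.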
DCD, while drawing from the rich body of RDF, is a direct assault
on the problems of DTDs. It exchanges broad power for focused
simplicity. Since it is so similar to both XML Data and the schema
support in Internet Explorer, we will not go into greater depth on
DCD. For the purposes of understanding the W3C schema efforts,
however, remember that DCD is the simple end of the metadata
spectrum. It focuses with sharp precision on the immediate problems
of DTDs and forgoes depth in order to provide a readily implemented
standard for XML schemas.
The W3C Note concerning the Document Content Description proposal
may be found at http://www.w3c.org/TR/NOTE-dcd/.
Finding the Right Balance
These proposals represent a selection of the spectrum of metadata
capabilities. They are by no means the only efforts that have had an
influence on XML Schemas.
Consider them, though, in the context of this book. Ask yourself
"What is really needed to facilitate the use of XML in networked
applications?" Answers to our earlier objections are a minimum set of
requirements. In fact, for intranet applications, we might even get
along without the ability to read schemas with XML parsers. I would
argue for another requirement: simplicity. Application integration,
particularly over the public Internet, cries out for simple, reliable
solutions. Complexity is an invitation to failure and delayed
delivery. Just as simple XML rapidly outstripped complex SGML in
popularity and rate of adoption, I believe a simple yet effective
metadata proposal will best answer our needs.
RDF is admirable in its scope. It will likely find use in
specialized arenas that require its powerful range of expression. It
is unreasonable to expect, however, that a standard this complicated
is going to become an integral part of the Web application
developer's tool kit anytime soon. XML Data and DCD are closer to the
mark; they have stripped out complexity in favor of what their
promoters perceive to be the essentials. This is a difficult line to
draw. Are the relations of XML Data necessary or not? Much depends on
the nature of XML-based applications in the next few years.
We need something sooner than that. Metadata activity at the W3C
has gathered momentum, perhaps in response to the many competing
contributions from all sources. A working group devoted to XML
Schemas has been hard at work and is hoping to reach Recommendation
status during 2000. XML Schemas owe much to RDF, XML Data, DCD, and
several other proposals. The current effort seems to be gravitating
toward the simple end of the spectrum, which bodes well for timely
completion of the initial effort (although it may well be extended at
a later date). As this draft is expected to become an approved
Recommendation of the W3C soon after the release of this book, we
will examine it in depth.
The W3C XML Schema Working Group has a two-part working draft for
XML Schemas dated 17 December 1999. As with any working draft,
particular features and syntax are subject to change in later
versions. These schemas answer our main objections to DTDs that we
talked about earlier in the chapter. They are written in XML syntax,
they permit the use of multiple namespaces, and they provide for
strong typing of content. They are, moreover, a superset of the
capabilities of XML 1.0 DTDs. Their expressive power is greater than
that of DCD, yet they are far less abstract than RDF. In short, this is a
promising metadata effort.
The Working Draft of 17 December 1999 is divided into two
sections: structures and datatypes.
The structures section, XML Schema Part 1: Structures, deals with
the description and declaration of elements and attributes. The
material provided therein allows an XML designer to specify complex
element structure and set constraints on the permitted values of the
content of those elements. This part of the specification can be
found at http://www.w3.org/TR/xmlschema-1/.
The second part, XML Schema Part 2: Datatypes, sets forth a
standard set of content datatypes as well as the rules for generating
new types from them. This part of the specification can be found at
http://www.w3.org/TR/xmlschema-2/.
Hopefully, you are by now eager to learn about the formal syntax
of XML Schemas. Just to make sure that this is so, let me
provide a very simple DTD and its translation into XML Schema
form. For all that I've talked about schemas and their
features, I haven't let you see an example. Seeing the contrast
between current practice - DTDs - and what we hope will become future
practice - schemas - will show you how dramatically things will
change. It may also give you some insight into some of the
things we have been talking about so far. Don't worry too much
about the syntax of the schema. We will explore that at length
in the sections to come. Try to take in the big picture and use
it as a frame of reference going forward.
Consider the following DTD for naming a person:
<!ELEMENT Name (Honorific?, First, MI?, Last, Suffix?)>
<!ELEMENT Honorific (#PCDATA)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT Suffix (#PCDATA)>
We must minimally have first and last names, but we may optionally
have a middle initial, honorific (Mr., Ms., Dr., etc.) and a suffix
(Jr., III, etc.). Here is what it looks like in a schema:
<Schema ...>
<element name="Name">
<type>
<element name="Honorific" type="string" minOccurs="0" maxOccurs="1"/>
<element name="First" type="string"/>
<element name="MI" type="string" minOccurs="0" maxOccurs="1"/>
<element name="Last" type="string"/>
<element name="Suffix" type="string" minOccurs="0" maxOccurs="1"/>
</type>
</element>
</Schema>
The schema form is somewhat longer, but you will notice that it
specifies a bit more information. To start with, we have a <Schema>
element as the root of the schema. Then we have a declaration for an
element called Name, whose name is set in the name attribute of the
<element> tag, so <element name="Name"> declares a <Name> element.
Inside it is a <type> element. What is that for? I've used it in its
simplest form here, but you should know it can also be given a name.
In such a form, it is suitable for reuse elsewhere. Here, it encloses
the element declarations that make up the content model of the <Name>
element. Note how the elements contained within <Name> are
declared. Since they are simple types (such as strings or PCDATA), we
can declare them within the body of the <Name> declaration
without further elaboration. You'll see that XML Schemas provide a
longer list of basic types than we have with DTDs today.
Note how the optional elements are specified. With schemas we can
specify the minimum and maximum number of times an element appears.
This can lead to content models of greater complexity than we can
specify in a DTD.
Above all, though, note the obvious - the schema is XML. The DOM
manipulations you learned in previous chapters can be used to walk
through this schema in a program and take it apart. This cannot be
said for the DTD form.
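Here is a minimal Python sketch of that last point. It uses the standard library's xml.dom.minidom to walk the Name schema and list the child elements it declares, together with their occurrence bounds. The helper name is hypothetical, and the elided preamble attributes of <Schema> are left out. Treating a missing minOccurs or maxOccurs as 1 is an assumption of the sketch.

```python
from xml.dom import minidom

# The Name schema from the text; the elided preamble attributes are omitted here.
SCHEMA = """<Schema>
  <element name="Name">
    <type>
      <element name="Honorific" type="string" minOccurs="0" maxOccurs="1"/>
      <element name="First" type="string"/>
      <element name="MI" type="string" minOccurs="0" maxOccurs="1"/>
      <element name="Last" type="string"/>
      <element name="Suffix" type="string" minOccurs="0" maxOccurs="1"/>
    </type>
  </element>
</Schema>"""


def child_declarations(schema_text: str, parent: str):
    """Return (name, type, minOccurs, maxOccurs) for each element declared inside `parent`."""
    doc = minidom.parseString(schema_text)
    for decl in doc.getElementsByTagName("element"):
        if decl.getAttribute("name") != parent:
            continue
        type_elem = decl.getElementsByTagName("type")[0]
        result = []
        for child in type_elem.childNodes:
            # Skip whitespace text nodes; keep only nested <element> declarations.
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "element":
                result.append((
                    child.getAttribute("name"),
                    child.getAttribute("type"),
                    int(child.getAttribute("minOccurs") or 1),  # assumed default of 1
                    int(child.getAttribute("maxOccurs") or 1),  # assumed default of 1
                ))
        return result
    return []
```

Doing the same with the DTD form would first require a parser for DTD syntax, because the DOM does not expose element declarations as nodes.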
Structures
Everything we can define with a DTD is accounted for in the
Structures portion of XML Schemas. As XML Schemas are written in XML
syntax, structures refer to the XML constructs that we can use to
define our markup. Of course, this means that XML Schemas are really
just another application of XML (an XML vocabulary for defining
classes of XML document), and as such could have a schema to describe
itself (in fact both a Schema and a DTD are provided in the
appendices for the Structures section of XML Schemas to describe the
schema vocabulary).
So the structures section of the specification is the part where
the elements and attributes for defining schemas are set out. More
importantly, the content model for elements is described in this
part. Content models explicitly specify the allowable internal
structure of elements. Structures are the heart of XML Schemas. So,
let's consider these in detail.
Writing Schemas
A schema consists of a preamble and zero or more definitions and
declarations. The next few sections discuss these definitions, so
let's start with the preamble.
Preamble
The preamble is found within the root element, <schema>. It must
include at least three pieces of information in attributes:
targetNS, which identifies, by URI, the namespace of the schema you
are writing
version, which specifies the version of this schema
xmlns, which provides the namespace for the XML Schemas specification
optionally, finalDefault and/or exactDefault, which provide defaults
for the final and exact attributes on type definitions, which we shall
take up later
It may also include export, import, and include constructs, which
we shall discuss later. Here is a sample schema showing the
preamble:
<?xml version="1.0"?>
<schema targetNS="http://myserver/myschema.xsd"
version="1.0"
xmlns="http://www.w3.org/1999/XMLSchema">
...
</schema>
Here, our hypothetical schema is residing on myserver, and is
called myschema.xsd, .xsd being the file extension for XML Schemas.
It is in its first version. The default namespace declaration is the
schema reference to XML Schemas: Structures, and this is a
closed model schema, which means that all documents conforming to
this schema will be completely defined by the schema and must not
have any outside content.
Simple Type Definitions
The structures defined for XML Schemas rely heavily on type
definitions. These allow a schema designer to declare extended types
that can be used throughout a schema. They will be used to specify
the content and type of elements and attributes. Let's start simply,
though. A simple type definition is used to constrain information
that does not include elements. It consists of a name and a
specification that is either a reference to another type definition
or consists of a series of facets. Facets will be described fully in
the datatypes section, later in this chapter. A free-standing simple
type definition is found in a datatype element:
<datatype name="smallInt" source="integer">
<minExclusive value="0"/>
<maxExclusive value="10"/>
</datatype>
We'll discuss this construction at length under datatypes. We can
also have a simple type definition within other declarations, such as
attributes. This is done with the type attribute, type="smallInt",
for example, which tells us the type of the declared item.
Complex Type Definitions
These are essential constructions in XML schemas. Without them, we
would be unable to compose nontrivial content models for elements.
The <type> element encloses a complex type definition. Nested
within it, we have declarations for elements and attributes, or
references to model groups. For example:
<type name="someContent">
<element .../>
<attribute .../>
</type>
Complex Type Definitions may become much more involved. This will
be difficult to understand until we have learned how to declare
attributes and elements. Pay attention to the <type> elements
you see as we move forward and you will see what I mean.
Attributes and Attribute Groups
Attribute declarations consist of an <attribute> element,
which must minimally include a name attribute. The <attribute>
element also has optional cardinality attributes, minOccurs and
maxOccurs, which are used to indicate whether the attribute must
appear, and if so, how often. A type attribute specifies the datatype
of the attribute, such as string or integer. An attribute declaration
may also have default and fixed attributes. These function much like
a default value and the #FIXED keyword in a DTD attribute
declaration. The value of the fixed
attribute is the value the attribute must always have. The value of
the default attribute is the value which is assumed if the attribute
does not explicitly appear in an element within an XML document. Here
are a couple of sample attribute declarations:
<attribute name="simpleAttr"/>
<attribute name="sequenceNo" type="integer"
default="0"/>
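The interplay of default and fixed can be sketched in a few lines of Python. This illustrates the rules under stated assumptions; it is not a schema processor. AttributeDecl is a hypothetical stand-in for a parsed <attribute> declaration, and the version attribute in the usage example is invented for the purpose.

```python
from typing import Dict, List, Optional


class AttributeDecl:
    """Hypothetical in-memory form of an <attribute> declaration."""

    def __init__(self, name: str, default: Optional[str] = None, fixed: Optional[str] = None):
        self.name = name
        self.default = default
        self.fixed = fixed


def effective_attributes(decls: List[AttributeDecl], given: Dict[str, str]) -> Dict[str, str]:
    """Apply default and fixed values to the attributes found on an instance element."""
    result = dict(given)
    for decl in decls:
        if decl.fixed is not None:
            # A fixed attribute may be omitted, but if present it must match exactly.
            if decl.name in given and given[decl.name] != decl.fixed:
                raise ValueError(f"{decl.name} must be {decl.fixed!r}")
            result[decl.name] = decl.fixed
        elif decl.default is not None and decl.name not in given:
            # A default is assumed only when the attribute does not appear.
            result[decl.name] = decl.default
    return result
```

With the sequenceNo declaration above, an element that omits the attribute is treated as if it carried sequenceNo="0". An element that supplies its own value keeps it.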
We will often encounter a group of related attributes that are
applied to multiple element declarations in a schema. XML schema
structures accommodate this with the idea of attribute groups. This
is a named collection of attribute declarations:
<attributeGroup name="troopParameters">
<attribute name="serialNum" type="string"/>
<attribute name="rank" type="string"/>
</attributeGroup>
<type name="officerParms">
<attributeGroup ref="troopParameters"/>
</type>
Here we've declared the troopParameters attribute group, then used
it within the officerParms type definition.
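Wherever an attribute group is referenced, a schema processor has to substitute the group's members. The following minimal Python sketch does that with xml.dom.minidom for the troopParameters example. The function name is hypothetical, and the sketch ignores attribute groups that reference other attribute groups.

```python
from xml.dom import minidom

SCHEMA = """<schema>
  <attributeGroup name="troopParameters">
    <attribute name="serialNum" type="string"/>
    <attribute name="rank" type="string"/>
  </attributeGroup>
  <type name="officerParms">
    <attributeGroup ref="troopParameters"/>
  </type>
</schema>"""


def attributes_of_type(schema_text: str, type_name: str):
    """List the attribute names a named type ends up with after expanding group refs."""
    doc = minidom.parseString(schema_text)
    # Collect named attribute groups; references (which have no name) are skipped here.
    groups = {}
    for g in doc.getElementsByTagName("attributeGroup"):
        if g.getAttribute("name"):
            groups[g.getAttribute("name")] = [
                a.getAttribute("name") for a in g.getElementsByTagName("attribute")
            ]
    for t in doc.getElementsByTagName("type"):
        if t.getAttribute("name") != type_name:
            continue
        names = []
        for child in t.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue
            if child.tagName == "attribute":
                names.append(child.getAttribute("name"))
            elif child.tagName == "attributeGroup":
                names.extend(groups[child.getAttribute("ref")])  # substitute the members
        return names
    raise KeyError(type_name)
```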
Content Models
We won't get far without content models, and XML Schemas provide
us with mechanisms for describing content models with a lot more
accuracy than DTDs. These use complex type definitions and a new
structure, the <group> element, to build the internal contents
of an element declaration.
We now need another attribute for type elements, the content
attribute. The content attribute tells us what elements can be
contained (although it says nothing about permitted attributes):
| Content attribute value | Meaning |
|---|---|
| unconstrained | Content of any kind |
| empty | Empty element |
| mixed | Elements and character data |
For example:
<type name="WideOpen" content="unconstrained"/>
<type name="NothingHere" content="empty"/>
<type content="mixed">
<element ... />
</type>
Things become more interesting when we get to element-only
content. Now we need some content operators - termed compositors in
the Schema draft - to show how content may be composed. These
compositors are the values of the order attribute of a <group>
element. This new element gives us a way to provide ordered bodies of
elements in a declaration. The compositors are shown in the following
table:
| Compositor keyword | Meaning | DTD equivalent |
|---|---|---|
| seq | Elements must follow in exact order | , (comma) |
| choice | Exactly one of the model elements appears | \| (pipe) |
Element Declarations
Here we can immediately see how XML is used to make the syntax of
schemas an XML application. Where we had to use the <!ELEMENT
syntax to declare a <Book> element in a DTD, we now put element
declarations inside an XML element: an <element> element whose name
attribute has the value Book. Here the <element /> element is used to
declare an element (the element is describing its content, in keeping
with the idea of self-describing data). The name attribute simply
takes the name of the element we are declaring.
Simple elements are composed of a reference to a data type and a
series of attribute declarations or a reference to an attribute
group. This is analogous to a DTD declaration where the element
contains only PCDATA, except that the content is strongly typed. For
example:
<element name="ZIP" type="string"/>
<element name="windspeed" type="float"/>
These would correspond to:
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT windspeed (#PCDATA)>
Of course, there would be no notion of the string and
floating-point numeric types from the DTD declarations. When we want
to define an element with structure, we replace the data type
reference with a content model. Let's leave that aside for a moment
and see how we make an element declaration by adding references to
other declarations. Let's specify the schema for this simple fragment
of XML:
<Name>
<First>John</First>
<MI>A.</MI>
<Last>Doe</Last>
</Name>
Here are the required element declarations:
<element name="First" type="string"/>
<element name="MI" type="string"/>
<element name="Last" type="string"/>
<element name="Name">
<type>
<group order="seq">
<element ref="First" minOccurs="1"/>
<element ref="MI" minOccurs="0"/>
<element ref="Last" minOccurs="1"/>
</group>
</type>
</element>
This starts out simply enough. First, MI, and Last are strings.
Note that I've made MI a string to accommodate long middle initials,
such as O'M or A. G. Now we'll wrap them together into the composite
element <Name>.
Examples are often the best way to learn, so here are some more
examples and their DTD equivalents:
<element name="ListOfNames">
<type>
<group order="seq">
<element type="CustomerName"/>
<element type="SalesName"/>
<element type="ProductName"/>
</group>
</type>
</element>
<!ELEMENT ListOfNames (CustomerName, SalesName, ProductName)>
<element name="PickOne">
<type>
<group order="choice">
<element type="ColumnOne"/>
<element type="ColumnTwo"/>
</group>
</type>
</element>
<!ELEMENT PickOne (ColumnOne | ColumnTwo)>
Now, we'll want to be able to specify multiple occurrences of
element content. To do this, we use the minOccurs and maxOccurs
attributes on the element references. When we get to model groups in
a little while, we'll see that we can apply these attributes there as
well to build more complicated content models.
Model Groups
Some other schema constructs give us the ability to compose
building blocks of definitions and declarations. As we have seen, a
type can contain a model group, the <group> element. We can also give
a model group a name and then refer to it from other types and element
declarations, which lets us build complex content models out of
reusable parts. Here are some samples:
<type>
<group order="seq" minOccurs="1" maxOccurs="2">
<element type="A"/>
<element type="B"/>
</group>
<group order="choice" minOccurs="3"
maxOccurs="7">
<element type="C"/>
<element type="D"/>
</group>
</type>
In this model, the content always starts with the sequence A, B.
This sequence occurs at least once, perhaps twice. Next, we choose
between C and D and make the choice three to seven times. The
following would be a legal document fragment conforming to this
content model.
<A/><B/><A/><B/> <!-- sequence -->
<C/><C/><D/><C/> <!-- choice -->
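One way to check this is to treat a content model as a regular expression over the sequence of child element names, with minOccurs and maxOccurs becoming repetition bounds. The following Python sketch hand-translates the model above. It reads the occurrence bounds as applying to the A, B sequence (once or twice) and to the C/D choice (three to seven times), as the prose describes. The function name is hypothetical.

```python
import re

# Content model as a regex over space-terminated child element names:
#   sequence A, B repeated 1..2 times, then a choice of C or D repeated 3..7 times.
MODEL = re.compile(r"(?:A B ){1,2}(?:(?:C|D) ){3,7}")


def conforms(children) -> bool:
    """True if the list of child element names matches the content model."""
    # A trailing space after each name keeps names from running together.
    return MODEL.fullmatch("".join(name + " " for name in children)) is not None
```

The fragment shown above, A B A B C C D C, conforms. A B C C does not, because the choice is made only twice.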
You can also nest groups to form complex content models. For
example:
<group order="seq">
<group order="choice">
<element type="A"/>
<element type="B"/>
</group>
<group order="choice">
<group order="choice">
<element type="A"/>
<element type="B"/>
</group>
<group order="seq">
<element type="B"/>
<element type="C"/>
<element type="D"/>
</group>
</group>
</group>
The equivalent DTD content model for some element <foo>
is:
<!ELEMENT foo ((A | B), ((A | B) | (B, C, D)))>
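The correspondence between compositors and DTD operators is mechanical enough to automate. This minimal Python sketch parses the nested group with xml.dom.minidom. It turns seq into concatenation and choice into alternation, just as the DTD form uses commas and pipes. The helper names are hypothetical, and occurrence attributes are ignored to keep the sketch short.

```python
import re
from xml.dom import minidom

GROUP = """<group order="seq">
  <group order="choice"><element type="A"/><element type="B"/></group>
  <group order="choice">
    <group order="choice"><element type="A"/><element type="B"/></group>
    <group order="seq"><element type="B"/><element type="C"/><element type="D"/></group>
  </group>
</group>"""


def to_regex(node) -> str:
    """Translate a <group>/<element> tree into a regex over space-terminated names."""
    if node.tagName == "element":
        return re.escape(node.getAttribute("type")) + " "
    parts = [to_regex(c) for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]
    if node.getAttribute("order") == "choice":
        return "(?:" + "|".join(parts) + ")"  # like the DTD '|'
    return "(?:" + "".join(parts) + ")"       # seq, like the DTD ','


def matches(group_xml: str, children) -> bool:
    """True if the child element names satisfy the group's content model."""
    root = minidom.parseString(group_xml).documentElement
    return re.fullmatch(to_regex(root), "".join(n + " " for n in children)) is not None
```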
Now consider how we can use content model groups if we can refer
to them by name:
<group name="partsGroup" order="seq">
<element type="BigParts"/>
<element type="LittleParts"/>
</group>
<element name="PartsAndTheirMeasures">
<type>
<group ref="partsGroup"/>
<attribute name="count" type="integer"/>
<attribute name="size" type="integer"/>
</type>
</element>
In the preceding example, I defined a content model, then
incorporated it into an element declaration. The combination of these
constructs gives schema designers flexible reuse and permits the
specification of vocabularies with great economy.
<attributeGroup name="partMeasures">
<attribute name="count" type="integer"/>
<attribute name="size" type="integer"/>
</attributeGroup>
<element name="PartsAndTheirMeasures">
<type>
<group ref="partsGroup"/>
<attributeGroup ref="partMeasures"/>
</type>
</element>
This is a variation on the first example. Instead of building the
attribute declarations into the <element>, I created an
attribute group containing the declarations, then created the element
declaration using references to the model group and the attribute
group. Here's another way to use attribute groups.
<element name="PairedFasteners">
<type>
<group order="seq">
<element type="Nut"/>
<element type="Bolt"/>
</group>
</type>
<attributeGroup ref="partMeasures"/>
</element>
This time, I wanted to reuse the attribute group with different
element content. I was able to do this in an element declaration by
explicitly specifying the content model, then using a reference to
the attribute group. Note that my content model includes elements of
the types Nut and Bolt. These are types I would have had to declare
elsewhere in the schema.
Wildcards
XML schemas provide the any element that allows us to introduce a
wildcard into a schema at any particular point. Schemas provide for
departures from the written schema in any of the following four
ways:
Any well-formed XML element construction
Any well-formed element construction so long as it is in any
namespace other than the one in which the wildcard appears
Any well-formed element construction, provided it is from a
specific namespace
Any well-formed element construction provided it is from the
current namespace
Wildcards may also be used in conjunction with attributes, in
which case we can use the anyAttribute element. Here are examples for
each of the four cases:
<any/>
<any namespace="##other"/>
<any namespace="http://www.myserver.com/OtherSchema"/>
<any namespace="##targetNamespace"/>
Note the use of the ##other and ##targetNamespace keywords. Now,
here's an example of using a wildcard in conjunction with attributes
within an element declaration:
<element name="someElement">
<type>
<anyAttribute namespace="http://www.w3.org/1999/XMLSchema"/>
<element name="someNum" type="integer"/>
</type>
</element>
Here we've declared an element that has a single child element,
<someNum>, and may have any attribute declared in the W3C
schema for XML Schemas.
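The four namespace cases for wildcards reduce to a small decision function. This Python sketch is illustrative only: the function and its parameters are hypothetical. Rejecting elements that are in no namespace under ##other is an assumption, because the description above does not settle that case.

```python
from typing import Optional


def wildcard_allows(namespace_attr: Optional[str], element_ns: Optional[str], target_ns: str) -> bool:
    """Decide whether an element in element_ns may appear where <any namespace=...> stands."""
    if namespace_attr is None:
        return True  # <any/>: any well-formed element
    if namespace_attr == "##other":
        # Any namespace except the schema's own; no-namespace elements rejected (assumption).
        return element_ns is not None and element_ns != target_ns
    if namespace_attr == "##targetNamespace":
        return element_ns == target_ns  # only the schema's own namespace
    return element_ns == namespace_attr  # one explicitly named namespace
```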
Deriving Type Definitions
When we use the source attribute on a type, we are in effect
deriving a new type from an existing one. XML Schemas provide some
formal rules for type derivation that we will now examine.
Specifically, we can extend a type or restrict it. The value of the
derivedBy attribute specifies which method is used.
Derivation
A new type extends another when it adds additional content to its
source type. In this case, all the content declared in the source
type will appear in the derived type. For example, we extend a
PersonName type declaration by adding an honorific element to the
existing content:
<type name="PersonName">
<element name="FirstName" type="string"/>
<element name="MI" type="string"/>
<element name="LastName" type="string"/>
</type>
<type name="FormalPersonName" source="PersonName" derivedBy="extension">
<element name="honorific" type="string"/>
</type>
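You can sketch the effect of derivation by extension by computing a type's effective content: the source type's content followed by whatever the derived type adds. The Python sketch below assumes that order (inherited content first) and uses a hypothetical helper name. It handles only extension, not restriction.

```python
from xml.dom import minidom

SCHEMA = """<schema>
  <type name="PersonName">
    <element name="FirstName" type="string"/>
    <element name="MI" type="string"/>
    <element name="LastName" type="string"/>
  </type>
  <type name="FormalPersonName" source="PersonName" derivedBy="extension">
    <element name="honorific" type="string"/>
  </type>
</schema>"""


def effective_content(schema_text: str, type_name: str):
    """Element names in a type's content, with an extension's source content first."""
    doc = minidom.parseString(schema_text)
    types = {t.getAttribute("name"): t for t in doc.getElementsByTagName("type")}
    t = types[type_name]
    own = [c.getAttribute("name") for c in t.childNodes
           if c.nodeType == c.ELEMENT_NODE and c.tagName == "element"]
    if t.getAttribute("derivedBy") == "extension":
        # Assumption: the inherited content precedes the added content.
        return effective_content(schema_text, t.getAttribute("source")) + own
    return own
```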
If, however, we wish to somehow restrict a type when we derive a
new type from it, we can give the derivedBy attribute the value
restriction and add a <restrictions> element:
<type name="ShortName" source="PersonName"
derivedBy="restriction">
<restrictions>
<element name="MI" maxOccurs="0"/>
</restrictions>
</type>
Here, we've restricted the type so that the <MI> element no
longer appears. When deriving types, be sure the constraints on
elements and attributes are more restrictive than those on the same
declaration in the source type.
Types may control derivation from themselves as well as their
appearance in instance documents through the use of three attributes,
abstract, exact, and final. If abstract has the value true, no
instance of the declared type may appear in an instance document. The
default for this implied attribute is false as one might expect. If
exact has the value true, no derived type may appear in an instance
document in its place. Only the type so declared may be used. If
final is given the value true, then no further derivation of the type
is permitted.
Composition
We can combine schemas and namespaces together to allow users to
build document instances from multiple schemas. Schemas also allow
designers to use other schemas in building their own schema document.
This is termed composition.
Import
You can import parts of another schema for use in yours provided
the namespace of the other schema is referenced in an <import>
element. This element has a namespace attribute whose value is the
namespace URI of the schema you want to use. You may also provide a
schemaLocation attribute to point to the schema file desired. Once
you have imported a namespace, you can use some construction from the
other schema within your schema:
<schema name="SomeOtherSchema.xsd"
xmlns:other="http://www.OtherOrg.org/SomeUsefulSchema">
<import
namespace="http://www.OtherOrg.org/SomeUsefulSchema"
schemaLocation="http://www.OtherOrg.org/schemas/Useful.xsd"/>
...
<element ref="other:stuff" name="someName"/>
</schema>
When a construct is imported into a schema, it remains an external
resource. We are composing a new schema, in effect, by linking in
parts of another schema rather than including them whole in the new
schema. When a validating parser validates a document according to a
schema, it must retrieve the other schema to validate material in the
document against the external resource.
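Here is a minimal Python sketch of the retrieval step. It resolves a prefixed reference such as other:stuff by looking up the namespace bound to the prefix, then the schemaLocation given for that namespace in an <import> element. The function name is hypothetical, and a real validator would go on to fetch and parse the schema at that location.

```python
from xml.dom import minidom

SCHEMA = """<schema xmlns:other="http://www.OtherOrg.org/SomeUsefulSchema">
  <import namespace="http://www.OtherOrg.org/SomeUsefulSchema"
          schemaLocation="http://www.OtherOrg.org/schemas/Useful.xsd"/>
  <element ref="other:stuff"/>
</schema>"""


def resolve_ref(schema_text: str, qname: str):
    """Return (namespace, local name, schemaLocation) for a prefixed reference."""
    root = minidom.parseString(schema_text).documentElement
    prefix, local = qname.split(":", 1)
    ns = root.getAttribute("xmlns:" + prefix)  # namespace bound to the prefix
    imports = {
        imp.getAttribute("namespace"): imp.getAttribute("schemaLocation")
        for imp in root.getElementsByTagName("import")
    }
    if not ns or ns not in imports:
        raise LookupError(f"no imported schema for prefix {prefix!r}")
    return ns, local, imports[ns]
```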
Inclusion
Inclusion is specified with the <include> element. This
appears in a schema after the <import> element and before the
<export> element, if any. The <include> element is an
empty element with the required attribute schemaLocation, whose value
is a URI to the included schema. When this element appears in a
schema, the schema is understood to consist of its declared types as
well as all the types declared in the included schema provided
several criteria are met: The URI must resolve to another schema, and
the schema thus designated must have a targetNamespace attribute
identical to the containing schema's targetNamespace attribute
value.
Annotating Schemas
No body of computing definitions or code is complete without a
mechanism for providing additional comments or processing
information. Schemas provide for this with the <annotation>
element. This element may contain <info> elements, which
consist of character data intended for human consumption, or
<appinfo> elements, which do the same for schema processors.
Either element may have an infoSource attribute that provides a URI
reference to further information.
<element name="HardToRemember">
<annotation>
<info>
I want to remember the following about this element declaration...
</info>
</annotation>
...
</element>
The real world relies on concepts of numbers, strings, and sets,
so programs written in modern programming languages support elaborate
systems of built-in types and procedures for defining new types.
Therefore the addition of data types to XML Schemas will be a great
asset to programmers using XML for data in their applications. This
support for data types includes the ability to check the validity of
a value in a document as well as aiding an appropriate conversion
from text to the native type when processing an XML document. So, we
need to capture the data types of the information we mark up if we are
going to use XML documents as the basis for integrating programs and
systems.
This is what the second part of the XML Schemas specifications,
XML Schemas: Datatypes, aims to do.
Not only does it provide a means of capturing the basic type of data,
but it also gives us a means of recording the constraints imposed on
the data in our problem domain. It will let us record numeric bounds,
sets and list ordering. It will also let us specify masks for the
permissible string representations of our data.
Schema datatypes are said to have a set of distinct values called
their value space. This is the abstract collection of values the type
can take on. For example, the set of integral numerics is the value
space for the integer type. Constraining properties and operations on
the values in the space characterize this space. When we go to
represent a data type for our users, we require a lexical
representation, the literal string representation of the type. A real
number might be represented as a string of digits, a decimal point, and
a specified number of digits after that point. A date is represented
by YYYY-MM-DD. This is the ISO 8601 format, which XML Schemas adopt
for date and time representations.
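The split between lexical and value space is easy to see with the ISO 8601 strings used for the timeInstant primitive type: different strings can denote the same instant. This minimal Python sketch maps lexical forms to values with the standard datetime module. The helper name is hypothetical. The code replaces a trailing Z only because fromisoformat() in Python versions before 3.11 does not accept it.

```python
from datetime import datetime


def parse_time_instant(lexical: str) -> datetime:
    """Map a lexical form like YYYY-MM-DDThh:mm:ss.sss[Z|+hh:mm|-hh:mm] to a datetime value."""
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"  # spell UTC as an explicit offset
    return datetime.fromisoformat(lexical)


# Two lexical representations, one point in the value space:
utc = parse_time_instant("2000-01-01T08:12:00.000Z")
eastern = parse_time_instant("2000-01-01T04:12:00.000-04:00")
```

Because both values carry time zone information, Python compares them as instants, and utc == eastern holds.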
XML Schemas: Datatypes is all about specifying value
spaces, then listing the constraining properties of the type. It
provides a set of primitive data types, and then elaborates a
mechanism for generating new types derived from those primitives. The
draft includes a number of generated types of wide utility, but
schema designers are welcome to generate their own types intended for
application-specific use.
Some properties, termed facets, are provided to specify datatypes.
Facets refine the value space to give us the permissible values for
the new type. Facets are either fundamental or constraining.
Fundamental facets define some fundamental property of the datatype.
Constraining facets place restrictions on the value space but do not
define its nature. Strings, for example, have length. A length limit
doesn't tell you about the nature of strings, but it does constrain
which string values are permitted. Each type provided in XML Schemas lists its
specific facets. One very important facet is lexical representation.
Since we are speaking in terms of XML, a text-based system, we must
specify the text representation of non-text types. The particular
meaning of this facet depends on the datatype. The more important
ones are listed in the following tables.
Primitive Types
Primitive datatypes are those that are not defined in terms of
other types. They are axiomatic. We proceed from an intuitive concept
of the type described. It is natural for the XML Schemas proposal to
include the classic XML 1.0 types, but it also adds some types of its
own.
Here are the primitive types introduced by XML Schemas:
| Schema primitive type | Definition |
|---|---|
| string | Finite sequence of ISO 10646 or Unicode characters, such as "thisisastring". |
| boolean | The set {true, false}. |
| float | Standard mathematical concept of real numbers, corresponding to a single-precision 32-bit floating-point type. |
| double | Standard mathematical concept of real numbers, corresponding to a double-precision 64-bit floating-point type; doubles consist of a decimal mantissa, followed optionally by the letter E and an integer exponent, for example 6.02E23. |
| decimal | Standard mathematical concept of a real numeric type; it covers a smaller range than double, and consists of a sequence of digits with a period as the decimal separator, such as 9.06. |
| timeInstant | The combination of date and time to define a specific instant in time, encoded as a string; 2000-01-01T08:12:00.000 represents 8:12 on 1 Jan 2000, expressed with seconds and fractional seconds. This type is always expressed YYYY-MM-DDThh:mm:ss.sss, but can be immediately followed by a Z to specify that the time is Coordinated Universal Time (UTC). Alternatively, the time zone can be specified by supplying a difference from UTC, using a + or a - followed by hh:mm. For example, the above date and time string could be followed by -04:00. |
| timeDuration | A combination of date and time to define a period, interval, or duration of time. For example, one month is represented as P0Y1M0DT0H0M0S, where the lexical pattern is PnYnMnDTnHnMnS and can be preceded by a + or -. The representation may be truncated on the right when the finer time intervals are not needed, for example P2Y3M for two years and three months. Note that the number precedes the character representing the interval. Seconds may be expressed by a number including a decimal to represent fractional seconds. A minus sign preceding the lexical representation indicates a negative duration. |
| recurringInstant | An instant of time that recurs with some regular frequency, such as every day; represented by substituting a dash for any period not provided in the lexical pattern for timeInstant. For example, an instant that occurs at 08:00 every day would be expressed as ----T08:00:00.000. |
| binary | Arbitrarily long bodies of binary data. |
| uri | URI reference. |
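The timeDuration lexical pattern is regular enough to parse with a single regular expression. The following Python sketch splits a duration into signed fields. The names are hypothetical, and the sketch checks only the lexical shape described in the table, not any further constraints the draft may impose.

```python
import re

# Optional sign, then PnYnMnDTnHnMnS with any field omitted; seconds may be fractional.
DURATION = re.compile(
    r"(?P<sign>[+-])?P"
    r"(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)
FIELDS = ("years", "months", "days", "hours", "minutes", "seconds")


def parse_duration(text: str) -> dict:
    """Split a timeDuration such as P2Y3M into signed numeric fields (missing fields are 0)."""
    m = DURATION.fullmatch(text)
    # Reject non-matches, a bare "P", and a dangling "T" with no time fields after it.
    if m is None or all(m.group(f) is None for f in FIELDS) or text.endswith("T"):
        raise ValueError(f"not a timeDuration: {text!r}")
    sign = -1 if m.group("sign") == "-" else 1
    result = {f: sign * int(m.group(f) or 0) for f in FIELDS[:-1]}
    result["seconds"] = sign * float(m.group("seconds") or 0)
    return result
```

In this pattern, M means months before the T and minutes after it, which is why the T separator is required in the full form.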
Generated and User Defined Types
As the name suggests, a generated datatype builds from an existing
type. The type on which it builds is the basetype. XML Schemas
specify some generated types that are broadly useful. These are shown
in the following table:
| Generated type | Base type | Meaning |
|---|---|---|
| language | string | Natural language identifiers; a token that meets the LanguageID production in XML, for example "en" |
| NMTOKEN | NMTOKENS | XML 1.0 NMTOKEN |
| NMTOKENS | string | XML 1.0 NMTOKENS |
| Name | NMTOKEN | XML 1.0 Name |
| QName | Name | Qualified name, as defined in Namespaces in XML |
| NCName | Name | "Non-colonized" name, as defined in Namespaces in XML |
| ID | NCName | XML 1.0 attribute type ID |
| IDREF | IDREFS | XML 1.0 attribute type IDREF |
| IDREFS | string | XML 1.0 attribute type IDREFS |
| ENTITY | ENTITIES | XML 1.0 ENTITY |
| ENTITIES | string | XML 1.0 ENTITIES |
| NOTATION | NCName | XML 1.0 NOTATION |
| integer | decimal | Standard mathematical concept of a discrete numeric type (discrete here separates it from the definition of number) |
| non-negative-integer | integer | Standard mathematical concept of non-negative integers |
| positive-integer | integer | Standard mathematical concept of positive integers |
| non-positive-integer | integer | Standard mathematical concept of a negative integer, or zero |
| negative-integer | integer | Standard mathematical concept of a strictly negative integer |
| date | recurringInstant | Standard concept of a day, that is, an interval beginning at midnight and lasting 24 hours |
| time | recurringInstant | Same as the left-truncated representation for timeInstant, hh:mm:ss.sss |
We declare a new type with a datatype element. This element has
name and source attributes. The source attribute's value indicates
the type from which the new type is derived. Here's a minimal
example:
<datatype name="height" source="decimal"/>
We further specify a new datatype by adding facets. These must be
appropriate to the basetype: ordered facets, such as bounds, may be
applied only to datatypes generated from an ordered basetype.
Typically, we constrain a new type by supplying specific values for
the constraining facets of its basetype. For example, let's declare
some generated types denoting large and small orders of products:
<datatype name="largeOrder" source="integer">
<minExclusive value="1000"/>
</datatype>
<datatype name="smallOrder" source="integer">
<minExclusive value="0"/>
<maxInclusive value="1000"/>
</datatype>
The integer type has constraining facets denoting bounds named
minInclusive, minExclusive, maxInclusive, and maxExclusive. The
example above takes advantage of these to establish that a small
order is anything that has between 1 and 1000 units, inclusive. A
large order in our type system is anything over 1000 units.
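The two declarations above use minExclusive and maxInclusive. As a further sketch in the same working-draft syntax, here is a hypothetical age type that uses the other two bound facets, minInclusive and maxExclusive. The type name and limits are illustrative assumptions, and the element syntax follows the draft of the time, which may change before XML Schemas become a Recommendation.

```xml
<!-- Hypothetical type: whole numbers from 0 up to, but not including, 150 -->
<datatype name="age" source="integer">
  <minInclusive value="0"/>    <!-- 0 itself is allowed -->
  <maxExclusive value="150"/>  <!-- 150 itself is not allowed -->
</datatype>
```

Under these facets, 0 and 149 would be valid values of age, while -1 and 150 would not.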
XML Schemas are not yet a Recommendation at the time of this
writing (January 2000), so we cannot provide an example of them in
use. However, to see how we will be able to use the power of XML
Schemas, we can look at a different implementation of schemas written
in XML syntax: XML Data - Reduced, a subset of XML Data implemented in
Microsoft's MSXML parser, which we can use within IE5 or as a
standalone component. Although the syntax of XML Data - Reduced
differs from the working draft of XML Schemas available at the time
of writing, it shows how we can put the benefits that XML Schemas
bring to work in our applications.
Not only is MSXML one of the more widely used parsers, but
Microsoft is also actively using XML Data - Reduced in a number of its
initiatives, notably BizTalk, which includes an effort to share
vertical-market vocabularies for e-commerce. While Microsoft promises
to adopt XML Schemas when the draft becomes a Recommendation, the
result right now is that many people are building prototypes, and
even products, with XML Data - Reduced as an interim measure until
the W3C schema Recommendation is published.
As this is an implementation we are able to work with now, and
which is being used in several areas for prototyping, in this
penultimate section of the chapter we shall take a look at the syntax
of XML Data - Reduced. Once we have looked at the syntax we will then
develop some examples that show you the power of these new
schemas.
IBM has introduced partial support for XML Schemas in a beta
edition of their XML4J parser. However, since MSXML has richer
support and is a shipping tool, we will focus on XML Data -
Reduced.
What is XML Data - Reduced?
As we have said, XML Data - Reduced (XML-DR) is a subset of the
full XML Data proposal. In scope, it provides roughly the same
functionality as the Document Content Description (DCD)
specification: it contains the constructs needed to perform the tasks
of a DTD, plus a few extensions to the capabilities DTDs offer. It is
implemented as a technology preview in the XML parser that ships with
Internet Explorer 5.0, and it is also supported in some commercial
tools, notably DTD/schema editors such as Extensibility's XML
Authority. It is definitely worth investigating, because it is
available for experimentation and is being used in a number of
initiatives.
Schema Support
Conceptually, XML Data - Reduced is similar to the core constructs
of XML Schemas, even though the syntax is slightly different. The
more complicated constructs, such as generated and user-defined
datatypes, are not reproduced, but everything you need to define a
vocabulary in XML is here, often using very similar syntax. Here are
the elements specified in XML Data - Reduced and their XML Schemas
equivalents. Note carefully the case of the names, as there are
subtle differences between XML Schemas and XML-DR schemas:
| XML Schemas construct | XML-DR construct |
|---|---|
| schema | Schema |
| element | ElementType |
| elementRef | element |
| attribute | AttributeType |
| none | attribute |
| datatype | datatype |
| none | description |
| ModelGroup, group | group |
The entire reference for XML-DR schemas may be found online at
http://msdn.microsoft.com/xml/reference/schema/start.asp.
Schemas
The Schema element in XML-DR is quite similar to the schema
element in XML Schemas. This element performs the following
functions:
Contains element and attribute declarations
Names the schema
Declares namespaces used in the schema
Unlike XML Schemas, schemas in XML-DR do not use a preamble
containing import, export, and include elements. Instead, they use
namespace declarations. Every XML-DR schema must declare the XML Data
and Microsoft datatypes namespaces. If a certain naming convention is
observed (which will be introduced below when we discuss parser
support for XML-DR), external content from another namespace may be
used and validated in a schema. Here is a sample schema omitting the
content:
<Schema name="ShortSchema.xml"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
... <!-- Declarations here -->
</Schema>
Elements and Attributes
Elements and attributes are declared in ElementType and
AttributeType elements, respectively.
<ElementType name="myElement"/>
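The declaration above uses ElementType; its counterpart for attributes is AttributeType, which is then attached to an element with an <attribute> element (both are described in detail below). Here is a minimal sketch, wrapped in the Schema element so that the dt prefix is declared. The schema, element, and attribute names are hypothetical.

```xml
<Schema name="MinimalSchema.xml"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Hypothetical element with no content that carries one attribute -->
  <ElementType name="myElement" content="empty">
    <!-- Declare the attribute type locally... -->
    <AttributeType name="myAttribute" dt:type="string"/>
    <!-- ...then attach it to the element -->
    <attribute type="myAttribute"/>
  </ElementType>
</Schema>
```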
The <ElementType> element has five important attributes.
| ElementType attribute | Meaning |
|---|---|
| name | Name of the element |
| content | Describes the content the element may contain: empty, textOnly (PCDATA only), eltOnly (element content only), or mixed (PCDATA and elements) |
| dt:type | Denotes the datatype of the element. This attribute corresponds to the <datatype> element in XML Schemas. Valid values are taken from the XML Data Types Preview implementation. |
| model | Open or closed content model: open (the default) allows content not declared in the schema, closed allows only declared content |
| order | Basic ordering of child elements: one (one chosen from a list of elements), seq (a specified sequence of elements), many (the specified elements may appear or not appear, in any order) |
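To see several of these attributes working together, here is a sketch of a small schema loosely modeled on the Author markup earlier in the chapter; the declarations are illustrative assumptions, and int is assumed to be one of the XML Data Types Preview values. Author uses order="seq" and model="closed", so FirstName must be followed by LastName and no undeclared content is allowed, while PageCount uses dt:type so that its text must be an integer.

```xml
<Schema name="AuthorSchema.xml"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- Text-only leaf elements -->
  <ElementType name="FirstName" content="textOnly"/>
  <ElementType name="LastName" content="textOnly"/>
  <!-- Typed text: the content must be a valid int -->
  <ElementType name="PageCount" content="textOnly" dt:type="int"/>
  <!-- Child elements only, in this exact order, nothing undeclared -->
  <ElementType name="Author" content="eltOnly" order="seq" model="closed">
    <element type="FirstName"/>
    <element type="LastName"/>
  </ElementType>
</Schema>
```

The Author declaration plays the same role as the DTD declaration <!ELEMENT Author (FirstName, LastName)>, with model="closed" additionally ruling out content the schema does not declare.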
Again, elements can contain one of four types of content described
in the value of the content attribute of the <ElementType>
element:
no content: empty
text only: textOnly
subelements only: eltOnly
a mix of text and subelements: mixed
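As a sketch, here is one declaration for each of the four content values. The element names echo the Book markup from earlier in the chapter, except Emphasis, which is a hypothetical element added to illustrate mixed content; the declarations are not a complete schema for that document.

```xml
<Schema name="ContentSchema.xml"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <!-- empty: no content at all, as in <Portrait piclink="inamed.jpg"/> -->
  <ElementType name="Portrait" content="empty">
    <AttributeType name="piclink" dt:type="string"/>
    <attribute type="piclink"/>
  </ElementType>
  <!-- textOnly: character data, no child elements -->
  <ElementType name="Title" content="textOnly"/>
  <ElementType name="Emphasis" content="textOnly"/>
  <!-- mixed: text interleaved with (hypothetical) Emphasis elements -->
  <ElementType name="Paragraph" content="mixed" order="many">
    <element type="Emphasis"/>
  </ElementType>
  <!-- eltOnly: child elements only; one Title, then one or more Paragraphs -->
  <ElementType name="Section" content="eltOnly" order="seq">
    <element type="Title"/>
    <element type="Paragraph" maxOccurs="*"/>
  </ElementType>
</Schema>
```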
We can use the <element> and <attribute> elements to
constrain the content of the declared element: they declare which
child elements the element may contain and which attributes it may
carry.
The <element> element can take three attributes:
| Attribute | Description |
|---|---|
| type | Corresponds to the value of the name attribute of an <ElementType> defined in the schema |
| minOccurs | Minimum number of times the referenced element type can occur within the element: 0 (the child element is optional) or 1 (it must occur at least once). The default is 1. |
| maxOccurs | Maximum number of times the referenced element type can occur within the element: 1 (at most once) or * (unlimited occurrences). The default is 1. |
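Here is a sketch of minOccurs and maxOccurs in use, with the equivalent DTD content model in a comment. Chapter, Title, and Section come from the Book markup earlier in the chapter; Summary is a hypothetical element, and Section is simplified to text-only content for brevity.

```xml
<Schema name="ChapterSchema.xml"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="Title" content="textOnly"/>
  <ElementType name="Summary" content="textOnly"/>  <!-- hypothetical -->
  <ElementType name="Section" content="textOnly"/>  <!-- simplified -->
  <!-- DTD equivalent: <!ELEMENT Chapter (Title, Summary?, Section+)> -->
  <ElementType name="Chapter" content="eltOnly" order="seq">
    <!-- defaults (minOccurs="1", maxOccurs="1"): exactly one Title -->
    <element type="Title"/>
    <!-- optional: zero or one Summary -->
    <element type="Summary" minOccurs="0"/>
    <!-- required and repeatable: one or more Sections -->
    <element type="Section" minOccurs="1" maxOccurs="*"/>
  </ElementType>
</Schema>
```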
The <attribute> element can also take three attributes:
| Attribute | Description |
|---|---|
| default | Default value for the attribute; overrides any default provided in the <AttributeType> element it refers to |
| type | Corresponds to the value of the name attribute of an <AttributeType> element defined in this schema |
| required | Indicates whether the attribute must be present on the element; takes the value yes if it is required. Not needed if already specified in the <AttributeType> element. |
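Here is a sketch of required and default on the <attribute> element, with the equivalent DTD declarations in a comment. The attribute names echo the cat:ISBN and cat:level attributes of the Book markup earlier in the chapter but are declared here without a namespace, and the enumeration values are assumptions.

```xml
<Schema name="BookAttrSchema.xml"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="Title" content="textOnly"/>
  <!-- DTD equivalent:
       <!ELEMENT Book (Title)>
       <!ATTLIST Book ISBN  CDATA #REQUIRED
                      level (Beginning | Professional) "Professional"> -->
  <ElementType name="Book" content="eltOnly" order="seq">
    <AttributeType name="ISBN" dt:type="string"/>
    <AttributeType name="level" dt:type="enumeration"
                   dt:values="Beginning Professional"/>
    <!-- ISBN must appear on every Book -->
    <attribute type="ISBN" required="yes"/>
    <!-- level may be omitted; this default is then supplied -->
    <attribute type="level" default="Professional"/>
    <element type="Title"/>
  </ElementType>
</Schema>
```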
Let's take a look at some simple element declarations and their
DTD equivalents. First we have a parent element called <Fex>,
which can contain a child element called <Tex>.
<ElementType name="Fex" content="mixed" order="many">
<element type="Tex"/>
</ElementType>
Here, some <ElementType> declaration for <Tex> would
have been included elsewhere in the schema. In a DTD, the declaration
we have just seen would be:
<!ELEMENT Fex (#PCDATA | Tex)*>
Next, we have a <Person> element, which has the child
elements <FirstName>, <MI> and <LastName>: