By now, I hope you are eager to apply namespace and schema
information to our book catalog example. Although we won't follow
this schema through the rest of the book, I'm going to show how we
can use what I've presented in this chapter to improve our
understanding and organization of book publishing information.
Why Bother?
What is wrong with our Book Catalog DTD? Nothing, really, but it
is starting to get long. Everything about book catalogs has to go
into the one DTD if we are to properly validate a catalog document.
All of the general criticisms of DTDs raised earlier in the chapter
apply to our particular DTD.
The first thing we will do is break up the publishing catalog
domain into two separate schemas, one reflecting a namespace dealing
with authors, the other dealing with catalog information. Additionally, we can
provide strong typing on some of our attributes and elements, thereby
making our life easier when it comes time to write applications that
process catalogs. Since XML Schemas is in a state of flux as it nears
Recommendation status, we will use the XML-DR version of schemas as
implemented in MSXML.
Segmentation
Our catalog.dtd that we met in Chapter 3 gave us a number of
concepts. A catalog certainly needs books, but wouldn't it be better
if the authors lived outside the catalog? After all, if we need to
write a schema for marking up the actual content of individual books,
we will probably want to include author information there as well.
This is one of the main motivations in splitting up our Book Catalog
DTD into two separate schemas: Catalog and Author. When we want to
create a catalog document, we can declare a default namespace for the
Catalog schema and then declare a second namespace, with its own
prefix, to bring in the Author schema.
Additional Expression
In our bookCatalog.dtd we had a few attributes that could benefit
from strong typing. If we included data types, it would make it
easier to total up page counts, and we would certainly like to be
able to calculate order totals given an order from the catalog. So,
we need to go through the Catalog schema and see which attributes and
element values should be qualified with type information.
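Without type information, an application receives every attribute value as a string. The JavaScript sketch below shows why this matters for totaling page counts. It uses a hypothetical array of plain objects in place of <Book> elements, and the titles and page counts are made up. Adding the raw strings concatenates them, so each value has to be converted and checked first. Declaring pageCount with a numeric type lets the schema processor reject any value that would fail that conversion.

```javascript
// Hypothetical stand-ins for <Book> elements; as read from an untyped
// document, every attribute value is a string.
const books = [
  { title: "Book A", pageCount: "350" },
  { title: "Book B", pageCount: "412" },
];

// Naive total: 0 + "350" yields the string "0350", so "+" keeps concatenating.
function naiveTotal(items) {
  return items.reduce((sum, book) => sum + book.pageCount, 0);
}

// Typed total: convert each value and reject anything that is not a whole number.
function typedTotal(items) {
  return items.reduce((sum, book) => {
    const pages = Number(book.pageCount);
    if (!Number.isInteger(pages)) {
      throw new Error(`pageCount "${book.pageCount}" is not an integer`);
    }
    return sum + pages;
  }, 0);
}

console.log(naiveTotal(books)); // prints 0350412
console.log(typedTotal(books)); // prints 762
```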
Metadata Discovery
Creating a schema in XML syntax is useful to programmers, because it
gives them some support for writing programs that manipulate book
catalog documents marked up according to our schema. The greatest
support we provide is simply recasting the DTD in schema form. Once
it is in XML syntax, programmers can use the same parser
that they use with XML document instances to discover the meaning
behind the metadata.
Suppose you were unfamiliar with our schema. You could investigate
individual elements through any <description> elements they contain. This would
be useful in a document browser. A user might click for additional
information on a particular item and see the metadata associated with
it. We wouldn't present the XML definition, of course, though we
might show that an item is a numeric type. In the case of
enumerations, we would certainly show the range of values the item
could take. Cardinality information is certainly important to see, as
is the knowledge of whether an attribute is required or not. All
these could be discovered at the time a document instance is read as
long as we provide a schema in XML syntax to go with it. After we've
turned our catalog DTD into a schema, I'll show how we can use the DOM
to generate a concordance of elements in a schema. This will take a
schema and provide you with a cross-reference of elements and how
they are used.
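The following sketch suggests how a document browser might turn declaration metadata into a short description for the user. It assumes the declarations have already been read from the schema into hypothetical plain objects whose keys mirror the XML-DR attribute names. The set of numeric type names is a partial list chosen for illustration, and the output wording is an arbitrary choice.

```javascript
// A partial list of XML-DR numeric type names, chosen for illustration.
const NUMERIC_TYPES = new Set(["int", "i4", "number", "float", "r8", "fixed.14.4"]);

// Turn one declaration (a hypothetical plain object whose keys mirror
// XML-DR attribute names) into a one-line description for the user.
function describeItem(decl) {
  const parts = [decl.name];
  const type = decl["dt:type"];
  if (type === "enumeration") {
    // Show the range of permitted values
    parts.push("one of: " + decl["dt:values"].trim().split(/\s+/).join(", "));
  } else if (NUMERIC_TYPES.has(type)) {
    parts.push(`numeric (${type})`);
  } else if (type) {
    parts.push(`type ${type}`);
  }
  if (decl.required === "yes") {
    parts.push("required");
  }
  // Cardinality is reported only when both bounds are given explicitly
  if (decl.minOccurs !== undefined && decl.maxOccurs !== undefined) {
    const max = decl.maxOccurs === "*" ? "many" : decl.maxOccurs;
    parts.push(`occurs ${decl.minOccurs}..${max}`);
  }
  return parts.join("; ");
}

console.log(describeItem({ name: "headquarters", "dt:type": "enumeration", "dt:values": "yes no" }));
// headquarters; one of: yes, no
console.log(describeItem({ name: "Address", minOccurs: "1", maxOccurs: "*" }));
// Address; occurs 1..many
```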
Recasting the DTD
Let's take a closer look at our DTD. We'll work through a
translation into XML-DR format showing incremental improvements to
our definition as we go.
There is no clear consensus on what file extension to use for
XML-DR schema files. Microsoft sources tend to use xml, while one
commercially available tool uses xdr. The W3C Schema working group,
as we have seen, favors xsd for their version of schemas. I will use
xml for the examples that follow. In any case, a schema is XML, so
its MIME type remains text/xml.
As you may remember, the catalog was split into three
sections:
Publisher: information about the publisher
Threads: descriptive information
Books: information about the books
The publisher section also included the author details. However, we
are going to remove the author details and place them in a separate
author schema, so that we can borrow from it in the catalog schema
and make use of it in other areas as well. So let's start with the
author schema before coming back to the rest of the catalog.
Author Schema
We ought to look at the authors schema first, because the catalog
schema we develop next will borrow from it. The first thing to do is
cut out the declaration of the <Author> element and everything
subordinate to it and create a new schema file, authors.xml. The top
of the file should declare conformance to XML 1.0, name the schema,
and declare the XML-DR and datatypes namespaces:
<?xml version ="1.0"?>
<Schema name = "authors.xml"
xmlns =
"urn:schemas-microsoft-com:xml-data"
xmlns:dt =
"urn:schemas-microsoft-com:datatypes">
Note that the default namespace is XML-DR and the datatypes
namespace will be aliased with the prefix dt. The Author element is
our starting place. It contains only element content: the name
elements (<FirstName>, an optional <MI>, and <LastName>), followed
by <Biographical> and <Portrait>, in sequence:
<!ELEMENT Author (FirstName, MI?, LastName,
Biographical, Portrait)>
<!ATTLIST Author authorCiteID ID #REQUIRED>
In XML-DR, this becomes:
<AttributeType name = "authorCiteID" dt:type = "ID" required
= "yes"/>
<ElementType name = "Author" content = "eltOnly" order =
"seq">
<attribute type = "authorCiteID"/>
<element type = "FirstName"/>
<element type = "MI" minOccurs = "0" maxOccurs =
"1"/>
<element type = "LastName"/>
<element type = "Biographical"/>
<element type = "Portrait"/>
</ElementType>
We've retained the XML ID type for authorCiteID to preserve the
link between authors and books. Note especially the cardinality on
MI. It can occur either zero times or once, that is, it is optional.
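DTD occurrence indicators map directly onto minOccurs and maxOccurs. The sketch below uses hypothetical helper names to translate a single content-model particle such as MI? into an XML-DR <element> reference, laid out like the listing above. For a particle with no indicator it omits the occurrence attributes, as the listing does for FirstName. It handles single names only, not nested groups or choices.

```javascript
// Map a DTD occurrence indicator to XML-DR minOccurs/maxOccurs values.
function occurrenceToXdr(indicator) {
  switch (indicator) {
    case "":  return { minOccurs: "1", maxOccurs: "1" }; // exactly once
    case "?": return { minOccurs: "0", maxOccurs: "1" }; // optional
    case "*": return { minOccurs: "0", maxOccurs: "*" }; // zero or more
    case "+": return { minOccurs: "1", maxOccurs: "*" }; // one or more
    default:  throw new Error(`Unknown occurrence indicator "${indicator}"`);
  }
}

// Translate one particle of a DTD content model, such as "MI?", into an
// XML-DR <element> reference. Handles a single name only, not groups.
function particleToXdr(particle) {
  const match = /^([A-Za-z_][\w.:-]*)([?*+]?)$/.exec(particle.trim());
  if (!match) {
    throw new Error(`Cannot parse particle "${particle}"`);
  }
  const [, name, indicator] = match;
  if (indicator === "") {
    // Exactly once: leave minOccurs/maxOccurs off, as the listing does
    return `<element type = "${name}"/>`;
  }
  const { minOccurs, maxOccurs } = occurrenceToXdr(indicator);
  return `<element type = "${name}" minOccurs = "${minOccurs}" maxOccurs = "${maxOccurs}"/>`;
}

// Translate the particles of the Author content model
for (const p of ["FirstName", "MI?", "LastName", "Biographical", "Portrait"]) {
  console.log(particleToXdr(p));
}
```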
Now declare the child elements of <Author>:
<ElementType name = "FirstName" content = "textOnly"/>
<ElementType name = "MI" content = "textOnly"/>
<ElementType name = "LastName" content = "textOnly"/>
<ElementType name = "Biographical" content =
"textOnly"/>
<AttributeType name = "picLink"/>
<ElementType name = "Portrait" content = "empty">
<attribute type = "picLink"/>
</ElementType>
Close the top-level <Schema> element and you are done. Now
you have a reusable schema that can be brought in wherever we mark up
author information.
Catalog Schema
Having removed the author elements from the catalog DTD and placed
them in a separate schema, we now turn our attention to recreating
the rest of the catalog DTD in XML-DR. We will call this schema PubCatalog.xml.
This will borrow from the author schema when we need to contain
author details. Here's the opening information:
<?xml version ="1.0"?>
<Schema name = "PubCatalog.xml"
xmlns =
"urn:schemas-microsoft-com:xml-data"
xmlns:dt =
"urn:schemas-microsoft-com:datatypes"
xmlns:athr =
"x-schema:authors.xml">
Note how we've added a namespace declaration for our newly created
author schema authors.xml with the alias prefix athr.
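When a schema processor meets a type reference such as athr:Author, it splits off the prefix and looks it up among the namespace declarations. The sketch below models that lookup with a plain object holding the prefixes declared at the top of PubCatalog.xml. It assumes that an unprefixed type refers to a declaration in the current schema. The function name is hypothetical.

```javascript
// Prefixes declared at the top of PubCatalog.xml that can qualify a type reference.
const prefixes = {
  dt: "urn:schemas-microsoft-com:datatypes",
  athr: "x-schema:authors.xml",
};

// Resolve the type attribute of an <element> or <attribute> reference.
// Assumption: an unprefixed name refers to a declaration in the current schema.
function resolveTypeRef(typeRef, prefixMap, currentSchema) {
  const colon = typeRef.indexOf(":");
  if (colon === -1) {
    return { schema: currentSchema, name: typeRef };
  }
  const prefix = typeRef.slice(0, colon);
  if (!Object.prototype.hasOwnProperty.call(prefixMap, prefix)) {
    throw new Error(`Undeclared prefix "${prefix}" in "${typeRef}"`);
  }
  return { schema: prefixMap[prefix], name: typeRef.slice(colon + 1) };
}

console.log(resolveTypeRef("athr:Author", prefixes, "x-schema:PubCatalog.xml"));
// { schema: 'x-schema:authors.xml', name: 'Author' }
```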
Let's dig right in, starting with the <Catalog> element. It has
element-only content: the <Publisher>, <Thread>, and <Book>
elements, just as we had in the earlier catalog.dtd. <Publisher> and
<Book> must each appear at least once, <Thread> is optional, and all
three may repeat:
<ElementType name = "Catalog" content =
"eltOnly" order = "seq">
<element type = "Publisher"
minOccurs = "1" maxOccurs = "*"/>
<element type = "Thread"
minOccurs = "0" maxOccurs = "*"/>
<element type = "Book"
minOccurs = "1" maxOccurs = "*"/>
</ElementType>
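Next we need to declare the isbn attribute, which will be used
within the <Publisher> element that we have just referenced. (The
<Book> element declares its own ISBN attribute later; because XML
names are case-sensitive, isbn and ISBN are two distinct attributes.)
Here is the declaration: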
<AttributeType name = "isbn" required =
"yes"/>
Publisher
The next section that we need to address is the content of the
<Publisher> element. It still contains the same first three child
elements that we saw in the DTD. However, we have created a separate
schema for the author details, so we need to refer to that namespace
and borrow from it.
As we mentioned, we can use the <description> element to make
information about the schema available to a processing application.
That is just what we do here, using it to note that the <Publisher>
element holds publisher information:
<ElementType name = "Publisher" content =
"eltOnly" order = "seq">
<description> Publisher
section </description>
<attribute type =
"isbn"/>
<element type =
"CorporateName"/>
<element type = "Address"
minOccurs = "1" maxOccurs = "*"/>
<element type =
"Imprints"/>
<element type = "athr:Author"
minOccurs = "0" maxOccurs = "*"/>
</ElementType>
Drilling down into the schema, <CorporateName> was simply an element
containing PCDATA in the DTD, so we specify that its content is text
only:
<ElementType name = "CorporateName" content =
"textOnly"/>
Next we have the address information, which, you may recall,
contained a yes/no enumeration for the headquarters attribute. We
define that attribute first:
<AttributeType name = "headquarters"
dt:type =
"enumeration" dt:values = "yes no"/>
<ElementType name = "Address" content =
"eltOnly" order = "seq">
<attribute type =
"headquarters"/>
<element type = "Street"
minOccurs = "1" maxOccurs = "*"/>
<element type =
"City"/>
<element type =
"PoliticalDivision"/>
<element type =
"Country"/>
<element type =
"PostalCode"/>
</ElementType>
Note the form of the enumeration datatype in XML-DR: dt:type is set
to "enumeration", and the permitted values are listed in dt:values,
separated by spaces. Continuing, we declare the child elements of
<Address>:
<ElementType name = "Street" content =
"textOnly"/>
<ElementType name = "City" content =
"textOnly"/>
<ElementType name = "PoliticalDivision" content
= "textOnly">
<description>State,
province, canton, etc.</description>
</ElementType>
<ElementType name = "Country" content =
"textOnly"/>
<ElementType name = "PostalCode" content =
"textOnly"/>
The third child element of the <Publisher> element is about
the publisher imprints:
<ElementType name = "Imprints" content =
"eltOnly" order = "seq">
<element type = "Imprint"
minOccurs = "1" maxOccurs = "*"/>
</ElementType>
<AttributeType name = "shortImprintName" dt:type
= "ID"/>
<ElementType name = "Imprint" content =
"textOnly">
<attribute type =
"shortImprintName"/>
</ElementType>
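The headquarters enumeration declared for <Address> keeps its permitted values in dt:values, separated by spaces. Checking a value against it therefore means splitting that list and looking for an exact match. The sketch below does that. It is an illustration, not MSXML's implementation. The comparison is case-sensitive, so Yes would not match yes, and because spaces act as separators, no single permitted value can contain one.

```javascript
// Split an XML-DR dt:values list into its permitted values (space-separated).
function parseEnumeration(dtValues) {
  return dtValues.trim().split(/\s+/);
}

// True when value exactly matches one of the permitted values (case-sensitive).
function isValidEnumValue(value, dtValues) {
  return parseEnumeration(dtValues).includes(value);
}

const headquartersValues = "yes no"; // from the headquarters AttributeType
console.log(isValidEnumValue("yes", headquartersValues));   // true
console.log(isValidEnumValue("maybe", headquartersValues)); // false
```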
The fourth child of the <Publisher> element held the author details
in the DTD. Those declarations now live in authors.xml and are
referenced above as athr:Author, so there is nothing more to declare
here, and we can move on to <Thread>.
Thread
<Thread> was used to specify the subject category of the book. If you
look above the bar code on the back of a book, you can see three
different threads used to categorize it. Bookstores, for example, use
these when deciding which section to put the book in.
<AttributeType name = "threadID" dt:type =
"ID"/>
<ElementType name = "Thread" content =
"textOnly">
<description>
Subject threads
consist of one or more books
related by some
thread of study
</description>
<attribute type = "threadID"/>
</ElementType>
Again we have used a <description> element to explain what
threads are used for.
Book
The final section is the one that deals with the books themselves.
As we noted in the DTD chapter, this must include a title, an
abstract, and recommended subject categories, and it may include a
price. Before we define the elements, we must define several
attributes:
<AttributeType name = "ISBN" dt:type = "ID"
required = "yes"/>
<AttributeType name = "level"/>
<AttributeType name = "pubdate" required =
"yes"/>
Next we reach the pageCount attribute. This was one place we
decided we could really use strong typing of data. We'll make the
attribute an integer type:
<AttributeType name = "pageCount" dt:type="int"
required = "yes"/>
Then we continue with the various references:
<AttributeType name = "authors" dt:type =
"IDREFS"/>
<AttributeType name = "threads" dt:type =
"IDREFS"/>
<AttributeType name = "imprint" dt:type =
"IDREF"/>
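An IDREFS attribute such as authors or threads holds a whitespace-separated list of tokens, and each token must match an ID value somewhere in the same document. The sketch below reports any tokens that do not resolve. The ID values in it are hypothetical. Because ID values share a single document-wide space, the values of authorCiteID, threadID, and shortImprintName must all be distinct from one another.

```javascript
// Return the tokens of an IDREF or IDREFS value that match no ID in the document.
// IDREFS values are whitespace-separated lists of ID tokens.
function unresolvedRefs(attrValue, knownIds) {
  return attrValue
    .trim()
    .split(/\s+/)
    .filter(token => token !== "" && !knownIds.has(token));
}

// Hypothetical ID values gathered from authorCiteID, threadID and shortImprintName
const knownIds = new Set(["rhcite", "webthread", "xmlthread", "wrox"]);

console.log(unresolvedRefs("rhcite", knownIds));             // []
console.log(unresolvedRefs("webthread nothread", knownIds)); // [ 'nothread' ]
```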
Having set the attributes that we will be using, we declare the
content of <Book>, which uses the attributes we have just
declared and several child elements:
<ElementType name = "Book" content = "eltOnly"
order = "seq">
<description> Book summary
information (no content) </description>
<attribute type =
"ISBN"/>
<attribute type =
"level"/>
<attribute type =
"pubdate"/>
<attribute type =
"pageCount"/>
<attribute type =
"authors"/>
<attribute type =
"threads"/>
<attribute type =
"imprint"/>
<element type =
"Title"/>
<element type =
"Abstract"/>
<element type =
"RecSubjCategories"/>
<element type = "Price"
minOccurs = "0" maxOccurs = "1"/>
</ElementType>
Then we can describe the contents of these child elements:
<ElementType name = "Title" content =
"textOnly"/>
<ElementType name = "Abstract" content =
"textOnly"/>
<ElementType name = "RecSubjCategories" content
= "eltOnly" order = "seq">
<element type =
"Category"/>
<element type =
"Category"/>
<element type =
"Category"/>
</ElementType>
<ElementType name = "Category" content =
"textOnly"/>
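The <Price> element declaration again brings us to datatype
support. The currency attribute requires an enumeration, while the
text value of the element itself should be a numeric type suitable
for representing currency. We use fixed.14.4, a fixed-point number
with no more than 14 digits to the left of the decimal point and no
more than 4 to the right: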
<AttributeType name = "currency" dt:type =
"enumeration"
dt:values = "USD GBF CD" required = "yes"/>
<ElementType name = "Price" dt:type="fixed.14.4"
content = "textOnly">
<attribute type =
"currency"/>
</ElementType>
</Schema>
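Returning to the order totals that motivated strong typing, the sketch below checks prices against the fixed.14.4 layout and totals an order. It makes three assumptions: the order lines are hypothetical, all lines share one currency (the currency attribute would have to be checked separately), and a value may carry an optional sign but needs at least one digit before any decimal point. The arithmetic is done in integer ten-thousandths using BigInt. That keeps it exact, which binary floating point does not guarantee for amounts such as 39.99.

```javascript
// fixed.14.4: at most 14 digits before the decimal point and at most 4 after.
// Assumption: an optional sign is allowed and at least one leading digit is required.
const FIXED_14_4 = /^[+-]?\d{1,14}(\.\d{1,4})?$/;

// Convert a fixed.14.4 string into an exact count of ten-thousandths.
function toTenThousandths(text) {
  if (!FIXED_14_4.test(text)) {
    throw new Error(`Not a fixed.14.4 value: "${text}"`);
  }
  const negative = text.startsWith("-");
  const [whole, frac = ""] = text.replace(/^[+-]/, "").split(".");
  const value = BigInt(whole) * 10000n + BigInt(frac.padEnd(4, "0"));
  return negative ? -value : value;
}

// Total a hypothetical order: each line has a Price string and a whole-number quantity.
// All lines are assumed to share one currency.
function orderTotal(lines) {
  let total = 0n;
  for (const { price, quantity } of lines) {
    total += toTenThousandths(price) * BigInt(quantity);
  }
  const sign = total < 0n ? "-" : "";
  const magnitude = total < 0n ? -total : total;
  const fraction = (magnitude % 10000n).toString().padStart(4, "0");
  return `${sign}${magnitude / 10000n}.${fraction}`;
}

console.log(orderTotal([
  { price: "39.99", quantity: 2 },
  { price: "49.99", quantity: 1 },
])); // 129.9700
```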
And that's it. With a bit of translation from DTD syntax to XML-DR
and the addition of some strong typing, we have created a new catalog
schema that reuses the authors schema through namespace support. It
gives us the same sort of validation support as the DTD did, provided
we change the root element of the sample catalog.xml file to reflect
the use of the schema:
<?xml version ="1.0"?>
<Catalog xmlns = "x-schema:PubCatalog.xml"
xmlns:athr = "x-schema:authors.xml">
Note that the x-schema namespace declaration tells MSXML where to find
the schema, which eliminates the need for a DOCTYPE declaration.
It would be nice to have a simple listing of all the elements in a
schema and their contents. That is, for each element declaration,
we'd like a list of the permissible child elements and attributes
used by that element. That way, we'd be able to gauge the impact of
changing any particular element or attribute. Since XML-DR schemas
use XML syntax, we can use MSXML and some JavaScript to produce this
utility. Here's what it will look like when it is finished and
pointed at our PubCatalog.xml schema file:
The source code for SchemaConcordance.html is available from our
Web site at http://www.wrox.com. There are no configuration
requirements other than to provide an appropriate URL to the file you
wish to cross-index.
Finding the Elements
We know that a schema document starts with a root <Schema>
element. Its child elements will be <ElementType> and
<AttributeType> elements. Each <ElementType> declares one element
and lists the child elements and attributes that element may contain.
This simplifies our task somewhat. All we have to do is walk the list
of the <Schema> element's child nodes and process each
<ElementType> element we find. Here is the heart of the code we need:
if (parser.documentElement.nodeName == "Schema")
{
    var schemaChildren = parser.documentElement.childNodes;
    for (var ni = 0; ni < schemaChildren.length; ni++)
    {
        // Cross-reference only the element declarations
        if (schemaChildren.item(ni).nodeName == "ElementType")
            CrossRefElement(schemaChildren.item(ni));
    }
}
We know the number of child nodes, so walking the top level of the
schema can be performed in a simple loop. The nodeName property of
each child node lets us find the element declarations by looking for
the name ElementType.
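Outside the browser there is no MSXML object available. The following sketch therefore models just enough of the DOM with plain objects, built by a hypothetical node() helper, to show the same filtering. Only direct children of the <Schema> root whose nodeName is ElementType are selected, so <AttributeType> declarations and any other nodes are skipped.

```javascript
// Hypothetical stand-in for a DOM node: just a name, attributes and children.
function node(nodeName, attributes = {}, childNodes = []) {
  return { nodeName, attributes, childNodes };
}

// Return the <ElementType> declarations that are direct children of a <Schema> root.
function findElementTypes(documentElement) {
  if (documentElement.nodeName !== "Schema") {
    return [];
  }
  return documentElement.childNodes.filter(child => child.nodeName === "ElementType");
}

// A cut-down version of authors.xml
const schema = node("Schema", { name: "authors.xml" }, [
  node("AttributeType", { name: "picLink" }),
  node("ElementType", { name: "FirstName", content: "textOnly" }),
  node("ElementType", { name: "Portrait", content: "empty" }, [
    node("attribute", { type: "picLink" }),
  ]),
]);

console.log(findElementTypes(schema).map(n => n.attributes.name)); // [ 'FirstName', 'Portrait' ]
```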
Processing an Element Declaration
The function CrossRefElement() accepts an <ElementType>
element node and lists its contents. This is where we encounter a
complication. There is no guarantee that <element> and
<attribute> elements will be sorted. A schema could list
attributes before elements in one ElementType, then reverse it in
another, or even intermix the two. We need a consistent order so we
can add the appropriate title in our output. We will have to create
two arrays, one for element names and one for attribute names and
then display the results when we are finished with the element
declaration. Here is the part of CrossRefElement() that extracts
element declaration information:
var rChildElements = new Array();   // names of permitted child elements
var rAttributes = new Array();      // names of permitted attributes
var WorkNode;
var nEltCount = 0;
var nAttrCount = 0;
var ni;   // declared locally so this loop can't overwrite a global ni
for (ni = 0; ni < eltNode.childNodes.length; ni++)
{
    WorkNode = eltNode.childNodes.item(ni);
    switch (WorkNode.nodeName)
    {
        case "element":
            rChildElements[nEltCount++] =
                WorkNode.attributes.getNamedItem("type").text;
            break;
        case "attribute":
            rAttributes[nAttrCount++] =
                WorkNode.attributes.getNamedItem("type").text;
            break;
        case "group":
            // Groups hold further declarations; collect them, then resync indices
            SqueezeGroup(WorkNode, rChildElements, rAttributes);
            nEltCount = rChildElements.length;
            nAttrCount = rAttributes.length;
            break;
    }
}
...
When we encounter an <element> or <attribute> schema
element, we get the value of its type attribute, which we know to be
the name of an associated <ElementType> or <AttributeType>
element. We retrieve the attribute by name with getNamedItem(), a
standard DOM method of the attributes collection, and read its value
through the text property, which is one of Microsoft's extensions to
the DOM in MSXML. If schemas didn't contain groups, our work would be
done. Since groups do contain element and attribute information we
need, we must call another function, SqueezeGroup(). This function
looks almost the same as what we see above:
function SqueezeGroup(node, rElts, rAttrs)
{
    // Append after any names already collected
    var nEltCt = rElts.length;
    var nAttrCt = rAttrs.length;
    var childNode;
    // Fix up indices for empty arrays
    if (nEltCt < 0)
        nEltCt = 0;
    if (nAttrCt < 0)
        nAttrCt = 0;
    for (var nj = 0; nj < node.childNodes.length; nj++)
    {
        childNode = node.childNodes.item(nj);
        switch (childNode.nodeName)
        {
            case "element":
                rElts[nEltCt++] =
                    childNode.attributes.getNamedItem("type").text;
                break;
            case "attribute":
                rAttrs[nAttrCt++] =
                    childNode.attributes.getNamedItem("type").text;
                break;
            case "group":
                // Nested group: recurse, then resync our indices
                SqueezeGroup(childNode, rElts, rAttrs);
                nEltCt = rElts.length;
                nAttrCt = rAttrs.length;
                break;
        }
    }
}
SqueezeGroup() is passed the group node and the arrays containing
the element and attribute names. Since there may be some names in the
arrays by the time we get here, we have to set our array indices
based on the current length of the array:
var nEltCt = rElts.length;
var nAttrCt = rAttrs.length;
Since SqueezeGroup() can add to the count, CrossRefElement() must
reset its indices when control returns to it from SqueezeGroup():
case "group":
SqueezeGroup(WorkNode, rChildElements,
rAttributes);
nEltCount = rChildElements.length;
nAttrCount = rAttributes.length;
break;
Finally, since groups may contain other groups, we call
SqueezeGroup() recursively to make sure we get all the information
out of a group:
case "group":
SqueezeGroup(childNode, rElts, rAttrs);
nEltCt = rElts.length;
nAttrCt = rAttrs.length;
break;
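The collection logic of CrossRefElement() and SqueezeGroup() can be sketched more compactly in modern JavaScript. This version works on plain objects that stand in for DOM nodes. They are built by a hypothetical node() helper, with attributes stored as a simple name-to-value map. Because it appends with push(), there are no index variables to resynchronize after each recursive call.

```javascript
// Hypothetical stand-in for a DOM node: just a name, attributes and children.
function node(nodeName, attributes = {}, childNodes = []) {
  return { nodeName, attributes, childNodes };
}

// Collect the element and attribute names referenced by an <ElementType>,
// descending into nested <group> elements. push() appends in document order.
function collectNames(declNode, names = { elements: [], attributes: [] }) {
  for (const child of declNode.childNodes) {
    switch (child.nodeName) {
      case "element":
        names.elements.push(child.attributes.type);
        break;
      case "attribute":
        names.attributes.push(child.attributes.type);
        break;
      case "group":
        collectNames(child, names); // groups may contain further groups
        break;
      // Other children, such as <description>, are ignored
    }
  }
  return names;
}

// A hypothetical declaration with nested groups
const decl = node("ElementType", { name: "Example", content: "eltOnly" }, [
  node("attribute", { type: "isbn" }),
  node("element", { type: "A" }),
  node("group", { order: "one" }, [
    node("element", { type: "B" }),
    node("group", {}, [node("element", { type: "C" })]),
  ]),
  node("description"),
]);

console.log(collectNames(decl)); // { elements: [ 'A', 'B', 'C' ], attributes: [ 'isbn' ] }
```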
Displaying the Results
Once an <ElementType> element is completely processed, we
can use DHTML to display the results in a named DIV. The last part of
CrossRefElement() does this for us:
// Header line: XML-DR's content attribute defaults to "mixed" when omitted
var contentAttr = eltNode.attributes.getNamedItem("content");
var sEltHeader = "Element " +
    eltNode.attributes.getNamedItem("name").text +
    " content = " +
    (contentAttr != null ? contentAttr.text : "mixed");
ListLine(sEltHeader, "green");
tabsize += 12;
// List all child elements
if (rChildElements.length > 0)
{
    ListLine("elements", "blue");
    tabsize += 12;
    for (var ni = 0; ni < rChildElements.length; ni++)
        ListLine(rChildElements[ni], "black");
    tabsize -= 12;
}
// List all attributes
if (rAttributes.length > 0)
{
    ListLine("attributes", "blue");
    tabsize += 12;
    for (var ni = 0; ni < rAttributes.length; ni++)
        ListLine(rAttributes[ni], "black");
    tabsize -= 12;
}
tabsize -= 12;
ListLine() is a utility function that takes some text and a color
literal string and inserts the text into the DIV in the appropriate
color. The variables tabsize and listline are global variables used
to control relative placement of text.
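For a quick check outside the browser, the same layout can be produced as plain text. In this sketch the declaration summary is a hypothetical object. A fixed indent string per nesting level stands in for the 12-pixel tabsize steps and the colors used by ListLine(). A missing content value falls back to "mixed", which we assume is the XML-DR default.

```javascript
// Render one cross-referenced declaration as indented plain-text lines.
// decl is a hypothetical summary object; a missing content value is assumed
// to mean the XML-DR default, "mixed".
function renderConcordance(decl, indent = "  ") {
  const lines = [`Element ${decl.name} content = ${decl.content || "mixed"}`];
  const section = (title, names) => {
    if (names.length === 0) return; // omit empty sections, as the DHTML version does
    lines.push(indent + title);
    for (const name of names) lines.push(indent + indent + name);
  };
  section("elements", decl.elements);
  section("attributes", decl.attributes);
  return lines;
}

console.log(renderConcordance({
  name: "Portrait",
  content: "empty",
  elements: [],
  attributes: ["picLink"],
}).join("\n"));
// Element Portrait content = empty
//   attributes
//     picLink
```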
©1999 Wrox Press Limited,
US and UK.