Page 4 of 5

 


Variations on the Book Catalog

By now, I hope you are eager to apply namespace and schema information to our book catalog example. Although we won't follow this schema through the rest of the book, I'm going to show how we can use what I've presented in this chapter to improve our understanding and organization of book publishing information.

Why Bother?

What is wrong with our Book Catalog DTD? Nothing, really, but it is starting to get long. Everything about book catalogs has to go into the one DTD if we are to properly validate a catalog document. All the general criticisms leveled against DTDs earlier in the chapter apply to our particular DTD as well.

The first thing we will do is break up the publishing catalog domain into two separate schemas, one reflecting a namespace dealing with authors, the other with catalog information. Additionally, we can provide strong typing on some of our attributes and elements, thereby making our life easier when it comes time to write applications that process catalogs. Since XML Schemas is in a state of flux as it nears Recommendation status, we will use the XML-DR version of schemas as implemented in MSXML.

Segmentation

Our catalog.dtd that we met in Chapter 3 gave us a number of concepts. A catalog certainly needs books, but wouldn't it be better if the authors lived outside the catalog? After all, if we need to write a schema for marking up the actual content of individual books, we will probably want to include author information there as well. This is one of the main motivations for splitting up our Book Catalog DTD into two separate schemas: Catalog and Author. When we want to create a catalog document, we can declare a default namespace for the Catalog schema and then use a qualified namespace to bring in the Author schema.

Additional Expression

In our bookCatalog.dtd we had a few attributes that could benefit from strong typing. If we included data types, it would be easier to total up page counts, and we would certainly like to be able to calculate order totals given an order from the catalog. So, we need to go through the Catalog schema and see which attributes should be qualified with type information.

Metadata Discovery

Creating a schema in XML syntax is useful to programmers, in that we give them a little bit of support for writing programs that manipulate book catalog documents marked up according to our schema. The greatest support we provide is simply taking the DTD into schema form. Once it is in XML syntax, programmers can use the same parser that they use with XML document instances to discover the meaning behind the metadata.

Suppose you were unfamiliar with our schema. You could investigate individual elements with the <description> element. This would be useful in a document browser. A user might click for additional information on a particular item and see the metadata associated with it. We wouldn't present the XML definition, of course, though we might show that an item is a numeric type. In the case of enumerations, we would certainly show the range of values the item could take. Cardinality information is certainly important to see, as is the knowledge of whether an attribute is required or not. All these could be discovered at the time a document instance is read, as long as we provide a schema in XML syntax to go with it. After we've turned our catalog DTD into a schema, I'll show how we can use the DOM to generate a concordance of elements in a schema. This will take a schema and provide you with a cross-reference of elements and how they are used.

Recasting the DTD

Let's take a closer look at our DTD. We'll work through a translation into XML-DR format showing incremental improvements to our definition as we go.

There is no clear consensus on what file extension to use for XML-DR schema files. Microsoft sources tend to use xml, while one commercially available tool uses xdr. The W3C Schema working group, as we have seen, favors xsd for their version of schemas. I will use xml for the examples that follow. In any case, a schema is XML, so its MIME type remains text/xml.

As you may remember, the catalog was split into three sections:

Publisher: information about the publisher

Threads: descriptive information

Books: information about the books

The publisher section also included the author details. However, we are going to remove the author details and place them in a separate author schema so that we can borrow from it in the catalog schema and make use of it in other areas as well. So let's start with the author schema before coming back to the rest of the catalog.

Author Schema

We ought to look at the authors schema first, because the catalog schema we develop next will borrow from it. The first thing to do is cut out the declaration of the <Author> element and everything subordinate to it and create a new schema file, authors.xml. The top of the file should declare conformance to XML 1.0, name the schema, and declare the XML-DR and datatypes namespaces:

<?xml version ="1.0"?>

<Schema name = "authors.xml"

        xmlns = "urn:schemas-microsoft-com:xml-data"

        xmlns:dt = "urn:schemas-microsoft-com:datatypes">

Note that the default namespace is XML-DR and the datatypes namespace will be aliased with the prefix dt. The Author element is our starting place. It contains only element content consisting of the name-related, <Biographical>, and <Portrait> elements in sequence:

<!ELEMENT Author  (FirstName, MI?, LastName, Biographical, Portrait)>

<!ATTLIST Author authorCiteID ID  #REQUIRED>

In XML-DR, this becomes:

<AttributeType name = "authorCiteID" dt:type = "ID" required = "yes"/>

<ElementType name = "Author" content = "eltOnly" order = "seq">

   <attribute type = "authorCiteID"/>

   <element type = "FirstName"/>

   <element type = "MI" minOccurs = "0" maxOccurs = "1"/>

   <element type = "LastName"/>

   <element type = "Biographical"/>

   <element type = "Portrait"/>

</ElementType>

We've retained the XML ID type for authorCiteID to preserve the link between authors and books. Note especially the cardinality on MI. It can occur either zero times or once, that is, it is optional. Now declare the child elements of <Author>:

<ElementType name = "FirstName" content = "textOnly"/>

<ElementType name = "MI" content = "textOnly"/>

<ElementType name = "LastName" content = "textOnly"/>

<ElementType name = "Biographical" content = "textOnly"/>

<AttributeType name = "picLink"/>

<ElementType name = "Portrait" content = "empty">

  <attribute type = "picLink"/>

</ElementType>

Close the top-level <Schema> element and you are done. Now you have a reusable schema that can be brought in wherever we mark up author information.

Catalog Schema

Having removed the author elements from the catalog DTD and placed them in a separate schema, we now turn our attention to recreating the catalog data in XML. We will call this schema PubCatalog.xml. This will borrow from the author schema when we need to contain author details. Here's the opening information:

<?xml version ="1.0"?>

<Schema name = "PubCatalog.xml"

        xmlns = "urn:schemas-microsoft-com:xml-data"

        xmlns:dt = "urn:schemas-microsoft-com:datatypes"

        xmlns:athr = "x-schema:authors.xml">

Note how we've added a namespace declaration for our newly created author schema authors.xml with the alias prefix athr.

Let's dig right in. We start with the <Catalog> element, which has element-only content: the <Publisher>, <Thread>, and <Book> elements, just as we had in the earlier catalog.dtd, each of which may occur many times.

   <ElementType name = "Catalog" content = "eltOnly" order = "seq">

      <element type = "Publisher" minOccurs = "1" maxOccurs = "*"/>

      <element type = "Thread" minOccurs = "0" maxOccurs = "*"/>

      <element type = "Book" minOccurs = "1" maxOccurs = "*"/>

   </ElementType>
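The minOccurs and maxOccurs attributes used so far line up with the occurrence indicators of DTD content models. As a minimal sketch of that correspondence (the function name cardinalityFor and its return shape are illustrative, not part of MSXML or XML-DR):

```javascript
// Map a DTD occurrence indicator to the equivalent XML-DR cardinality.
// cardinalityFor() is a hypothetical helper written for this illustration.
function cardinalityFor(indicator) {
   switch (indicator) {
      case "":  return { minOccurs: "1", maxOccurs: "1" }; // exactly once
      case "?": return { minOccurs: "0", maxOccurs: "1" }; // optional
      case "*": return { minOccurs: "0", maxOccurs: "*" }; // zero or more
      case "+": return { minOccurs: "1", maxOccurs: "*" }; // one or more
      default:  throw new Error("Unknown occurrence indicator: " + indicator);
   }
}

var c = cardinalityFor("+");
console.log('minOccurs = "' + c.minOccurs + '" maxOccurs = "' + c.maxOccurs + '"');
// prints: minOccurs = "1" maxOccurs = "*"
```

So a DTD particle such as Publisher+ carries the same constraint as the first <element> line in the <Catalog> declaration, and MI? corresponds to the optional middle initial in the authors schema.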

Next we need to declare the isbn attribute, which is used within the <Publisher> element (the <Book> element gets its own ISBN attribute, typed as an ID, later in the schema):

   <AttributeType name = "isbn" required = "yes"/>

Publisher

The next section that we need to address is the content of the <Publisher> element we just referenced. It still contains the same first three child elements that we saw in the DTD. However, we have created a separate schema for the author details, so we need to refer to that namespace and borrow from it.

As we mentioned, we can use the <description> element to make information about the schema available to a processing application, and that is just what we do here: we use it to state that the <Publisher> element holds publisher information.

   <ElementType name = "Publisher" content = "eltOnly" order = "seq">

      <description> Publisher section </description>

      <attribute type = "isbn"/>

      <element type = "CorporateName"/>

      <element type = "Address" minOccurs = "1" maxOccurs = "*"/>

      <element type = "Imprints"/>

      <element type = "athr:Author" minOccurs = "0" maxOccurs = "*"/>

   </ElementType>

Drilling down into the schema, the <CorporateName> element simply contained PCDATA in the DTD, so we specify that its content is text only:

  <ElementType name = "CorporateName" content = "textOnly"/>

Next we have the address information, which you may recall contained a yes/no enumeration for the attribute headquarters, which we define first:

   <AttributeType name = "headquarters"

                   dt:type = "enumeration" dt:values = "yes no"/>

   <ElementType name = "Address" content = "eltOnly" order = "seq">

      <attribute type = "headquarters"/>

      <element type = "Street" minOccurs = "1" maxOccurs = "*"/>

      <element type = "City"/>

      <element type = "PoliticalDivision"/>

      <element type = "Country"/>

      <element type = "PostalCode"/>

   </ElementType>

Note the form of the enumeration datatype in XML-DR. Continuing, we declare the elements used in the <Address> element:

   <ElementType name = "Street" content = "textOnly"/>

   <ElementType name = "City" content = "textOnly"/>

   <ElementType name = "PoliticalDivision" content = "textOnly">

      <description>State, province, canton, etc.</description>

   </ElementType>

   <ElementType name = "Country" content = "textOnly"/>

   <ElementType name = "PostalCode" content = "textOnly"/>
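The dt:values attribute on the headquarters declaration is a whitespace-separated list of permitted tokens. A validating parser performs the membership check for us, but a minimal sketch makes the rule concrete. The helper isEnumerated() is hypothetical and is not an MSXML function:

```javascript
// Check a candidate value against an XML-DR dt:values list.
// isEnumerated() is a hypothetical helper written for this illustration.
function isEnumerated(dtValues, candidate) {
   // Trim, then split the list on runs of whitespace
   var allowed = dtValues.replace(/^\s+|\s+$/g, "").split(/\s+/);
   for (var i = 0; i < allowed.length; i++) {
      // Skip the empty token produced by an empty list
      if (allowed[i] !== "" && allowed[i] === candidate)
         return true;
   }
   return false;
}

console.log(isEnumerated("yes no", "yes"));   // true
console.log(isEnumerated("yes no", "maybe")); // false
```

The same list could be shown to a user in a document browser as the range of values the attribute may take.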

The third child element of the <Publisher> element is about the publisher imprints:

   <ElementType name = "Imprints" content = "eltOnly" order = "seq">

      <element type = "Imprint" minOccurs = "1" maxOccurs = "*"/>

   </ElementType>

   <AttributeType name = "shortImprintName" dt:type = "ID"/>

   <ElementType name = "Imprint" content = "textOnly">

      <attribute type = "shortImprintName"/>

   </ElementType>

The fourth child of the <Publisher> element held the author details in the DTD, but since we have moved those into the authors schema, we can move on to <Thread>.

Thread

<Thread> was used to specify the category area of the book. If you look above the bar code on the back of the book, you can see three different threads that are used to categorize the book. These are used, for example, by bookstores when deciding which section to put the book in.

   <AttributeType name = "threadID" dt:type = "ID"/>   

   <ElementType name = "Thread" content = "textOnly">

      <description>

         Subject threads consist of one or more books

         related by some thread of study

      </description>

      <attribute type = "threadID"/>

   </ElementType>

Again we have used a <description> element to explain what threads are used for.

Book

The final section is the one that deals with the books themselves. As we noted in the DTD chapter, this must include title, abstract, recommended subject categories, and price.

Before we define the elements, we must define several attributes:

   <AttributeType name = "ISBN" dt:type = "ID" required = "yes"/>

   <AttributeType name = "level"/>

   <AttributeType name = "pubdate" required = "yes"/>

Next we reach the pageCount attribute. This was one place we decided we could really use strong typing of data. We'll make the attribute an integer type:

   <AttributeType name = "pageCount" dt:type="int" required = "yes"/>
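To see why the integer type matters, consider what a script receives from an untyped attribute: a string. The sketch below uses made-up page counts and plain JavaScript to show the difference. With a typed schema, MSXML can expose the converted value through a node's nodeTypedValue property, so the explicit conversion shown here is meant only as an illustration of the problem:

```javascript
// Hypothetical pageCount values as they would arrive from untyped attributes
var pageCounts = ["450", "312", "128"];

// Adding strings concatenates them
var asText = "";
for (var i = 0; i < pageCounts.length; i++)
   asText += pageCounts[i];

// Converting to integers first gives an arithmetic total
var total = 0;
for (var j = 0; j < pageCounts.length; j++)
   total += parseInt(pageCounts[j], 10);

console.log(asText); // 450312128
console.log(total);  // 890
```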

Then we continue with the various references:

   <AttributeType name = "authors" dt:type = "IDREFS"/>

   <AttributeType name = "threads" dt:type = "IDREFS"/>

   <AttributeType name = "imprint" dt:type = "IDREF"/>

Having set the attributes that we will be using, we declare the content of <Book>, which uses the attributes we have just declared and several child elements:

   <ElementType name = "Book" content = "eltOnly" order = "seq">

      <description> Book summary information (no content) </description>

      <attribute type = "ISBN"/>

      <attribute type = "level"/>

      <attribute type = "pubdate"/>

      <attribute type = "pageCount"/>

      <attribute type = "authors"/>

      <attribute type = "threads"/>

      <attribute type = "imprint"/>

      <element type = "Title"/>

      <element type = "Abstract"/>

      <element type = "RecSubjCategories"/>

      <element type = "Price" minOccurs = "0" maxOccurs = "1"/>

   </ElementType>

Then we can describe the contents of these child elements:

   <ElementType name = "Title" content = "textOnly"/>

   <ElementType name = "Abstract" content = "textOnly"/>

   <ElementType name = "RecSubjCategories" content = "eltOnly" order = "seq">

      <element type = "Category"/>

      <element type = "Category"/>

      <element type = "Category"/>

   </ElementType>

   <ElementType name = "Category" content = "textOnly"/>

The <Price> element declaration again brings us to datatype support. The currency attribute requires an enumeration, while the text value of the element itself should be a numeric type suitable for representing currency:

   <AttributeType name = "currency" dt:type = "enumeration"

                  dt:values = "USD GBF CD" required = "yes"/>

   <ElementType name = "Price" dt:type="fixed.14.4" content = "textOnly">

      <attribute type = "currency"/>

   </ElementType>

</Schema>

And that's it. With a bit of translation from DTD syntax to XML-DR and the addition of some strong typing, we have created a new catalog schema that reuses the authors schema through namespace support. It gives us the same sort of validation support as the DTD, provided we change the root element of the sample catalog.xml file to reflect the use of the schema:

<?xml version ="1.0"?>

<Catalog xmlns = "x-schema:PubCatalog.xml">

Note that the namespace declaration eliminates the need for a DOCTYPE declaration.

Schema Concordance

It would be nice to have a simple listing of all the elements in a schema and their contents. That is, for each element declaration, we'd like a list of the permissible child elements and attributes used by that element. That way, we'd be able to gauge the impact of changing any particular element or attribute. Since XML-DR schemas use XML syntax, we can use MSXML and some JavaScript to produce this utility. Here's what it will look like when it is finished and pointed at our PubCatalog.xml schema file:

[Screenshot: SchemaConcordance.html displaying the concordance for PubCatalog.xml]

The source code for SchemaConcordance.html is available from our Web site at http://www.wrox.com. There are no configuration requirements other than to provide an appropriate URL to the file you wish to cross-index.

Finding the Elements

We know that a schema document starts with a root <Schema> element. Its child elements will be <ElementType> and <AttributeType> elements. Elements are declared with <ElementType>, and each <ElementType> lists the elements and attributes it contains. This simplifies our task somewhat. All we have to do is walk the list of the <Schema> element's child nodes and process each <ElementType> element we find. Here is the heart of the code we need:

if (parser.documentElement.nodeName == "Schema")

{

   for (var ni=0; ni < parser.documentElement.childNodes.length; ni++)

   {

      if (parser.documentElement.childNodes(ni).nodeName == "ElementType")

         CrossRefElement(parser.documentElement.childNodes(ni));

   }

}

We know the number of child elements, so walking the entire document can be performed in a simple loop. The nodeName property of the element nodes lets us find the element declarations by looking for the name <ElementType>.

Processing an Element Declaration

The function CrossRefElement() accepts an <ElementType> element node and lists its contents. This is where we encounter a complication. There is no guarantee that <element> and <attribute> elements will be sorted. A schema could list attributes before elements in one ElementType, then reverse it in another, or even intermix the two. We need a consistent order so we can add the appropriate title in our output. We will have to create two arrays, one for element names and one for attribute names and then display the results when we are finished with the element declaration. Here is the part of CrossRefElement() that extracts element declaration information:

var rChildElements = new Array();

var rAttributes = new Array();

var WorkNode;

var nEltCount = 0;

var nAttrCount = 0;

for (var ni = 0; ni < eltNode.childNodes.length; ni++)

{

   WorkNode = eltNode.childNodes(ni);

   switch (WorkNode.nodeName)

   {

      case "element":

         rChildElements[nEltCount++] =

                             WorkNode.attributes.getNamedItem("type").text;

         break;

      case "attribute":

         rAttributes[nAttrCount++] =

                             WorkNode.attributes.getNamedItem("type").text;

         break;

      case "group":

         SqueezeGroup(WorkNode, rChildElements, rAttributes);

         nEltCount = rChildElements.length;

         nAttrCount = rAttributes.length;

         break;

   }

}

...

When we encounter an <element> or <attribute> schema element, we get the value of the type attribute, which we know to be the name of an associated <ElementType> or <AttributeType> element. We do this by calling getNamedItem() on the node's attributes collection to retrieve the attribute by name, and then reading its text property, which is one of the extensions to the DOM implemented by Microsoft in MSXML. If schemas didn't contain groups, our work would be done. Since groups do contain element and attribute information we need, we must call another function: SqueezeGroup(). This function looks almost the same as what we see above:

function SqueezeGroup(node, rElts, rAttrs)

{

   var nEltCt = rElts.length;

   var nAttrCt = rAttrs.length;

   var childNode;

   // Fix up indices for empty arrays

   if (nEltCt < 0)

      nEltCt = 0;

   if (nAttrCt < 0)

      nAttrCt = 0;

   for (var nj = 0; nj < node.childNodes.length; nj++)

   {

      childNode = node.childNodes(nj);

      switch (childNode.nodeName)

      {

         case "element":

            rElts[nEltCt++] = childNode.attributes.getNamedItem("type").text;

            break;

         case "attribute":

            rAttrs[nAttrCt++] =

                              childNode.attributes.getNamedItem("type").text;

            break;

         case "group":

            SqueezeGroup(childNode, rElts, rAttrs);

            nEltCt = rElts.length;

            nAttrCt = rAttrs.length;

            break;

      }

   }

}

SqueezeGroup() is passed the group node and the arrays containing the element and attribute names. Since there may be some names in the arrays by the time we get here, we have to set our array indices based on the current length of the array:

var nEltCt = rElts.length;

var nAttrCt = rAttrs.length;

Since SqueezeGroup() can add to the count, CrossRefElement() must reset its indices when control returns to it from SqueezeGroup():

case "group":

   SqueezeGroup(WorkNode, rChildElements, rAttributes);

   nEltCount = rChildElements.length;

   nAttrCount = rAttributes.length;

   break;

Finally, since groups may contain other groups, we call SqueezeGroup() recursively to make sure we get all the information out of a group:

case "group":

   SqueezeGroup(childNode, rElts, rAttrs);

   nEltCt = rElts.length;

   nAttrCt = rAttrs.length;

   break;
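The index bookkeeping above exists because each function keeps its own counters into shared arrays. As an alternative sketch, not the code from SchemaConcordance.html, appending with push() lets the arrays track their own length, so nothing needs to be reset after a recursive call. Plain objects stand in for MSXML nodes here, and a type property replaces attributes.getNamedItem("type").text, so that the example runs without a parser:

```javascript
// Flatten element and attribute references, descending into nested groups.
// collectNames() is a hypothetical variant of SqueezeGroup().
function collectNames(node, rElts, rAttrs) {
   for (var nj = 0; nj < node.childNodes.length; nj++) {
      var child = node.childNodes[nj];
      switch (child.nodeName) {
         case "element":
            rElts.push(child.type);
            break;
         case "attribute":
            rAttrs.push(child.type);
            break;
         case "group":
            // Recurse; the shared arrays grow in place
            collectNames(child, rElts, rAttrs);
            break;
      }
   }
}

// A made-up ElementType containing a group nested inside another group
var eltNode = {
   nodeName: "ElementType",
   childNodes: [
      { nodeName: "attribute", type: "authorCiteID", childNodes: [] },
      { nodeName: "group", childNodes: [
         { nodeName: "element", type: "FirstName", childNodes: [] },
         { nodeName: "group", childNodes: [
            { nodeName: "element", type: "MI", childNodes: [] }
         ] },
         { nodeName: "element", type: "LastName", childNodes: [] }
      ] },
      { nodeName: "element", type: "Biographical", childNodes: [] }
   ]
};

var rChildElements = [];
var rAttributes = [];
collectNames(eltNode, rChildElements, rAttributes);
console.log(rChildElements.join(", ")); // FirstName, MI, LastName, Biographical
console.log(rAttributes.join(", "));    // authorCiteID
```

Because the same function handles both the top-level declaration and any group, one routine could serve in place of the pair, at the cost of relying on push(), which older script engines may not provide.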

Displaying the Results

Once an <ElementType> element is completely processed, we can use DHTML to display the results in a named DIV. The last part of CrossRefElement() does this for us:

sEltHeader = "Element " + eltNode.attributes.getNamedItem("name").text +

             "  content = " + eltNode.attributes.getNamedItem("content").text;

ListLine(sEltHeader , "green");

tabsize += 12;

// List all child elements

if (rChildElements.length > 0)

{

   ListLine("elements", "blue");

   tabsize += 12;

   for (ni = 0; ni < rChildElements.length; ni++)

      ListLine(rChildElements[ni], "black");

   tabsize -= 12;

}

// List all attributes

if (rAttributes.length > 0)

{

   ListLine("attributes", "blue");

   tabsize += 12;

   for (ni = 0; ni < rAttributes.length; ni++)

      ListLine(rAttributes[ni], "black");

   tabsize -= 12;

}

tabsize -= 12;

ListLine() is a utility function that takes some text and a color literal string and inserts the text into the DIV in the appropriate color. The variables tabsize and listline are global variables used to control relative placement of text.
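The listing does not show ListLine() itself. As a rough sketch of the kind of markup such a function might generate, the hypothetical helper below returns the HTML for one line instead of writing to the page. It assumes that tabsize is a left indent in pixels, which the text does not state:

```javascript
// Indent for the current line. The original uses a global named tabsize;
// treating it as pixels is an assumption made for this sketch.
var tabsize = 0;

// Hypothetical helper: return the HTML for one line of output.
function formatLine(text, color) {
   // Escape markup characters so names display literally
   var safe = String(text)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
   return '<div style="margin-left:' + tabsize + 'px; color:' + color + '">' +
          safe + '</div>';
}

tabsize += 12;
console.log(formatLine("Title", "black"));
// prints: <div style="margin-left:12px; color:black">Title</div>
```

In a browser, a function like ListLine() could append a string of this form to the named DIV.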

©1999 Wrox Press Limited, US and UK.
