Page 2 of 5

 


Mixing Vocabularies

Recall the Book Catalog DTD that we met in Chapter 3. After building a site that exposes your book catalogs as XML written to the PubCatalog.dtd vocabulary, you might decide to sell the books on-line. This means that you need to be able to take orders for the books in the catalog. So, you need a DTD that covers ordering of books.

If you continue in the vein of the DTD chapter, you might start adding to the PubCatalog.dtd file, because the two areas address different parts of the same problem domain: sharing data about books. They can, however, also be seen as different problem domains, because one addresses the whole of the catalog's details, while the other addresses the sale of items from the catalog. While there is some overlap in the information conveyed by the two topics, if you tried to use one DTD to cover both areas, you might end up with one very large or confusing DTD.

DTDs that are large or that contain information about different topics are hard for programmers to read and understand. More importantly, if you're already using the catalog DTD in production, making changes to it now might affect that application. There is a better solution, however: using namespaces to merge data that conforms to separate catalog and order DTDs in a single document. Before we explore this possibility, let's take a closer look at the problems you face.

Segmenting a Problem

To begin with, why would you want to mix order details with the catalog DTD? You are discussing at least two areas: catalogs of all books and the sales of individual titles. When you write a large program, you break the overall problem into smaller pieces. Modules, classes, components, packages and functions are some of the constructs programming languages offer for this purpose. Designing a vocabulary is a similar problem: you need ways to segment a large problem into multiple vocabularies. However, the problem we have to overcome isn't really one of writing individual DTDs to describe multiple vocabularies; we have already seen how to do that in Chapter 3. The real problem lies in using both DTDs within the body of one document once we have segmented the definitions into catalog and order DTDs.

Reuse

In our PubCatalog.dtd we made use of the book element. This makes perfect sense, since we have talked about marking up our data with names that describe the contents of each element. However, as we are considering taking on-line orders for books, we are likely to want to use the same element name again when referring to the book that a customer wants to order. Indeed, it is likely that the two would be described differently in the two DTDs. After all, the book element in the order might be a child of an order element, whereas it is a child of catalog in the PubCatalog.dtd.

As we have already suggested, this is a problem that will occur time and again as we create XML vocabularies. When describing real world concepts, we will continually find that common constructs keep appearing. After all, complex creations are built from simple building blocks - color, shape, price, and dimensions, for example - and simple things don't go undefined for long, so there will be many instances of element names that already have definitions and content models.

If you, or someone else, have already created a DTD that uses these elements, your task will be made easier by borrowing from proven DTDs (indeed, code to handle the constructs defined in them may even be available). This is the concept of reuse.

If you are programming for a corporation, you may be confronted with an existing body of DTDs. Borrowing from them can, in fact, make your life easier, while ignoring them makes everyone else's job harder, because the DTDs represent an intellectual investment in a particular set of definitions by the programmers involved. These DTDs describe the business problem as others know it. Building on the DTDs related to books in our example means that your task is to extend them in a way that flows naturally from the concepts that are already known and defined.

Indeed, if you are programming an application that must connect to an external partner's programs, you have little choice but to reuse existing concepts. The DTDs already in use form a common language you need to speak in order to be understood. Whenever concepts already exist, you should work to be understood in terms of those concepts. The users of pre-existing definitions have made an effort to develop and internalize them. Convincing them to adapt to your view of the problem may be insurmountably difficult. Even if you can accomplish this feat, additional cost is incurred in terms of building new definitions and code, or mapping from an existing DTD to your new one. Reuse saves time, effort, and money.

Ambiguity and Name Collision

Whether you're reusing useful definitions from another designer's DTD or combining segmented DTDs to create a document describing a composite problem, you risk the problems of ambiguity and name collision if the documents you are using feature elements of the same name. For example, books are a pretty common concept. You can be sure there are several DTDs that declare a Book element, at least for publishers and printers, retailers and libraries. A single usage of the name Book in a document needs resolution to match it with the proper Book element declaration. In our example Book is a name common to both catalogs and orders.

A document marked up using the PubCatalog.dtd may include the following use of the element <Book>:

<Book>
   <Title>Professional XML</Title>
   <Abstract>Compendium book containing everything you need to learn to use
             XML in your programming solutions today.</Abstract>
   <RecSubjCategories>
      <Category>XML</Category>
      <Category>Programming</Category>
      <Category>Internet</Category>
   </RecSubjCategories>
</Book>

Whereas an order for a book may require the following use of a <Book> element:

<Order>
   <!-- ... payment and shipping information ... -->
   <Item>
      <Book>
         <Title>Professional XML</Title>
         <ISBN>1-861003-11-0</ISBN>
      </Book>
      <Quantity>3</Quantity>
      <!-- USD is used as the attribute name because "$" is not a legal character in an XML name -->
      <Price USD="49.99" />
      <Discount USD="10.00" />
      <SubTotal USD="119.97" />
   </Item>
</Order>

If I'm reading an XML document that includes data from both of the vocabularies, how do I know which definition a given <Book> element refers to?

The problem becomes acute when you use instances of a name drawn from multiple DTDs. Assume we have an application for civil engineers involved in town planning. When talking about lighting, we want to draw on pre-existing DTDs for traffic lights and street lights. Working in isolation, the respective vocabulary designers each chose the word <Light> as an element name. Had they known of the eventual use of their DTDs, they might have chosen <TrafficSignal> and <StreetLamp>, but this future use was not known at the time the DTDs were written. Now we are faced with the specter of documents that have ambiguous Light elements.

The declarations for the two uses of <Light> are very different. The first declaration covers traffic signals and has an enumeration for its color attribute. This enumeration is very important, as there are only three valid colors for our traffic signals. An application can be expected to do some error checking based on the value of this attribute:

<!ELEMENT Light EMPTY>
<!ATTLIST Light color (red | yellow | green) #REQUIRED>

The second declaration has no such restriction on its color attribute's value. Indeed, lamps are often chosen on the basis of cost, not color, although the color is still specified:

<!ELEMENT Light EMPTY>
<!ATTLIST Light color CDATA #REQUIRED>

Now consider the following XML document written by an application that mixes the two DTDs:

<Inventory>
  <Light color="red"/>
  ...
  <Light color="white"/>
  ...
</Inventory>

From this alone, we cannot tell whether a given Light element refers to a traffic light or a street lamp (short of guessing from the constraints on colors implied in the DTDs). So, how would a receiving application know whether the color attribute's values are acceptable? We don't know which DTD each element belongs to, and the value of the second Light element's color attribute would not be valid for a traffic light. For well-formed documents that are not validated, this problem is known as ambiguity. Furthermore, if the names Light and color had to be validated, the two conflicting declarations could make a very big mess of our application; this is referred to as the problem of name collisions.
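The ambiguity can be seen from a receiving application's point of view with a small Python sketch. It is an illustration only: it assumes Python's standard xml.etree.ElementTree parser, leaves out the elided parts of the inventory, and the helper name describe_lights is made up for this example.

```python
import xml.etree.ElementTree as ET

# The mixed inventory from the text, with the elided parts left out.
INVENTORY = """
<Inventory>
  <Light color="red"/>
  <Light color="white"/>
</Inventory>
"""

# The enumeration from the traffic-signal DTD's color attribute.
TRAFFIC_COLORS = {"red", "yellow", "green"}


def describe_lights(xml_text):
    """Return (tag, color, passes_traffic_rule) for every Light element."""
    root = ET.fromstring(xml_text)
    return [
        (light.tag, light.get("color"), light.get("color") in TRAFFIC_COLORS)
        for light in root.iter("Light")
    ]


lights = describe_lights(INVENTORY)
for tag, color, ok in lights:
    verdict = "acceptable for a traffic signal" if ok else "not a traffic-signal color"
    print(tag, color, verdict)
```

Both elements come back with the identical name Light, so the application has no basis for deciding whether the traffic-signal rule should be applied to the white light or not.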

Namespaces

XML namespaces are the solution to the problems of ambiguity and name collisions. According to the W3C's Recommendation 'Namespaces in XML' (14 January 1999), a namespace is

...a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.

A collection of names that has structure; this sounds like a DTD and indeed, a DTD can be a namespace. In this case the URI could be the address of the DTD on your server, for example:

http://www.wrox.com/xmldtds/PubCatalog.dtd

The URI need not be a URL, though. (If you are unsure of the differences between the two, we describe them shortly.) In this case the namespace refers to the names used in the PubCatalog.dtd. So, if we were to link the use of the Book element with this namespace in some way, we would know that any reference to Book in a document that was linked with this namespace would refer to the usage as laid out in our PubCatalog.dtd.

Where a DTD dictates the entire structure of a document (and does so exclusively), a namespace is no more and no less than a resource from which we can draw just what definitions we need. Having said this, a namespace need not be a formal structural definition like a DTD, and the limited scope of this definition makes namespaces broadly applicable in XML. If the namespace is a DTD or schema, the definitions we use must remain consistent with the structure and syntax specified therein. We are free, however, to use just those names that we need or desire, and use a namespace as a way of distinguishing between the uses of an element.

So, in order to use the namespaces effectively in a document that combines elements from different sources, we need two parts:

A reference to the URI that defines the use of the element

An alias that we can use to identify which namespace our element is taken from. This takes the form of a prefix for the element (for example <catalog:Book>, where catalog is the alias that qualifies the ambiguous Book element).

Using and Declaring Namespaces

Having seen the advantages that Namespaces offer us in XML, we need to look in more detail at how we actually use them. We will start by looking at how we declare the namespace in a document, and then look at how we can use the namespace within the document, ending up with some examples.

Generally speaking, simple descriptive properties are often modeled as attributes and that is in fact how namespaces are declared in XML. There are a few twists and turns, however, so we'll proceed step by step to learn about what we can specify when we declare a namespace in an XML document.

Declaring a Namespace

If everyone is going to recognize a namespace declaration when they see one, we'll need a reserved word for them. The Namespaces Recommendation gives us xmlns. The value of the attribute is the URI that uniquely defines the namespace in use. This URI is often a URL pointing to a DTD, but it doesn't have to be. A URI, managed in such a way as to uniquely differentiate the namespace, is sufficient. Here are some simple namespace declarations:

xmlns="http://www.wrox.com/bookdefs/book.dtd"
xmlns="urn:wrox-publishing-orderdefs"

The nomenclature surrounding Web resources can be confusing. A Uniform Resource Identifier (URI) is a unique name for some resource. A Uniform Resource Locator (URL) locates the resource in terms of an access protocol and network location. The first example is a URL because it allows a browser to retrieve a resource from a particular location using HTTP. The second example names the resource but provides no location. The literal urn marks it as a Uniform Resource Name (URN), a scheme that derives from an effort to develop permanent URIs.
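As a rough illustration of the distinction, the two namespace names can be pulled apart with Python's standard urllib.parse module. The classify helper and its labels are assumptions made for this sketch, not an official test for what counts as a URL.

```python
from urllib.parse import urlparse

NAMESPACE_NAMES = [
    "http://www.wrox.com/bookdefs/book.dtd",
    "urn:wrox-publishing-orderdefs",
]


def classify(uri):
    """Rough rule of thumb: a network location suggests a URL, the urn scheme a name."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        return "URN (names the resource, no location)"
    if parts.netloc:
        return "URL (access protocol plus network location)"
    return "other URI"


kinds = {uri: classify(uri) for uri in NAMESPACE_NAMES}
for uri, kind in kinds.items():
    print(uri, "->", kind)
```

Either form is acceptable as a namespace name; what matters is that it is unique, not that anything can be fetched from it.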

Since one of our prime motivations for using namespaces was to be able to mix names from different sources, it is useful to provide an alias that can be used throughout a document to refer to the declaration. You do this by appending a colon and your alias to the xmlns attribute name. With aliases added, declarations like those above become:

xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"
xmlns:order="urn:wrox-publishing-sales-orderdefs"

Here the prefix catalog will refer to names from the PubCatalog.dtd, while order will refer to names declared in the order vocabulary. After these declarations appear, we can just use catalog to refer to the first namespace declaration, and order to refer to the other one (without repeating the URI). How we use these declarations and their aliases lets us provide even more information.
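The following Python sketch, which assumes the standard xml.etree.ElementTree parser, suggests how a namespace-aware processor treats such a declaration: the prefix is only a stand-in for the URI. The second prefix, cat, is invented here purely for comparison.

```python
import xml.etree.ElementTree as ET

CATALOG_URI = "http://www.wrox.com/bookdefs/PubCatalog.dtd"

# Two documents that bind different prefixes to the same namespace URI.
doc_a = f'<catalog:Book xmlns:catalog="{CATALOG_URI}"/>'
doc_b = f'<cat:Book xmlns:cat="{CATALOG_URI}"/>'

# ElementTree reports each name as "{namespace-uri}local-name".
tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

print(tag_a)  # {http://www.wrox.com/bookdefs/PubCatalog.dtd}Book
print(tag_a == tag_b)
```

Because both prefixes map to the same URI, the parser sees the same name in both documents; the choice of prefix is a local convenience for the document author.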

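A namespace declaration is therefore made up of three parts: the reserved attribute name xmlns, an optional colon followed by the prefix, and the URI given as the attribute's value.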

Qualified Names

It does us no good to declare a namespace if we can't tie it to a specific name we want to use. This is done through the use of qualified names. A qualified name is just what you might suppose it to be: a name qualified by the namespace from which it is drawn. You create a qualified name by taking the alias, known properly as a namespace prefix, and tacking it on to the beginning of the name, followed by a colon. Going back to the question of including a Book element in both catalog and ordering DTDs, assume that we declare a catalog namespace with the prefix catalog like so:

xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"

we can now use the prefix catalog to make it clear which namespace the element came from. So,

<catalog:Book />

would tell us that the name Book comes from the catalog namespace declaration. There could be a name Book in the order namespace as well, yet this qualified name avoids the possibility of ambiguity or collision. The name Book is unambiguously qualified as coming from a particular namespace. The namespace prefix is often referred to simply as the prefix, and the name itself is the base name.
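To see how qualified names keep the two Book elements apart, here is a minimal Python sketch using the standard xml.etree.ElementTree parser. The combined document is hypothetical: placing a catalog:Book inside an order is an arrangement invented for this illustration, not one taken from either DTD.

```python
import xml.etree.ElementTree as ET

CATALOG_URI = "http://www.wrox.com/bookdefs/PubCatalog.dtd"
ORDER_URI = "urn:wrox-publishing-sales-orderdefs"

# Hypothetical combined document: one Book from each vocabulary.
COMBINED = f"""
<order:Order xmlns:order="{ORDER_URI}" xmlns:catalog="{CATALOG_URI}">
  <order:Item>
    <order:Book>
      <order:Title>Professional XML</order:Title>
      <order:ISBN>1-861003-11-0</order:ISBN>
    </order:Book>
    <order:Quantity>3</order:Quantity>
  </order:Item>
  <catalog:Book>
    <catalog:Title>Professional XML</catalog:Title>
  </catalog:Book>
</order:Order>
"""

root = ET.fromstring(COMBINED)

# Select Book elements by namespace URI plus base name.
catalog_books = list(root.iter(f"{{{CATALOG_URI}}}Book"))
order_books = list(root.iter(f"{{{ORDER_URI}}}Book"))

print(len(catalog_books), "catalog Book;", len(order_books), "order Book")
```

Each search returns only the Book element from its own namespace, even though both share the same base name.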

Qualified names can apply to both element and attribute names. Here's an example that mixes some namespaces:

<catalog:Book order:ISBN="1-861003-11-0">

The element <Book> is drawn from the first namespace we saw above, while the attribute ISBN is drawn from the order namespace.
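A short Python sketch (assuming the standard xml.etree.ElementTree parser) shows how such a mixed element might look to an application. The fragment from the text is completed here with the two namespace declarations and closed as an empty element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

CATALOG_URI = "http://www.wrox.com/bookdefs/PubCatalog.dtd"
ORDER_URI = "urn:wrox-publishing-sales-orderdefs"

# The fragment from the text, made self-contained for parsing.
MIXED = (
    f'<catalog:Book xmlns:catalog="{CATALOG_URI}" '
    f'xmlns:order="{ORDER_URI}" '
    'order:ISBN="1-861003-11-0"/>'
)

book = ET.fromstring(MIXED)

print(book.tag)     # element name, qualified by the catalog namespace
print(book.attrib)  # attribute name, qualified by the order namespace
```

The element name is reported with the catalog URI and the attribute name with the order URI, so an application can tell which vocabulary each one belongs to.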

Scope

Namespace declarations have scope in the same way that variable declarations do in programming languages. This is important because namespaces are not always declared at the beginning of an XML document; they can be declared on any element later in the document. A namespace declaration applies to the element in which it appears and to all descendants of that element, even where the declaration is not repeated on them. A name can refer to a namespace only if it is used within the scope of that namespace's declaration.
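The scoping rule can be tried out with a small Python sketch, assuming the standard xml.etree.ElementTree parser. The Order and Item wrappers are reused from the earlier order example only to give the declaration somewhere to live, and the parses helper is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The prefix is declared on Item and used inside Item: in scope.
IN_SCOPE = """
<Order>
  <Item xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd">
    <catalog:Book/>
  </Item>
</Order>
"""

# The prefix is declared on Item but used after Item has closed: out of scope.
OUT_OF_SCOPE = """
<Order>
  <Item xmlns:catalog="http://www.wrox.com/bookdefs/PubCatalog.dtd"/>
  <catalog:Book/>
</Order>
"""


def parses(xml_text):
    """Return True if a namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(IN_SCOPE), parses(OUT_OF_SCOPE))
```

The second document is rejected because the catalog prefix is used outside the element that declared it, much as a variable cannot be used outside the block that defines it.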

However, we will also need to mix namespaces where elements would otherwise inherit the scope of a namespace. There are two ways in which a namespace in scope can be applied to a name: default and qualified.

Default

As you might suspect, it could quickly get tiresome to have to add a prefix to every name in a document. In fact, by introducing the concept of name scope to our tool set, we can dispense with a lot of prefixes. If we define a default namespace, all unqualified element names within the scope of the declaration are presumed to belong to that default. So, if you declare a default namespace in the root element, it is treated as the default namespace for the whole document, and can only be overridden by a more specific namespace declared within the document.

We declare a namespace to be the default for some scope by omitting the prefix declaration.

Here's how you might use this to embed some HTML within an XML document marked up according to a DTD designed for book content, called BookContent.dtd:

<Chapter xmlns="http://www.wrox.com/bookdefs/BookContent.dtd">
   <Title number="7">Namespaces and Schemas</Title>
   <Author>I. M. Named</Author>
   <Content>
      <Paragraph>
         Let's have a table:
         <table xmlns="http://www.w3.org/TR/REC/REC-html40">
            <tr>
               <td>A tisket</td><td>A tasket</td>
            </tr>
            <tr>
               <td>One fish</td><td>Two fish</td>
            </tr>
         </table>
      </Paragraph>
      <Paragraph>This is a very short paragraph</Paragraph>
   </Content>
</Chapter>

The elements <Title>, <Author>, <Content>, and <Paragraph> come from the default namespace defined in the <Chapter> element. The unprefixed attribute number is not itself placed in the default namespace, because default namespaces do not apply directly to attribute names; it is simply interpreted in the context of the <Title> element that carries it. Within the Chapter element, however, you can see the table element and its children, tr and td. These belong to the HTML namespace declared in the table element. Note that the scope of the HTML namespace declaration in this example ends when the table element closes. The second occurrence of Paragraph does not come from the HTML namespace.

When a prefix is declared and then used with a name, the namespace is explicitly stated. For an unqualified name to be reconciled to a namespace, a default namespace must have been declared with a scope that includes the unqualified name (without the prefix).
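The way default namespaces nest can be checked with a short Python sketch, assuming the standard xml.etree.ElementTree parser. It uses a shortened version of the chapter document above, and the split_name helper is made up for this illustration.

```python
import xml.etree.ElementTree as ET

BOOK_NS = "http://www.wrox.com/bookdefs/BookContent.dtd"
HTML_NS = "http://www.w3.org/TR/REC/REC-html40"

# A shortened version of the chapter document from the text.
CHAPTER = f"""
<Chapter xmlns="{BOOK_NS}">
   <Title number="7">Namespaces and Schemas</Title>
   <Content>
      <Paragraph>
         Let's have a table:
         <table xmlns="{HTML_NS}">
            <tr><td>A tisket</td><td>A tasket</td></tr>
         </table>
      </Paragraph>
      <Paragraph>This is a very short paragraph</Paragraph>
   </Content>
</Chapter>
"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(CHAPTER)

# Group element names by the namespace they resolve to, in document order.
by_namespace = {}
for element in root.iter():
    uri, local = split_name(element.tag)
    by_namespace.setdefault(uri, []).append(local)

for uri, names in by_namespace.items():
    print(uri, names)
```

The table, tr and td elements resolve to the HTML namespace, while both Paragraph elements, including the one that follows the table, resolve to the book content namespace.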

Qualified

All this is well and good if you can clearly separate your namespaces. But sometimes, you'll want to sprinkle names from foreign namespaces throughout a document. You need a finer degree of granularity. Rather than declaring namespaces all over the place, you can make use of qualified names. Declare the namespaces you will need at the beginning of the document and then qualify each name at the point of use:

<Measurements xmlns="urn:mydecs-science-measurements"
  xmlns:units="urn:mydecs-science-unitsofmeasure"
  xmlns:prop="urn:mydecs-science-thingsmeasured">
  <OutsideAir units:units="Fahrenheit">86</OutsideAir>
  <FuelTank>
    <prop:Volume units:units="liters">120</prop:Volume>
    <prop:Temperature units:units="Celsius">20</prop:Temperature>
  </FuelTank>
</Measurements>

In the root element, Measurements, I've declared three namespaces. The default takes care of the elements <OutsideAir>, <FuelTank>, and <Measurements>. However, I need to qualify some readings with units of measure, which I've done with the units namespace and the attribute units:units drawn from that namespace. Being able to qualify that name is very useful as this attribute pops up throughout the document. Finally, I needed to differentiate between some types of measurements, prop:Volume and prop:Temperature. Although I could have declared the prop namespace in the <FuelTank> element, I am free to use this namespace repeatedly (perhaps in a longer document) by declaring the namespace at the beginning and using qualified names.
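A receiving application might pick these readings apart along the following lines. This is a sketch that assumes Python's standard xml.etree.ElementTree parser; the prefix m in the lookup table is an arbitrary label chosen here for the document's default namespace.

```python
import xml.etree.ElementTree as ET

MEASUREMENTS = """
<Measurements xmlns="urn:mydecs-science-measurements"
  xmlns:units="urn:mydecs-science-unitsofmeasure"
  xmlns:prop="urn:mydecs-science-thingsmeasured">
  <OutsideAir units:units="Fahrenheit">86</OutsideAir>
  <FuelTank>
    <prop:Volume units:units="liters">120</prop:Volume>
    <prop:Temperature units:units="Celsius">20</prop:Temperature>
  </FuelTank>
</Measurements>
"""

# Prefix-to-URI map used only for the searches below.
NS = {
    "m": "urn:mydecs-science-measurements",
    "units": "urn:mydecs-science-unitsofmeasure",
    "prop": "urn:mydecs-science-thingsmeasured",
}

# The qualified attribute name in ElementTree's "{uri}local" form.
UNITS_ATTR = "{" + NS["units"] + "}units"

root = ET.fromstring(MEASUREMENTS)

readings = {}
for label, path in [
    ("OutsideAir", "m:OutsideAir"),
    ("Volume", "m:FuelTank/prop:Volume"),
    ("Temperature", "m:FuelTank/prop:Temperature"),
]:
    element = root.find(path, NS)
    readings[label] = (element.text, element.get(UNITS_ATTR))

for label, (value, unit) in readings.items():
    print(label, value, unit)
```

The prefixes in the search paths need not match those in the document; as long as they map to the same URIs, the same elements and attributes are found.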

Take a closer look at these namespace declarations and compare them with the namespace declaration in the <Chapter> element of the preceding section. That declaration was tied to a DTD, potentially making it possible to validate the names used against the DTD. In this example, we have unique names, but no DTD URL. Namespaces exist primarily to organize names into distinct sets and avoid name collisions. The W3C Namespace Recommendation says nothing about their use in validation. Indeed, the XML 1.0 Recommendation says nothing whatsoever about namespaces. The XML Schema effort (which we meet later) does more, but any current use of namespaces for validation will strictly remain an artifact of an individual parser's implementation until XML Schemas are an official W3C recommendation.

©1999 Wrox Press Limited, US and UK.
