
By: Mark Wilson
First posted: 07/23/2001

 

 

  

W3C XML Schemas: Rules for Documents and Data

by Simon St.Laurent

PLEASE NOTE:  This article is out of date and is provided for those still using older schemas.  For information on the current XSD Schemas, see the newer XSD Schemas articles on this site.

XML provides a syntactical foundation for creating labeled document structures in all kinds of styles and flavors.  XML 1.0 came with a set of tools for describing those structures, Document Type Definitions (DTDs), but those tools used their own syntax and didn't address the needs of the data-centric fields into which XML quickly advanced.  A new proposal, XML Schemas, promises to use XML syntax, more precise data typing, and a mostly object-oriented approach to describing structured types.

Background and Current Situation

Developers have been pushing for XML Schemas since before the XML Recommendation was issued.  Microsoft submitted XML-Data to the W3C, which published it as a Note a month before issuing the XML 1.0 Recommendation.  After various other groups had submitted proposals, the W3C started the XML Schema Working Group and charged it with creating a schema language that would unite the many schema dialects and reflect the collective potential of using XML syntax to describe XML document structures.

The XML Schema Working Group has knitted together the Schema for Object-Oriented XML (SOX), XML-Data, Document Content Description (DCD), and Document Description Markup Language (DDML, formerly XSchema) into two specifications, XML Schema Structures and XML Schema Datatypes.  (They also provide an introductory Primer.)  Over the past two years, they have developed and released various drafts, reaching Last Call in March 2000.

Over those two years, however, new competitors have emerged: Schematron, RELAX, and Document Structure Description (DSD).  While Schematron, which uses XSLT transformations to produce human-readable information about validation, can co-exist easily with XML Schemas, the other two proposals are essentially replacements.  RELAX in particular is a threat, as it provides an extensible core model that is simpler than that used by XML Schemas while remaining potentially as powerful.  Perhaps more importantly, RELAX will be submitted to the International Organization for Standardization (ISO), giving it backing from an organization operating on a higher level than the W3C.

XML Schemas, in their current form, are not simple to learn or use.  While object-oriented developers will find much that is familiar, some aspects of XML Schemas don't apply object-oriented design principles consistently.

What's in XML Schemas?

XML Schemas provide two sets of tools for describing types at different levels of a document.  XML Schemas: Structures provides tools for describing structures composed of XML elements and attributes, referencing data types but leaving their definition to XML Schemas: Datatypes.  The Datatypes specification provides tools for describing atomic data stored as textual content inside of XML elements or attributes.  (Datatypes can and do have internal structure, but not structure that requires additional support from element or attribute structures.)

The combination of these two documents makes it possible for developers (not necessarily programmers) to create common vocabularies and shared expectations about document content.  By providing a formal and standardized description, which can be used to automatically verify that documents conform or don't conform, schemas make it easier for organizations to establish communications on common foundations.

DTDs already provide this foundation, but only to a limited extent.  DTDs come with a limited number of document-oriented core datatypes, and while it is possible to extend DTDs using notations, that support is fairly obscure and not widely promoted by the W3C itself.  As XML has grown past its SGML roots, most of XML's user community has come to XML without an understanding of how these tools were used in SGML, and both support and usage have been limited.

Those techniques can be useful for developers who need to start with DTDs, the tools most readily available today, and then move on to Schemas when they're finished.  XML Authority, the first Schema-centered tool out of the gate, uses exactly this approach.  Notations can carry some of the burden while DTDs are in use, even if they only store information used in later transitions.  XML Authority has also taken the approach of supporting any and all legacy formats, from XML-Data to DDML to ODBC database schemas to Java and COM objects and even COBOL copybooks.  The transitions aren't always seamless, but much of the information can be preserved and reused.
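To make the notation technique concrete, here is a minimal sketch of a DTD that uses a notation to record an intended datatype. The element names, the datatype attribute, and the urn:example:datatypes:decimal identifier are all hypothetical. A DTD validator only checks that the attribute names a declared notation; checking that the content really is a decimal is left to the application, or to a Schema later on.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- A notation standing for a datatype; the system identifier is a made-up URN -->
  <!NOTATION decimal SYSTEM "urn:example:datatypes:decimal">
  <!ELEMENT order (price)>
  <!ELEMENT price (#PCDATA)>
  <!-- The fixed NOTATION attribute records that price content is meant to be a decimal.
       DTD validation does not check the number itself. -->
  <!ATTLIST price datatype NOTATION (decimal) #FIXED "decimal">
]>
<order>
  <price>19.95</price>
</order>
```

A tool migrating to Schemas could read the datatype attribute and map it to a real Schema type.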

While DTDs defined types, they did so for only a limited number of types and in a limited number of ways.  (Notations provide more functionality of course, but have their own limitations.)  Unlike DTDs, Schemas start by creating tools for defining types, and treat issues like content model validation as a matter of applying those types to documents.  DTDs defined content models explicitly, while Schemas add an extra layer of abstraction.

Datatypes

The Datatypes specification is more approachable than the Structures specification for a simple reason: it comes with a lot more pre-cooked, ready-to-use material.  For developers who don't want to mess with abstractions, the Schemas Datatypes specification provides a wide variety of useful built-in datatypes that developers can use without further intervention:

string
boolean
float
double
decimal
timeInstant
timeDuration
recurringInstant
binary
uri-reference
ID
IDREF
ENTITY

 

The Datatypes specification goes on to derive the types below from the built-in types above:

language
IDREFS
ENTITIES
NMTOKEN
NMTOKENS
Name
QName
NCName
integer
non-positive-integer
negative-integer
long
int
short
byte
non-negative-integer
unsigned-long
unsigned-int
unsigned-short
unsigned-byte
positive-integer
date
time
NOTATION
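Readers moving to the final XML Schema 1.0 Recommendation (May 2001), which postdates the drafts described here, should note that several of the names above changed. Hyphenated names became camelCase (positive-integer became positiveInteger, unsigned-long became unsignedLong), uri-reference became anyURI, timeInstant became dateTime, timeDuration became duration, binary was split into base64Binary and hexBinary, and recurringInstant was dropped. A sketch of declarations that use built-in and derived types directly in that later syntax, with hypothetical element and attribute names, might look like this:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="inStock" type="xsd:boolean"/>
        <xsd:element name="price" type="xsd:decimal"/>
        <!-- draft name: positive-integer -->
        <xsd:element name="quantity" type="xsd:positiveInteger"/>
        <!-- draft name: timeInstant -->
        <xsd:element name="shipped" type="xsd:dateTime"/>
        <!-- draft name: uri-reference -->
        <xsd:element name="homepage" type="xsd:anyURI"/>
      </xsd:sequence>
      <!-- a derived type used on an attribute -->
      <xsd:attribute name="lang" type="xsd:language"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```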

The Datatypes specification offers developers several mechanisms for refining datatypes, using facets.  Facets come in two flavors: fundamental facets, which include Equal, Order, Bounds, Cardinality, and Numeric and describe abstract properties of a type's values; and constraining facets, which narrow the set of allowed values and include length, minlength, maxlength, pattern (using a regular expression language), enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, precision, scale, encoding, and period.

Developers creating Schemas can derive new types by constraining base types with these facets, using XML-based syntax, as shown below:

<xsd:simpleType name="myStates" base="xsd:string">
  <xsd:enumeration value="NY"/>
  <xsd:enumeration value="VT"/>
  <xsd:enumeration value="MA"/>
  <xsd:enumeration value="CT"/>
  <xsd:enumeration value="PA"/>
  <xsd:enumeration value="OH"/>
</xsd:simpleType>

or:

<xsd:simpleType name="starRating" base="xsd:integer">
  <xsd:minInclusive value="1"/>
  <xsd:maxInclusive value="10"/>
</xsd:simpleType>

or:

<xsd:simpleType name="USZip" base="xsd:string">
  <!-- five digits, optionally followed by a hyphen and four more (ZIP+4) -->
  <xsd:pattern value="\d{5}(-\d{4})?"/>
</xsd:simpleType>
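The XML Schema 1.0 Recommendation, published in May 2001 after these drafts, changed this syntax: facets move inside an xsd:restriction element that names the base type, and the xsd prefix is bound to the http://www.w3.org/2001/XMLSchema namespace. A sketch of the same three types in that later syntax follows. The USZip pattern here accepts both five-digit and ZIP+4 codes, which is a choice made for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Enumeration: only the listed state codes are allowed -->
  <xsd:simpleType name="myStates">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="NY"/>
      <xsd:enumeration value="VT"/>
      <xsd:enumeration value="MA"/>
      <xsd:enumeration value="CT"/>
      <xsd:enumeration value="PA"/>
      <xsd:enumeration value="OH"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Range: integers from 1 to 10 inclusive -->
  <xsd:simpleType name="starRating">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="10"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Pattern: five digits, optionally a hyphen and four more digits -->
  <xsd:simpleType name="USZip">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{5}(-\d{4})?"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```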

These are just a few of the simpler possibilities, but they provide a taste.  Some developers are concerned that while this gives Schema users great power, it may be too much for those developing software that processes Schemas.  We may see developers starting with the built-in types and extending their software as the need becomes clearer.  Similarly, precompiled validators that only check against one schema may be an option.

Structures

The Structures specification provides developers with a powerful set of tools for defining types in XML.  Unlike DTDs, where element types consisted of a name and one layer of content possibilities, Schemas allow you to define complex types with multiple layers of content defined within a single type.  The example below shows a complex type defining a container named 'book', which contains a title, a list of authors, and a price.  The 'list of authors' is the intriguing part, because it is a complex structure itself.

<xsd:complexType name="book">
  <xsd:element name="title" type="xsd:string"/>
  <xsd:element name="authors" type="Authors"/>
  <xsd:element name="price" type="xsd:decimal"/>
</xsd:complexType>

<xsd:complexType name="Authors">
  <xsd:element name="author" minOccurs="1" maxOccurs="*">
    <xsd:complexType>
      <xsd:element name="givenName" type="xsd:string"/>
      <xsd:element name="familyName" type="xsd:string"/>
      <!-- xlink:href lives in the XLink namespace, so it is referenced rather than
           declared by name; this assumes the xlink prefix is bound to that namespace -->
      <xsd:attribute ref="xlink:href"/>
    </xsd:complexType>
  </xsd:element>
</xsd:complexType>

These types can be nested, reused, and even modified.  The 'book' type needs to be placed in an element context somewhere else in the schema, but it could be used to define elements like hardcover, paperback, audiotape, or large-print if those characteristics were deemed most appropriate for defining the element name.
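Under the final XML Schema 1.0 Recommendation, the same types need a few adjustments: the child elements of a complex type go inside a compositor such as xsd:sequence, maxOccurs="unbounded" replaces "*", and an attribute from another namespace, such as xlink:href, is referenced from an imported schema rather than declared by name. The sketch below makes those changes and then reuses the book type for several of the element names mentioned in the paragraph above. The xlink.xsd location is an assumption; point it at whatever XLink schema you actually have.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Assumed location of an XLink schema that declares a global href attribute -->
  <xsd:import namespace="http://www.w3.org/1999/xlink"
              schemaLocation="xlink.xsd"/>

  <xsd:complexType name="book">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="authors" type="Authors"/>
      <xsd:element name="price" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Authors">
    <xsd:sequence>
      <!-- minOccurs defaults to 1; "unbounded" means no upper limit -->
      <xsd:element name="author" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="givenName" type="xsd:string"/>
            <xsd:element name="familyName" type="xsd:string"/>
          </xsd:sequence>
          <xsd:attribute ref="xlink:href"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- One type, several element names -->
  <xsd:element name="hardcover" type="book"/>
  <xsd:element name="paperback" type="book"/>
  <xsd:element name="large-print" type="book"/>
</xsd:schema>
```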

By defining types, like those shown above, the Schemas specification makes it much easier to reuse parts of schemas across schemas or in entirely new schemas.  Schemas can be broken down into reusable fragments and treated as libraries much more easily than DTDs can.  Part of this advantage stems from the structural advantages of XML instance syntax, while some of it builds on namespace support.  This can produce some extremely unreadable schemas (much as OOP doesn't always produce readable code), but the problem can be handled automatically using schema processors that normalize (or 'flatten') schemas into more verbose but more directly readable forms.
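As a sketch of the library approach, assuming two hypothetical files named common-types.xsd and orders.xsd and a made-up urn:example:common namespace, one schema can define shared types under its own target namespace and another can pull them in with xsd:import (xsd:include plays the same role when both schemas share a target namespace). The example uses XML Schema 1.0 Recommendation syntax and shows the two files one after the other.

```xml
<!-- common-types.xsd: a hypothetical shared library of types -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:common"
            xmlns="urn:example:common">
  <xsd:simpleType name="USZip">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{5}(-\d{4})?"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

<!-- orders.xsd: reuses the library type through its namespace prefix -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:common="urn:example:common">
  <xsd:import namespace="urn:example:common"
              schemaLocation="common-types.xsd"/>
  <xsd:element name="shipToZip" type="common:USZip"/>
</xsd:schema>
```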

Schemas also provide tools for extending and restricting types, as well as a tool for preventing such extension and restriction.  Extension is relatively simple, using the base attribute to identify the source type and a derivedBy attribute to indicate that the new type extends it, but restriction has gone through a variety of syntactical and structural changes, leaving it perhaps the least stable portion of the specification.  Preventing such change is easy: just use the final attribute.
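The final XML Schema 1.0 Recommendation replaced the draft's derivedBy attribute with nested xsd:extension and xsd:restriction elements. A minimal sketch in that syntax, with hypothetical type names, shows an extension that is also sealed against further derivation with final:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Base type -->
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Extension: content is Address's sequence followed by zip.
       final="#all" forbids deriving any further types from USAddress. -->
  <xsd:complexType name="USAddress" final="#all">
    <xsd:complexContent>
      <xsd:extension base="Address">
        <xsd:sequence>
          <xsd:element name="zip" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```

Using final="extension" or final="restriction" instead of "#all" blocks only one kind of derivation.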

Developers who need to create wide-open spaces for experimentation, user flexibility, or the great unknowns of schema development can also take advantage of the any element and its namespace attribute.  Unlike the ANY content model in DTDs, schemas allow you to specify that any content from particular namespaces is acceptable (or not), don't require that the elements in the open area have declarations at all, and generally provide a more flexible approach to unpredictable content.  It's still somewhat risky, of course, as your applications may have to contend with unexpected information, but that may be acceptable.
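A minimal sketch of such an open area in XML Schema 1.0 Recommendation syntax follows; the urn:example:catalog namespace and the entry and title elements are hypothetical. namespace="##other" admits elements from any namespace except the schema's target namespace, and processContents="lax" validates them only when a declaration happens to be available.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:catalog"
            xmlns="urn:example:catalog"
            elementFormDefault="qualified">
  <xsd:element name="entry">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <!-- Zero or more elements from namespaces other than urn:example:catalog
             (unqualified elements are excluded too); lax means "validate if you can" -->
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Replacing ##other with a specific namespace URI, such as http://www.w3.org/1999/xhtml, would limit the open area to that one vocabulary.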

XML Schemas come with a few features that reach across type structures and return to document and database structures.  Schemas provide full support for the ID, IDREF, and IDREFS types that DTDs provided, allowing documents to contain identifiers that are unique across the scope of the entire document.  XML Schemas add keys and keyrefs to this mix, allowing developers to create values which must be unique within a certain scope but which are not required to be unique across the entire document.  Finally, developers may also use the unique element, which uses XPath to identify document portions that must be unique, making it possible to specify uniqueness without having to modify the actual type declarations.
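These mechanisms can be combined in one element declaration. In the sketch below, written in XML Schema 1.0 Recommendation syntax with hypothetical element and attribute names, xsd:key makes each author's id required and unique within a library, xsd:keyref requires every book's authorRef to match one of those ids, and xsd:unique forbids duplicate isbn values while still allowing books without one. Each constraint's scope is the library element, so two separate library elements could reuse the same ids.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="library">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="author" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="id" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="book" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="isbn" type="xsd:string"/>
            <xsd:attribute name="authorRef" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

    <!-- author/@id must be present and unique within each library -->
    <xsd:key name="authorKey">
      <xsd:selector xpath="author"/>
      <xsd:field xpath="@id"/>
    </xsd:key>
    <!-- each book/@authorRef must match an author/@id in the same library -->
    <xsd:keyref name="bookAuthor" refer="authorKey">
      <xsd:selector xpath="book"/>
      <xsd:field xpath="@authorRef"/>
    </xsd:keyref>
    <!-- book/@isbn values, where present, must not repeat -->
    <xsd:unique name="uniqueIsbn">
      <xsd:selector xpath="book"/>
      <xsd:field xpath="@isbn"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
```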

Where to Start, Where to Go

XML Schemas are not really here today.  Although toolsets are rapidly improving now that a set of Last Call drafts for Schemas has been released, the 200+ issues identified in the comments on those drafts promise some significant delays.  Continuing complaints about the readability and usability of schemas have surfaced on various mailing lists, and a recent survey (http://metalab.unc.edu/xql/tally.html) showed many developers to be unhappy with the current state of the documents, even to the point of accepting a slowdown.

One approach that might be reasonable is the use of a subset of Schemas.  Many developers have effectively done this with XML DTDs, using the simple parts (element and attribute declarations) they understand, while leaving the stranger parts (entities, conditional sections, and notations) for the experts.  Unfortunately, the Schemas spec itself provides little guidance on this score, and much of the specification is tightly woven around types.  The only parts that are easily discarded are the uniqueness constraints and the notions of inheritance; what remains is nearly as difficult as the full specification.  Subsetting datatypes is easier, as the specification provides built-in types that can do much of the work.

Longer term, the outlook remains somewhat cloudy.  Microsoft has announced plans to switch over to XML Schemas when they are complete, but a lot of legacy XML-Data Reduced will continue to lurk.  At the same time, RELAX is moving forward, and seems to be attracting many members of XML's document-oriented wing.  At this point, it's time to explore XML Schemas and learn about structures, but it might pay better to learn about document and data modeling in general than about the details of any particular schema language.

 

  
