Page 2 of 2
XSD Schema Tricks and Tips
W3C Schema Definition Language
As I write this, the W3C XML Schema Definition
Language (XSD) has just recently gone into Candidate Recommendation
Status. The XSD Standard is in fact two different documents, along
with a "tutorial" document. These are listed below, and will all be
covered in greater depth subsequently.
- XML Schema Part 1: Structures - This document provides
the grammar for the language, defining the notion of a schema in an
abstract fashion and then discussing specific implementation
details.
- XML Schema Part 2: Datatypes - This document gives the
datatype primitives that are used within schemas, as well as
covering some of the constraining rules and subtypes, such as
regular expressions.
- XML Schema Part 0: Primer - This 'primer' is not for
the faint of heart. As a measure of the complexity of schemas, even
the primer can be difficult to follow.
XSD Structures
XML Schema Structures form the meat of the schema
specification. Contained within its pages are the keys to
describing the structure of documents, the distinction between
Simple and Complex element types, Abstraction and Inheritance, and
considerably more. The document itself is divided into the
following:
- Conceptual Framework - This describes the principles
used to abstract the process of defining schemas. It is useful from
a terms definition standpoint, but it doesn't explicitly define the
XML implementation.
- Schema Components - Not there yet. Schema components
describe, from a strictly formal standpoint, each of the elements
and attributes that are used in creating schemas.
- XML Implementation - This part contains the actual
implementation details, covering the XML representation of the
formal grammar.
- Constraints - This section deals with ways that a given
element or attribute can be constrained to work within a subset of
its original domain.
- Access and Composition - This section covers how schemas
work across a distributed system, including the thorny issues of
namespace design and schema location and retrieval.
- Validation - This section looks at validation mechanisms
and how they should be addressed by the parser vendors.
- Location is http://www.w3.org/TR/xmlschema-1/
XSD Datatypes
The Datatypes document is simpler, dealing as it
does with the specific implementation of type rather than
conceptual frameworks.
- Datatype Concepts - This section looks at data types in
an abstract fashion, setting up the distinctions between differing
archetypes of data, and introducing the notions of value and
lexical spaces.
- Built-in Datatypes - This section lists the various core
datatypes, including those for string, numeric, and date
representation.
- Datatype Components - This section deals primarily with
constraints that limit the scope of given datatypes, and includes
patterns (i.e., regular expressions), enumerations, precision, and
minimum and maximum constraints.
- Location is http://www.w3.org/TR/xmlschema-2/
XSD Primer
In order to understand the XSD specifications, read
the Primer. It can be a little cryptic, but Document 0 is still
probably the best place to see Schema code in action. Just a few
notes.
- Four Separate Examples - The primer looks at four
different examples of schemas defined within the language, working
its way up in complexity from specifying simple documents to
creating abstract types and inheritance.
- Location is http://www.w3.org/TR/xmlschema-0/
Built-In Primitive Datatypes
XSD contains a number of built-in datatypes. Some
of these types are primitives - they are not made of any
other types - while others are derived from more primitive types
(or from derived types that are themselves built from primitives).
The following list of primitive types is far from exhaustive, but
covers most of the more common types.
Type | Description | Example | Derived
string | A sequence of Unicode characters. | "This is a sample string." | no
boolean | Either true (1) or false (0). | true | no
float | A single-precision 32-bit floating point type. | -1E4, 2442, 342.34, 0, INF, NaN | no
double | A double-precision 64-bit floating point type. | -1E4, 2442, 342.34, 0, INF, NaN | no
decimal | A decimal number of arbitrary precision. | 3.14159265358979323846264338327950288419716939937510... | no
timeDuration | A specific period of time, in the format PnYnMnDTnHnMnS. Only the relevant components need be shown. | P1Y2M13DT4H represents one year, two months, thirteen days and four hours. | no
URI | A Uniform Resource Identifier. | http://www.topxml.com/cagle | no
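As a rough illustration of how the lexical forms in the table map onto values, the following Python sketch parses a few of them. The helper names (parse_boolean, parse_float, parse_time_duration) are hypothetical and not part of any XSD toolkit, and the duration pattern covers only the P nY nM nD T nH nM nS layout described above.

```python
import math
import re

def parse_boolean(text):
    """Map the lexical forms true/1 and false/0 onto Python booleans."""
    forms = {"true": True, "1": True, "false": False, "0": False}
    if text not in forms:
        raise ValueError(f"not a boolean: {text!r}")
    return forms[text]

def parse_float(text):
    """Parse a float or double literal, including the special values INF and NaN."""
    special = {"INF": math.inf, "-INF": -math.inf, "NaN": math.nan}
    if text in special:
        return special[text]
    return float(text)

# Every component is optional; the T separator introduces the time part.
_DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)

def parse_time_duration(text):
    """Split a timeDuration such as P1Y2M13DT4H into its named components."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a timeDuration: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    if not parts:
        raise ValueError(f"empty timeDuration: {text!r}")
    return parts

print(parse_boolean("1"))
print(parse_float("-1E4"))
print(parse_time_duration("P1Y2M13DT4H"))
```

Python's own float() is more permissive than the schema lexical space, so a real validator would need a stricter check before converting.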
Built-in Derived Datatypes
Derived types were added by the W3C to cut down on
developers rolling their own for many of the more common data
formats such as integer or time.
Type | Description | Example | Derived
integer | A decimal value in which the scale (the number of digits after the decimal point) is 0. | ..., -2, -1, 0, 1, 2, ... | from decimal
nonPositiveInteger | All integers less than or equal to 0. | ..., -3, -2, -1, 0 | from integer
long | Value derived from integer within -9223372036854775808 to 9223372036854775807. | 2214433234, 12, -32551 | from integer
int | Value derived from long within -2147483648 to 2147483647. | 32768, 12, -32551 | from long
short | Value derived from int within -32768 to 32767. | 32765, 12, -32551 | from int
byte | Value derived from short within -128 to 127. | 78, 12, -114 | from short
unsignedInt | Value derived from unsignedLong within 0 to 4294967295. | 3248321, 52, 215534 | from unsignedLong
time | Time represents an instant of time that recurs every day, and is given in the format HH:MM:SS-ZZ:YY, where ZZ:YY represents the time zone offset relative to Greenwich Mean Time. | 21:15:00-08:00, which is 9:15 at night in Seattle (8 hours behind Greenwich Mean Time) | from recurringDuration
timeInstant | timeInstant combines the date and time formats, with the time separated from the date by a "T". | 2000-08-15T21:15:00-08:00 is August 15, 2000 at 9:15 PM in Seattle. | from recurringDuration
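To make the derivation chain for the integer types concrete, here is a small Python sketch that checks a lexical value against each type's range. The bounds are the usual two's-complement ranges for 64, 32, 16, and 8 bits, which I am assuming is what the table intends; the names INTEGER_RANGES, fits, and narrowest_type are hypothetical.

```python
# Inclusive bounds for the bounded integer types (assumed two's-complement ranges).
INTEGER_RANGES = {
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "unsignedInt": (0, 2**32 - 1),
}

def fits(type_name, text):
    """Return True if the lexical integer `text` lies within the named type's range."""
    low, high = INTEGER_RANGES[type_name]
    return low <= int(text) <= high

def narrowest_type(text):
    """Return the first signed type, from byte up to long, that can hold the value."""
    for type_name in ("byte", "short", "int", "long"):
        if fits(type_name, text):
            return type_name
    return "integer"  # unbounded: any other whole number is still an integer

print(narrowest_type("78"))
print(narrowest_type("32768"))
print(fits("unsignedInt", "-215534"))
```

Because each type is a restriction of the one before it, any value that fits in byte also fits in short, int, and long.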
DTD Specific Datatypes
In addition to string, numeric, and date types, the
schema reference also covers more traditional XML types that
pertain to atomic units - name tokens, entities, and so forth.
These are largely included for backward compatibility with DTDs.
Type | Description | Example | Derived
ID | A unique name token that identifies a given element. | id="a1924" | no
IDREF | A reference to an existing ID for a given element. | idref="a1924" | no
NMTOKEN | A collection of name characters (letters, digits, and the period, hyphen, underscore, and colon characters), used within attributes. | the value 'red' in the attribute color="red" | no
NMTOKENS | A list of NMTOKEN items, typically as options within a given attribute, separated by white space. | the value 'red blue green' in the attribute colors="red blue green" | from NMTOKEN
ENTITY | A reference to a specific entity object defined within a DTD. | &myDocument; | no
Creating A Simple Type By Constraint
Simple types form the foundation of datatypes, and
are created in one of two ways. Either they are constrained from
existing datatypes, or they are aggregated from simpler
datatypes. As examples of the former, consider a phoneNumber type,
which is essentially a string that follows a very specific order
(including optional characters), a purchase order
quantity (limited to 1000 units), and an enumeration of
specific holidays.
<simpleType name="phoneNumber" base="string">
<pattern value="\(?\d{3}\)?-?\d{3}-?\d{4}"/>
</simpleType>
<simpleType name="poQuantity" base="int">
<minInclusive value="0"/>
<maxInclusive value="1000"/>
</simpleType>
<simpleType name="holiday" base="recurringDate">
<annotation>
<documentation>Holiday Values</documentation>
</annotation>
<enumeration value="--01-01"/>
<enumeration value="--07-04"/>
<enumeration value="--10-31"/>
<enumeration value="--12-25"/>
</simpleType>
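To see what these three kinds of constraint accept and reject, here is a Python sketch that mimics the pattern, range, and enumeration facets. It is not a schema validator: the function names are hypothetical, and the phone pattern is written with the literal parentheses escaped, which is my assumption about what the phoneNumber pattern intends.

```python
import re

# Pattern facet for phoneNumber, with the optional parentheses escaped.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?-?\d{3}-?\d{4}")

# Enumeration facet for holiday (month-day values).
HOLIDAYS = {"--01-01", "--07-04", "--10-31", "--12-25"}

def is_phone_number(text):
    """A pattern facet must match the whole value, hence fullmatch."""
    return PHONE_PATTERN.fullmatch(text) is not None

def is_po_quantity(text):
    """minInclusive=0 and maxInclusive=1000 applied to an integer value."""
    try:
        value = int(text)
    except ValueError:
        return False
    return 0 <= value <= 1000

def is_holiday(text):
    """An enumeration facet accepts only the listed values."""
    return text in HOLIDAYS

print(is_phone_number("(206)555-1212"))
print(is_po_quantity("1001"))
print(is_holiday("--07-04"))
```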
Creating A Simple Type By List
You can also create a simple type by aggregating a
list of other simple types. The following, for instance, will
create list types from the phoneNumber and poQuantity types.
<simpleType name="phoneNumbers" base="phoneNumber" derivedBy="list"/>
<simpleType name="poQuantities" base="poQuantity" derivedBy="list">
<minLength value="0"/>
<maxLength value="3">
<annotation>
<documentation>No more than three poQuantities can be given</documentation>
</annotation>
</maxLength>
</simpleType>
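A list type's value is a whitespace-separated run of items, each of which must be valid for the item type. The Python sketch below imitates the poQuantities definition under that assumption; parse_list, is_po_quantity, and parse_po_quantities are hypothetical names.

```python
def parse_list(text, item_check, min_length=0, max_length=None):
    """Split a whitespace-separated list value and apply length and item checks."""
    items = text.split()
    if len(items) < min_length:
        raise ValueError(f"expected at least {min_length} items, got {len(items)}")
    if max_length is not None and len(items) > max_length:
        raise ValueError(f"expected at most {max_length} items, got {len(items)}")
    for item in items:
        if not item_check(item):
            raise ValueError(f"invalid list item: {item!r}")
    return items

def is_po_quantity(text):
    """Stand-in for the poQuantity item type: an integer from 0 to 1000."""
    return text.isascii() and text.isdigit() and int(text) <= 1000

def parse_po_quantities(text):
    """No more than three poQuantity values, as in the poQuantities type."""
    return [int(item) for item in parse_list(text, is_po_quantity, 0, 3)]

print(parse_po_quantities("10 250 1000"))
```

Note that the length facets count list items here, not characters.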
A Basic Schema
A schema consists of a collection of <element> tags, along with a collection of
<simpleType> and
<complexType> elements that aggregate other
elements into cohesive blocks. In the simplest cases, the
arrangement of elements is straightforward, mapping easily to a
known schema instance. For example, the schema for the slideshow
that you are currently viewing is quite simple (at least at first
blush) and is illustrated below.
<schema xmlns="http://www.w3.org/1999/XMLSchema">
<element name="slides" type="slideList"/>
<complexType name="slideList">
<element name="head" type="slideListHead"/>
<element name="group" type="slideGroup" minOccurs="0" maxOccurs="unbounded"/>
</complexType>
<complexType name="slideListHead">
<element name="title" type="string"/>
<element name="author" type="string"/>
<element name="date" type="date"/>
</complexType>
A Basic Schema (Continued)
The rest of the schema contains the group and slide
information.
<complexType name="slideGroup">
<element name="title" type="string"/>
<element name="slide" type="Slide" minOccurs="0" maxOccurs="unbounded"/>
</complexType>
<complexType name="Slide">
<element name="title" type="string"/>
<element name="para" type="string" minOccurs="0"/>
<element name="point" minOccurs="0">
<complexType content="mixed">
<element name="key" type="string" minOccurs="0" maxOccurs="1"/>
</complexType>
</element>
<element name="code" type="string" minOccurs="0"/>
</complexType>
</schema>
Complex Types
Simple Types define atomic characteristics, and
constrain an existing type. Complex Types, on the other hand,
aggregate collections of elements together into a single cohesive
unit. For example, a Slide element pulls in a number of string
based elements into the contents of a primary Slide object, which
can in turn be referenced in a type attribute by some other schema
element definition.
<complexType name="Slide">
<attribute name="id" type="ID"/>
<element name="title" type="string"/>
<element name="para" type="string" minOccurs="0"/>
<element name="point" minOccurs="0">
<complexType content="mixed">
<element name="key" type="string" minOccurs="0" maxOccurs="1"/>
</complexType>
</element>
<element name="code" type="string" minOccurs="0"/>
</complexType>
Mixed Content
Mixed content, in which an element contains both child elements
and text, can be especially difficult to specify. The
content="mixed" attribute indicates that the enclosed elements
will likely occur in the company of one or more unspecified text
nodes.
<complexType content="mixed">
<element name="key" type="string" minOccurs="0" maxOccurs="1"/>
</complexType>
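To show what mixed content looks like from the consuming side, this Python sketch walks a hypothetical point instance (the sample text is my own, not from the slideshow) and lists its text runs and child elements in document order.

```python
import xml.etree.ElementTree as ET

# A hypothetical <point> instance: text interleaved with an optional <key> child.
point = ET.fromstring("<point>Press <key>Ctrl+S</key> to save the slide.</point>")

def mixed_parts(element):
    """Return the text runs and child elements of an element in document order."""
    parts = []
    if element.text:
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:  # text after the child belongs to the parent's content
            parts.append(("text", child.tail))
    return parts

print(mixed_parts(point))
```

The schema constrains only the key element; the surrounding text runs are what content="mixed" permits.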
Attributes
You can define an attribute within an XSD schema
using the <attribute> tag, analogous to the way that
elements are defined. For example, to create an id attribute
associated with the Slide element, you would include the tag:
<complexType name="Slide">
<attribute name="id" type="ID"/>
<!-- element declarations omitted -->
</complexType>
Note that an attribute cannot be declared with a complex type; its
type must be a simple type, whether built-in or user-defined.
Fixed Types and Default Values
- You can assign a specific value to a given element
or attribute. The XSD use attribute determines the role
that the value plays:
- fixed, in which the element or attribute will always
have that value (and as such can typically be omitted from the
instance),
<attribute name="output" use="fixed" value="text/xml" type="string"/>
- default, where the value of the attribute or element is
automatically set to the default value if it is not
explicitly specified,
<attribute name="country" use="default" value="United States" type="string"/>
- optional, in which the attribute or element does not
need to be explicitly specified within a calling element,
<attribute name="street2" use="optional" type="string"/>
- required, in which the element or attribute must be included
within the parent element,
<attribute name="name" use="required" type="string"/>
- prohibited, in which the element or attribute cannot be included
in the parent element.
<attribute name="data" use="prohibited" type="string"/>
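The following Python sketch shows one way a processor might apply these five use values to the attributes it actually finds on an element. It treats the five sample declarations as if they all belonged to one hypothetical element; DECLARATIONS and apply_declarations are illustrative names only.

```python
# Attribute declarations mirroring the examples above: name -> (use, value).
DECLARATIONS = {
    "output": ("fixed", "text/xml"),
    "country": ("default", "United States"),
    "street2": ("optional", None),
    "name": ("required", None),
    "data": ("prohibited", None),
}

def apply_declarations(attributes, declarations=DECLARATIONS):
    """Return the effective attributes after applying each declaration's use rule."""
    result = dict(attributes)
    for name, (use, value) in declarations.items():
        present = name in attributes
        if use == "required" and not present:
            raise ValueError(f"missing required attribute {name!r}")
        if use == "prohibited" and present:
            raise ValueError(f"attribute {name!r} is prohibited")
        if use == "fixed":
            if present and attributes[name] != value:
                raise ValueError(f"attribute {name!r} must be {value!r}")
            result[name] = value  # supplied even when omitted from the instance
        if use == "default" and not present:
            result[name] = value
    return result

print(apply_declarations({"name": "Kurt"}))
```

The difference between fixed and default shows up only when the instance supplies a value: a default can be overridden, while a fixed value cannot.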
Boundaries
- You can also specify the minimum and maximum number
of occurrences of a given element. (These limits do not apply to
attributes, since by definition an XML element cannot have more
than one attribute of the same name.)
- maxOccurs gives the maximum number of times a given
element can appear, and ranges from 0 (only for use="prohibited")
to any integer value, or the text value "unbounded" (no limitation
on maximum number of elements).
- minOccurs gives the minimum number of times a given
element can appear, and can take on the values 0, 1, or any value
less than or equal to maxOccurs in the same section.
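The occurrence rules above can be sketched in a few lines of Python. The check_occurs helper is hypothetical, and its defaults of 1 for both bounds are an assumption on my part rather than something stated in the text.

```python
import math

def check_occurs(children, name, min_occurs=1, max_occurs=1):
    """Check that `name` appears between min_occurs and max_occurs times."""
    limit = math.inf if max_occurs == "unbounded" else max_occurs
    if min_occurs > limit:
        raise ValueError("minOccurs cannot exceed maxOccurs")
    count = children.count(name)
    return min_occurs <= count <= limit

# Child element names of a hypothetical <group> instance from the slide schema.
group_children = ["title", "slide", "slide", "slide"]
print(check_occurs(group_children, "title"))
print(check_occurs(group_children, "slide", 0, "unbounded"))
```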
Anonymous Types
In certain cases, a type is only specifically
needed once within a complex data type definition, and as such
there is no explicit need to create a formally named element type.
In such cases the type element is said to be an anonymous type.
Anonymous types are otherwise identical to their non-anonymous
counterparts:
<xsd:complexType name="Items">
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:simpleType>
</xsd:element>
<xsd:element name="price" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
<xsd:attribute name="partNum" type="Sku"/>
</xsd:complexType>
</xsd:element>
</xsd:complexType>
Annotations
- Schemas describe an object structure, but that
description goes beyond simply providing encapsulation and data
type information. Annotations let you provide additional
information about the element or attribute in question:
- Application Information (appInfo) can be used to pass
specific information about the element. For example, information
about how a schema-to-instance generator might render a given
element would be contained here.
- Documentation (documentation) provides a way of
describing the element in human-readable terms, and could in turn
contain specific information in different languages.
- Labels. Either <documentation> or <appInfo> can be used to
provide labels for tables and other interface elements. This can
shift the burden of generating output labels from XSLT or DOM into
the schema itself.
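As a sketch of the labelling idea, the Python below pulls documentation text out of a schema fragment and uses it as display labels. The fragment is hypothetical: it reuses the slideListHead type with annotations that I have added for illustration, and it omits the schema namespace to keep the lookups short.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated version of the slideListHead type (namespace omitted).
SCHEMA = """
<complexType name="slideListHead">
  <element name="title" type="string">
    <annotation><documentation>Presentation Title</documentation></annotation>
  </element>
  <element name="author" type="string">
    <annotation><documentation>Author</documentation></annotation>
  </element>
  <element name="date" type="date"/>
</complexType>
"""

def labels(complex_type):
    """Map each element name to its documentation text, else to the name itself."""
    result = {}
    for element in complex_type.findall("element"):
        name = element.get("name")
        doc = element.find("annotation/documentation")
        result[name] = doc.text.strip() if doc is not None and doc.text else name
    return result

print(labels(ET.fromstring(SCHEMA)))
```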
Grouping and Ordering
- XSD varies from earlier schema models in that the
assumed model is one where given elements appear in the sequence
specified. However, there are in fact three models for containing
data:
- sequence. This, the default model, requires that all
elements are presented in the order given.
<group name="shipAndBill">
<sequence>
<element name="shipTo" type="Address" />
<element name="billTo" type="Address" />
</sequence>
</group>
- choice. The choice model presents the elements as
potential options that are mutually exclusive - only one element
among the choices given can be a child of the indicated parent
element. Each alternative needs its own element name so that a
processor can tell which type applies.
<group name="addressInfo">
<choice>
<element name="addressUS" type="AddressUS" />
<element name="addressUK" type="AddressUK" />
<element name="addressDE" type="AddressDE" />
</choice>
</group>
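The difference between the sequence and choice models can be sketched in Python as two small checks over an element's child names. The helper names are hypothetical, as are the instance documents and the distinct element names addressUS, addressUK, and addressDE that stand in for the three address alternatives.

```python
import xml.etree.ElementTree as ET

def child_names(xml_text):
    """Return the tag names of the root element's children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]

def matches_sequence(children, names):
    """A sequence requires every named element, once each, in the declared order."""
    return list(children) == list(names)

def matches_choice(children, names):
    """A choice requires exactly one child, drawn from the declared alternatives."""
    return len(children) == 1 and children[0] in names

ship_and_bill = ["shipTo", "billTo"]
address_info = {"addressUS", "addressUK", "addressDE"}

print(matches_sequence(child_names("<order><shipTo/><billTo/></order>"), ship_and_bill))
print(matches_choice(child_names("<info><addressUK/></info>"), address_info))
```

This ignores minOccurs and maxOccurs, which would relax the "once each" rule for individual elements.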
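Grouping and Ordering II
- all. The third model lets the listed elements appear in any
order, with each one appearing at most once.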
Attribute Groups
XSD lets you define a group of related attributes
as a single entity that can be referenced by an element. For
example, suppose that in a catalog of books of different genres you
still had a number of common attributes: cover image (href), list
price (price), and ISBN number (isbn). You could define an
attribute group that would carry this information, as
follows:
<attributeGroup name="bookInfo">
<attribute name="href" type="URI"/>
<attribute name="price">
<simpleType base="float">
<pattern value="\d*(\.\d{2})?"/>
</simpleType>
</attribute>
<attribute name="isbn">
<simpleType base="string">
<pattern value="\d{10}"/>
</simpleType>
</attribute>
</attributeGroup>
<complexType name="book">
<attributeGroup ref="bookInfo"/>
</complexType>
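To illustrate how a reference to an attribute group expands, this Python sketch resolves the ref on the book type back to the attributes in bookInfo. The schema text is a simplified copy of the example (no namespace, no pattern constraints), and attributes_of is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the bookInfo example, without namespace or patterns.
SCHEMA = """
<schema>
  <attributeGroup name="bookInfo">
    <attribute name="href" type="URI"/>
    <attribute name="price" type="float"/>
    <attribute name="isbn" type="string"/>
  </attributeGroup>
  <complexType name="book">
    <attributeGroup ref="bookInfo"/>
  </complexType>
</schema>
"""

def attributes_of(schema, type_name):
    """List a complex type's attribute names, expanding attributeGroup references."""
    groups = {g.get("name"): g for g in schema.findall("attributeGroup")}
    complex_type = schema.find(f"complexType[@name='{type_name}']")
    names = [a.get("name") for a in complex_type.findall("attribute")]
    for reference in complex_type.findall("attributeGroup"):
        group = groups[reference.get("ref")]
        names.extend(a.get("name") for a in group.findall("attribute"))
    return names

schema = ET.fromstring(SCHEMA)
print(attributes_of(schema, "book"))
```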
Advanced Features
- This is sufficient to create basic XSD schemas,
though the specification itself is considerably more complex. The
XSD specification also permits the following:
- Include. You can break a schema into a set of smaller
component schemas.
- Class Inheritance. It is possible to create a type that
derives from another type, either by extending it or by
restricting it.
- Equivalence Classes make it possible to create
polymorphic classes, where the same general interface can have
separate implementations based upon internal data.
- Abstract Elements provide a jumping-off point for the
implementation of inheritance within XML terms.
- Preventing Derivations guarantees that a given type cannot be
used as the base for further derived types.
Object Oriented XML
- One of the most immediate impacts that Schemas will
have upon XML-based programming is that they mix the
declarative structures that are characteristic of XML with an
object orientation that makes it possible to deal with identity and
conceivably uniqueness in the programming sphere.
Specifically,
- Distributed Objects. With the combination of XML
listings of data code and either embedded or referenced schema
objects, it becomes easier to maintain the integrity of a given XML
"object" across a broad network.
- Primitives Repositories. If you look upon XML as a means
of creating a model that describes any object in the virtual
sphere, then the schema enables the concept of primitives libraries
that could reside anywhere on the net.
- Instance Generation. An XSD schema, with its associated
annotation resources, could very easily generate through XSLT
instances of objects without explicit need for customized
constructor code. This could also be used for generating forms and
similar resources that depend upon the data types of the schema to
determine the valid content of each field.
Will There Ever Be Universal Schemas?
- One of the greatest conundrums with schemas is that
it is so easy to create a schema, and so hard to get anyone else to
use it. The Babel of schemas that currently exists, while
bewildering, speaks to some of the strengths of schemas in
general.
- Schemas as Contracts. Although schemas are technical
specifications, their characteristics make them likely to become
part of future contracts and other legal documents, defining the
data characteristics of the common data interchange.
- Schemas and XSLT. A comprehensive Schema definition
language can be used to better clarify XSLT transformations, moving
it from a text manipulation language to a much more complex data
manipulation language.
- Schema Variants. The XSD specification contains a number
of provisions for building conditional Schemas that change in
response to the internal data within the XML document the schema
represents. By providing a rigorous mechanism for doing so, XSD
makes it possible to have multiple "similar" transformations that
provide some flexibility into the B2B sphere.
- Universal IDL. Interface Description Languages, or IDLs,
are ways of describing programmatic interfaces. It is likely that
an object oriented XML schema would be a major component of a
universal IDL, perhaps one mediated by SOAP.