Mark Wilson I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML published in 2000 called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37 point SEO health-checkup for your website.
Modern XML technology rests on three distinct legs: the XML specification itself, XSLT/XPath, and XML Schema. Schemas are significant not only in defining XML structures but also in providing data type capabilities to XML, adding a measure of object oriented programming support, and giving an infrastructure that can be used to support internationalization and personalization of presentation. This session looks at a number of different techniques that can be used with XML Schema for enhancing XML in your organization.
What is a Schema?
A Schema is an abstract definition of an object's
characteristics and interrelationships. This schema is represented
in different ways in different environments.
Databases - A database schema describes the table names
and columns, describes the relationships between tables (via keys),
and acts as a repository for triggers and stored procedures.
Classes - An object oriented class interface is a schema
for describing objects, including properties, methods, and events
acting on the object.
UML - A UML object is in fact an abstraction of a schema
by way of a visual metaphor. UML objects can be used to generate
other types of schemas, so in a sense a UML object is a
meta-schema.
XML - An XML Schema describes the interrelationship
between elements and attributes that make up a given XML "object".
Schemas can include data type representations, but don't
necessarily have to.
What is a Schema Not?
It's worth contrasting what schemas are with what
they aren't. Specifically, a schema is not ...
An Object - While it is possible as a special case to
make a schema that describes other schemas (think about XML a bit
in this role), in general a schema is an abstraction - it contains
no actual data, only potential data.
A Form - A schema should be independent of the media
used to display the information in the schema. While it is possible
(and desirable) to annotate your schemas, keep in mind that this
annotation shouldn't be tied into a specific implementation.
A Transformation - Nope, that's XSLT ... different
seminar.
XML Schema Languages
There are currently a number of different schema
languages, although by the time of this conference
there will probably be one fully approved XML Schema Definition
language, which this discussion explores. The other schema
languages include:
Document Type Definitions (DTD) - The principal schema
mechanism for SGML, DTDs are still the most common form of XML
schema representation. However, they suffer from a number of
significant limitations, and are being phased out of the XML
oeuvre.
XML Data Reduced (XDR) - One of the first schema
representations for XML, this was proposed in early 1998 by
Microsoft, and forms the foundation of almost all of their XML data
type story. XDR included the introduction of data types, but didn't
include mechanisms for creating OOP representations (or generating
more sophisticated datatypes).
Schema for Object-Oriented XML (SOX) - Forming the foundation of the
ebXML movement, the SOX schema architecture introduced a more
sophisticated form of object oriented design, including the notion
of inheritance and archetypes.
Why DTDs Are Not Enough
When XML was first designed, the SGML community
assumed automatically that the SGML document definition language,
DTD, would be sufficient to describe XML. However, DTDs suffer from
a number of limitations that make them less than ideal as data
language grammars.
Document Centric - DTDs are designed for the
manipulation of text blocks for the purpose of creating documents.
XML is increasingly being called upon to handle the transport of
data, not documents, and the largely macro-driven facilities of
DTDs are simply not up to the task.
No Data Types - There is no way using a DTD to
differentiate between the string "32.421" and the number 32.421. As
a consequence, applications that use XML must have an existing
convention for denoting which fields are numbers vs. strings.
Moreover, there are no constraints on data types to ensure,
for example, that a given string consists of 6 digits and two
letters.
Not in XML - A DTD is written in its own language. This
means that validating parsers need separate machinery to understand
DTDs, beyond what they use for manipulating XML. With a schema
written in XML, you can query the schema itself for more detailed
information using an XML expression, something you can't do with
a DTD.
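As a rough illustration of the "Not in XML" point, the sketch below treats a schema as ordinary XML and queries it with a stock XML parser. The `invoice` schema and the `declared_types` helper are invented for this example, and the namespace is the one from the final W3C Recommendation, which may differ from the draft current at the time of this session.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema for an <invoice> document, used only for illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invoice">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="customer" type="xs:string"/>
        <xs:element name="total" type="xs:decimal"/>
        <xs:element name="issued" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def declared_types(schema_text):
    """Map each element declaration that carries a type attribute to that type."""
    root = ET.fromstring(schema_text)
    return {
        el.get("name"): el.get("type")
        for el in root.iter(f"{{{XS}}}element")
        if el.get("type") is not None
    }

types = declared_types(SCHEMA)
print(types)
```

The same parser that reads the instance document reads the schema, so no DTD-specific machinery is involved. Here the lookup also answers the data type question: `total` is declared as a decimal, not a string.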
Why DTDs Are Not Enough (More Reasons)
Entities - Entities (denoted by the
&entityRef; notation) are ambiguous, especially in
asynchronous environments. They can prove troublesome when
creating markup, require that entity references be resolved
prior to the processing of the XML document itself, and are very
difficult to manipulate in XSLT.
Parameter Entities - Parameter entities provide a
mechanism for changing an XML document based upon some external
parameter. However, this essentially forces the XML document to be
responsible for its own transformations, something that can be
accomplished far more easily with XSLT.
Notations - A notation maps a given element to a media
application for processing that element. While useful in theory,
notations frequently end up producing too tight a coupling between
the data of an XML document and the corresponding implementation of
that data in a given system.
No Mechanism for Inheritance - A DTD contains no direct
mechanism to handle the concept of one schema inheriting some or
all the characteristics of another. This tends to keep DTDs
confined to the level of individual object definitions, rather
than larger frameworks.
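For contrast with the inheritance gap in DTDs, here is a minimal sketch of how an XML-based schema can express one type deriving from another, and how a program might resolve the inherited content. The `person` and `employee` types and the `effective_elements` helper are hypothetical; the sketch assumes the W3C XML Schema `extension` construct and handles only unprefixed base names declared in the same schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical types: "employee" extends "person" with one extra element.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="email" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="employee">
    <xs:complexContent>
      <xs:extension base="person">
        <xs:sequence>
          <xs:element name="department" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""

def effective_elements(root, type_name):
    """List the child element names of a named type, inherited ones first."""
    by_name = {t.get("name"): t for t in root.findall(XS + "complexType")}
    ctype = by_name[type_name]
    extension = ctype.find(f"{XS}complexContent/{XS}extension")
    inherited = []
    if extension is not None:
        # Assumes the base is an unprefixed name declared in this same schema.
        inherited = effective_elements(root, extension.get("base"))
    own = [el.get("name") for el in ctype.iter(XS + "element")]
    return inherited + own

root = ET.fromstring(SCHEMA)
print(effective_elements(root, "employee"))
```

The derived type only states what it adds; the base content is picked up by following the `base` reference, which is the kind of reuse a DTD has no direct way to express.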
Who Needs Schemas?
While these constraints provide a way of seeing
what an XML schema definition language shouldn't do, the formal
requirements for creating a schema language come from a number of
different types of users, and understanding these conflicting needs
can help to explain what the requirements of such a language are.
Document Users and Vendors - The original XMLers, these
users need a language that is sufficiently flexible to describe and
contain sophisticated documents. They are mostly happy with the
features of DTDs, and see the additional requirements as
cumbersome.
Database Vendors and Developers - On the opposite end of
the spectrum, most traditional SQL database vendors have realized
that XML can be a useful mechanism for transmitting large amounts
of data over the wire without forcing highly specialized clients.
Their primary requirement is for a stable set of simple
datatypes.
e-Commerce Developers and Vendors - These users have a
need for flexibility, but also want ways of ensuring that there is
as little work in converting between schemas as possible. As a
consequence, they are the ones pushing for stronger inheritance
models and other object oriented features.
User Agent Vendors - These are the manufacturers of
browsers and other XML enabled devices. They want a schema language
that can be used for specifying languages that best utilize the
characteristics of user agents.
Requirements for a Schema Language Definition
These often contradictory stances have made it
difficult to form consensus on the "perfect" schema language.
However, over time, the very tangible need for an XML-based schema
has forced agreement on a minimal set of requirements for such a
language.
Flexible Data Type Specification - A schema language
should include the ability to both define a set of core data types
and provide an extension mechanism for creating new types from
these.
Containment, Grouping, and Ordering - Any schema
language should describe the relationship between the elements and
attributes, including ordering, mutual exclusivity, grouping, and
defaults.
Object Oriented Design - The language should incorporate
many of the features of traditional object oriented programming
languages. At the very least, there should be a mechanism to
accommodate inheritance.
Text Friendly - A schema language should have flexible
ways of dealing with compound documents, open element and attribute
sets, and differentiation between two or more objects within the
same XML document.
Validation - One of the key roles of any schema language
is to ensure that an XML document is not only well-formed (it can
be parsed), but valid (everything in the document is of the right
type, in the right place, and in the right numbers). Validation
mechanisms by constraint (minima and maxima) and text patterns
(regular expressions) are also desirable.
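The data type and validation requirements above can be sketched in a few lines. The example below is an illustration under assumptions, not a real validator: it reads pattern and inclusive-bound constraints from a small schema and applies them by hand. The `partCode` and `quantity` types and both helper functions are invented for the example, and the facet names follow the W3C XML Schema datatypes vocabulary.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical simple types: a part code (6 digits, 2 letters) and a bounded quantity.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="partCode">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{6}[A-Za-z]{2}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="quantity">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="500"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

def load_facets(schema_text):
    """Return {type name: {facet name: value}} for every named simple type."""
    root = ET.fromstring(schema_text)
    facets = {}
    for stype in root.findall(XS + "simpleType"):
        restriction = stype.find(XS + "restriction")
        facets[stype.get("name")] = {
            facet.tag.replace(XS, ""): facet.get("value") for facet in restriction
        }
    return facets

def is_valid(value, facets):
    """Check a lexical value against pattern and inclusive-bound facets only.

    Bounded types are assumed to receive numeric text; anything else raises.
    """
    if "pattern" in facets and not re.fullmatch(facets["pattern"], value):
        return False
    if "minInclusive" in facets and Decimal(value) < Decimal(facets["minInclusive"]):
        return False
    if "maxInclusive" in facets and Decimal(value) > Decimal(facets["maxInclusive"]):
        return False
    return True

FACETS = load_facets(SCHEMA)
print(is_valid("123456AB", FACETS["partCode"]))  # True
print(is_valid("32.421", FACETS["quantity"]))    # True
print(is_valid("501", FACETS["quantity"]))       # False
```

Schema patterns match the whole value, which is why the sketch uses `re.fullmatch` instead of `re.search`. A real schema processor covers many more facets and types than these three.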
Advantages of an XML Based Schema Language
Beyond avoiding the disadvantages that plague
DTDs, XML-based schemas offer a number of potential benefits of
their own.
Dynamic Schemas - Especially with such items as
enumerations, validation data isn't always available at
design time. Because an XML-based schema is itself XML, it can be
generated at run time with ordinary XML tools. While DTDs also
include a macro-like validation mechanism, generating DTDs
dynamically requires much more customized code.
XSLT Manipulation - XML based schemas can be queried and
manipulated by XSLT. This means that it becomes possible to use
XSLT to generate an instance of data from its associated schema in
a very generic fashion, especially when parameterization is
employed.
Self-Documentation - By moving documentation about a
given schema into the schema itself (through the use of
annotations), you can simplify the documentation about an XML
structure, and even use the documentation from schemas to define
column labels and other tags in interface devices.
Auto-Schemas - With XSLT and a good parser you can
convert an XML document into a first pass schema (within
limits).
Interface Definition Languages - One area where XML
schemas are gaining attention is in the realm of interface
definition languages, where an XML structure describes the
interface of a programmatic class. As XML technologies such as SOAP
become more prominent, this makes discovery of computer application
classes possible.
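The "Dynamic Schemas" item can be made concrete with a short sketch that builds an enumeration type at run time using nothing but a general-purpose XML library. The `enumeration_schema` helper, the `region` type name, and the list of values are all hypothetical; in practice the values might come from a database query.

```python
import xml.etree.ElementTree as ET

XS_URI = "http://www.w3.org/2001/XMLSchema"
# Serialize schema elements with the conventional "xs" prefix.
ET.register_namespace("xs", XS_URI)

def enumeration_schema(type_name, values):
    """Build a schema holding one string type restricted to the given values."""
    schema = ET.Element(f"{{{XS_URI}}}schema")
    simple_type = ET.SubElement(schema, f"{{{XS_URI}}}simpleType", name=type_name)
    # The "xs:string" text relies on the "xs" prefix registered above.
    restriction = ET.SubElement(
        simple_type, f"{{{XS_URI}}}restriction", base="xs:string"
    )
    for value in values:
        ET.SubElement(restriction, f"{{{XS_URI}}}enumeration", value=value)
    return schema

# Stand-in for values fetched at run time, for example from a database query.
regions = ["north", "south", "east", "west"]
schema = enumeration_schema("region", regions)
print(ET.tostring(schema, encoding="unicode"))
```

Producing the equivalent DTD fragment would mean assembling its non-XML syntax by string concatenation, which is the extra customized code the text refers to.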