By :Mark Wilson
First posted :10/18/2001

HTML Comes of Age: XHTML

By Don Kiely
Copyright 2000 Don Kiely

Let's get the bad news out of the way right up front: Ladies and gentlemen, the wild and wooly days of the Web are over and done. Those of you who have learned to get away with all of the HTML tricks that fool browsers into doing your bidding are going to be very sad. But those of you who embrace XML and its demand for rigid adherence to structure will flourish in the New Web.

Two Great Tastes that Taste Great Together

HTML is a stodgy octogenarian that helped fuel the massive assimilation of the Web into our daily lives but which is now holding the Web back from what it can truly become. It is inflexible, extended only through the messy process of Microsoft or Netscape (well, AOL now) adding a new tag and then battling for market approval. It makes creative Web design difficult. Pages that bounce between script and HTML code make for a mind-numbing dance that is hard to maintain and debug.

In stark contrast, XML has little or nothing to do with formatting. It is all about metadata, data about data, which identifies what data is. So if I put the string 'Horatio' in HTML, you have no idea what that string is, except maybe through some complex context algorithm. But if I wrap that string in a pair of XML tags <FirstName> and </FirstName>, it becomes trivial to pluck that string out of the page and know exactly what to do with it. XML lets me define my own tags and create custom attributes that further describe the data, and it makes it easy to move data across platforms that otherwise wouldn't give each other the time of day.
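To make the "pluck that string out" point concrete, here is a minimal Python sketch using the standard library's ElementTree parser. Only the <FirstName> tag comes from the paragraph above; the <Person> wrapper and the <LastName> element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: only <FirstName> appears in the article text,
# the surrounding <Person> and <LastName> elements are invented here.
record = (
    "<Person>"
    "<FirstName>Horatio</FirstName>"
    "<LastName>Hornblower</LastName>"
    "</Person>"
)

root = ET.fromstring(record)

# The tag name tells us what the string is, so no context guessing is needed.
first_name = root.findtext("FirstName")
print(first_name)
```

The same lookup against an HTML page would need some heuristic about where the name sits in the layout, which is exactly the problem the tag name removes.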

The World Wide Web Consortium (W3C), the standards body that decides these things, has put XML and HTML together, taking the best of each and putting it into XHTML. Two great tastes that taste great together. The future of the Web will be founded on an extensible formatting markup language that is flexible, lets you create your own tags, and will make it far easier to design and develop true Web applications.

The promise of XHTML is that it will make Web sites more adaptable while supporting existing sites, as long as those existing sites are HTML 4.01-compliant.

XML is not the HTML-killer it was touted in its early days, but XHTML will most certainly kill off HTML. And it's about time.

Extensible HTML

The XHTML recommendation was published by the W3C on 26 January 2000, and refers to XHTML as "a bridge to the future." According to various versions of the W3C specification, XHTML offers three major advantages to Web site developers: extensibility, portability, and modularity. XHTML is extensible by adding new elements without altering the entire DTD (document type definition) that the document is based on.

With all the hype about the extensibility of XHTML, I was confused at first that the spec doesn't have much information in it about how to define your own tags. That's because XHTML isn't there yet. It is 'merely' a reformulation of HTML 4.01 in XML, so that you create a Web page in XML with references to one of three DTDs that I'll discuss below. The current XHTML recommendation is the first step in realizing the extensible dream of HTML.

The second major advantage is portability, sometimes referred to as interoperability. Most Internet access is through browsers on desktop computers, though more and different types of devices are constantly being introduced. Some of these devices, such as cell phones and household appliances, won't have the processing power of a desktop computer, so the browsers on them will be less tolerant of malformed markup when rendering a document. XHTML is designed to make Web documents accessible and interoperable across platforms, in part by enforcing a rigorous coding standard.

Modularity made it into the specification late in the process, and will be fleshed out in XHTML 1.1. It acknowledges the growing role that the Web is playing in handheld devices. Browsers on these devices will not need all XHTML elements, so XHTML allows subsets of elements. This way the new language of the Web will be scalable both up and down, a critical feature for its success on the Web and on new wireless devices.

XHTML Syntax

The semantics of XHTML elements and their attributes are defined by the current HTML 4.01 Specification. XHTML 1.0 specifies three XML document types that correspond to the three DTDs specified in HTML 4.01: Strict, Transitional, and Frameset. These XHTML DTDs are more restrictive than HTML because XML is more restrictive in its syntax. Table 1 lists the three DTDs and the DOCTYPE declaration used to specify each in a Web page.

Table 1: XHTML Document Type Definitions.

XHTML 1.0 Strict: Use when you're doing all of your formatting in Cascading Style Sheets (CSS), and not using <font> and <table> tags to control how the browser displays your documents.

        <!DOCTYPE html
                PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                "DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional: Use when you need to use presentational markup in your document, so that you don't limit your audience to users with browsers that support CSS.

        <!DOCTYPE html
                PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
                "DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset: Use when your documents have frames.

        <!DOCTYPE html
                PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
                "DTD/xhtml1-frameset.dtd">

The DOCTYPE declaration doesn't affect the page by itself. It just tells the browser which DTD to use when validating the XHTML code in the document.

A strictly conforming XHTML 1.0 document is restricted to tags and attributes from the XHTML 1.0 namespace. (The Strict DTD moniker shouldn't be confused with 'strictly conforming' documents. Strict DTDs specify a particular format of DTD in HTML 4.01, and strictly conforming means that it fully complies with the XHTML spec.) Such a document must meet some rather exacting requirements:

  • The document must validate against one of the three DTDs.
  • The root element of the document must be <html>.
  • The root element of the document must designate an XHTML 1.0 namespace using the xmlns attribute.
  • There must be a DOCTYPE declaration in the document prior to the root element. If present, the public identifier included in the DOCTYPE declaration must reference one of the three required DTDs.
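The structural requirements in this list (everything except DTD validation itself) can be checked mechanically. What follows is a rough Python sketch using the standard library's minidom parser. The function name structural_problems and the sample document are my own inventions, and the sketch does not validate against the DTD, since the standard library has no validating parser.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
}


def structural_problems(source):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: it checks the root element, its namespace, and the
    DOCTYPE public identifier, but performs no DTD validation.
    """
    doc = minidom.parseString(source)  # raises ExpatError if not well-formed
    problems = []
    root = doc.documentElement
    if root.tagName != "html":
        problems.append("root element is not <html>")
    if root.namespaceURI != XHTML_NS:
        problems.append("root element does not designate the XHTML namespace")
    if doc.doctype is None or doc.doctype.publicId not in PUBLIC_IDS:
        problems.append("DOCTYPE missing or not one of the three XHTML 1.0 DTDs")
    return problems


# Made-up sample document for illustration
sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title></head><body><p>Hello</p></body></html>"
)
print(structural_problems(sample))
```

A document that passes these checks can still fail validation, so treat this as a quick pre-flight test rather than a substitute for a validator.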

The code in Figure 1, taken from the XHTML proposed recommendation, is an example of a minimal XHTML 1.0 document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>

Figure 1: A minimal XHTML 1.0 document, based on the Strict DTD

The spec requires that a strictly conforming document specify the XHTML namespace using the xmlns attribute, defined to be http://www.w3.org/1999/xhtml. Figure 2 shows how the XHTML namespace can be used with another namespace, and Figure 3 shows how the XHTML 1.0 namespace can be incorporated into another XML namespace. Both these examples are from the XHTML specification. The implications of this kind of flexibility are enormous, letting you build Web documents that take advantage of various features of different namespaces.

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A Math Example</title>
  </head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply> <log/>
        <logbase>
          <cn> 3 </cn>
        </logbase>
        <ci> x </ci>
      </apply>
    </math>
  </body>
</html>

Figure 2: Example of using the XHTML 1.0 namespace with another namespace, in this case the MathML namespace.

<?xml version="1.0" encoding="UTF-8"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
    xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en">
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <!-- make HTML the default namespace for a hypertext commentary -->
    <p xmlns='http://www.w3.org/1999/xhtml'>
        This is also available <a href="http://www.w3.org/">online</a>.
    </p>
  </notes>
</book>

Figure 3: Example of incorporating the XHTML 1.0 namespace into a custom XML namespace.
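To see how a namespace-aware parser sorts out the elements in a mixed document like the book example, here is a small Python sketch using the standard library's ElementTree. The markup is the book example with its comments and XML declaration left out. The helper name split_tag is hypothetical.

```python
import xml.etree.ElementTree as ET

# The book example from the figure above, without comments or XML declaration
book = """<book xmlns='urn:loc.gov:books'
    xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en">
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <p xmlns='http://www.w3.org/1999/xhtml'>
        This is also available <a href="http://www.w3.org/">online</a>.
    </p>
  </notes>
</book>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


root = ET.fromstring(book)
names = [split_tag(element.tag) for element in root.iter()]
for namespace, local in names:
    print(f"{local:8} {namespace}")
```

Note that <a> ends up in the XHTML namespace even though it carries no xmlns attribute of its own, because it inherits the default namespace declared on the enclosing <p>.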

Rocky Upgrade Path

The recommendation and associated documentation include descriptions of a number of ways that XHTML differs from HTML, arising because of the looseness allowed by early HTML specifications, the relative sloppiness allowed by most browsers when rendering HTML, and from the rigor required by XML.

An XHTML document must be structured properly, and omitting elements that HTML treats as optional will cause an error in an XHTML document. The root element of an XHTML 1.0 document must be <html> and must designate the XHTML 1.0 namespace. The <head> and <body> elements cannot be omitted, and the <title> element must be the first element in the <head> element.

XHTML documents must be well-formed, strictly complying with syntax rules. This means that tags must be nested properly, and every tag must either have a closing tag or be written in a special form that combines the opening and closing tags. Element and attribute names must be lower case. XML is case-sensitive, and the XHTML DTDs are written in lower case.

User-defined attribute values, however, can be in any case. All attribute values, including those that appear to be numeric, must be quoted in single or double quotes:

<table border="1">

rather than the form acceptable in HTML:

<table border=1>

Empty elements must either have an end tag, or the start tag must end with />. This is sometimes called a self-terminating element. For example, a horizontal rule can be written in either of the following ways. The first version is called the minimized tag syntax, and is generally preferred over paired tags that have no content between them. In the first form, placing a space before the / will make the form usable in some older browsers.

<hr />

<hr></hr>

All elements other than those declared as EMPTY in the DTD must have an end tag.

Elements must also be properly nested, so that closing tags must be in reverse order of the opening tags. For example, this code works in HTML:

<p><i>An italicized paragraph</p></i>

but will be unacceptable in XHTML because of the reversed closing tags. Instead, the following code conforms to the XHTML standard, because the tags are properly nested:

<p><i>An italicized paragraph</i></p>

An attribute is called minimized when only its name is written and the value is left out, which HTML permits for attributes that have only one possible value. For example, in the form element

<input type="checkbox" ... checked>

the attribute 'checked' has been minimized. Because XML does not support attribute minimization, in XHTML 1.0 attribute-value pairs cannot be minimized and must be written in full, as if they had multiple values.

<input type="checkbox" ... checked="checked" />
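One quick way to get a feel for these rules is to throw the before-and-after snippets at an XML parser and see which ones it accepts. The following Python sketch does that with the standard library's ElementTree. It checks well-formedness only, not validity against a DTD, and the helper name is_well_formed is my own. The elided "..." in the checkbox examples is simply left out.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = [
    "<p><i>An italicized paragraph</p></i>",        # overlapping tags
    "<p><i>An italicized paragraph</i></p>",        # properly nested
    "<table border=1></table>",                     # unquoted attribute value
    '<table border="1"></table>',                   # quoted attribute value
    '<input type="checkbox" checked>',              # minimized attribute
    '<input type="checkbox" checked="checked" />',  # written in full
    "<hr />",                                       # minimized tag syntax
    "<hr></hr>",                                    # explicit end tag
]

results = [is_well_formed(sample) for sample in samples]
for sample, ok in zip(samples, results):
    print("ok " if ok else "BAD", sample)
```

Every snippet that HTML browsers tolerate but XHTML forbids is rejected outright, which is the "it works or it doesn't" behavior that makes XHTML errors easy to find.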

Different browsers handle white space characters, such as a line break, differently. When white space is used in attribute values, browsers strip leading and trailing white space and map sequences of white space characters to the ASCII space character. So you should avoid line breaks and multiple white space characters within attribute values.
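The normalization described here amounts to a strip followed by a collapse. A minimal Python sketch, assuming the four XML white space characters (space, tab, carriage return, and line feed) are the ones being normalized; the function name is hypothetical:

```python
import re

# Runs of the four XML white space characters: space, tab, CR, LF
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_attribute_value(value):
    """Collapse each white space run to one space, then strip both ends."""
    return XML_WHITESPACE.sub(" ", value).strip(" ")


raw = "  big\n   red\tbutton "
print(repr(normalize_attribute_value(raw)))
```

If an attribute value only survives intact when its line breaks are preserved, that is a sign the data belongs in element content instead.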

Because < and & characters are treated as the start of markup in XHTML, the contents of script and style elements must be wrapped in a CDATA section so that the parser ignores characters that would normally be considered markup. The only delimiter that is recognized in a CDATA section is the "]]>" string that ends the section. You can also use external script and style documents to solve the problem.

<script language="JavaScript">
<![CDATA[
// JavaScript code
]]>
</script>

Comments pose another problem. An XML parser is not required to preserve comments in the body of a document, so you can no longer hide script code from the HTML parser by enclosing it in comments. The parser may throw away the comments before the document is processed. This is actually a good thing, because comments have become too much of a catch-all for hiding every new feature in a Web page from browsers that can't understand it. Instead, wrap the script in a CDATA section like this:

<script>
<![CDATA[
comment/script goes here
]]>
</script>
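The effect of the CDATA wrapper is easy to demonstrate with an XML parser. In this Python sketch, which uses the standard library's ElementTree, the same line of script is parsed twice, once bare and once inside a CDATA section. The script line itself (including the swap() call) and the helper name script_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical script content containing both < and & characters
bare = "<script>if (a < b && b < c) { swap(); }</script>"
wrapped = "<script><![CDATA[if (a < b && b < c) { swap(); }]]></script>"


def script_text(markup):
    """Return the element's text, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


print(script_text(bare))     # the parser rejects the bare < and &
print(script_text(wrapped))  # the CDATA section passes the text through
```

The parser hands back the wrapped script untouched, with the < and & characters intact, while the bare version never gets past parsing.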

id and name attributes are used as fragment identifiers so that you can identify a tag and the fragment of code or content in a document. But XML recognizes only the id attribute. Use both id and name if you need to, but name has been formally deprecated, so you can't count on it appearing in future versions of the specification.

The rules for nesting elements in a document are also much tighter than in HTML. Table 2 lists some of the prohibitions.

Table 2: XHTML Element Prohibitions.

  • <a> cannot contain other <a> elements.
  • <pre> cannot contain the <img>, <object>, <big>, <small>, <sub>, or <sup> elements.
  • <button> cannot contain the <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe>, or <isindex> elements.
  • <label> cannot contain other <label> elements.
  • <form> cannot contain other <form> elements.
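Prohibitions like these are simple enough to check with a short script. The Python sketch below encodes Table 2 as a dictionary and walks a parsed fragment looking for banned descendants at any depth. The function name find_violations and the test fragment are hypothetical, and the sketch assumes the fragment declares no namespace, so element names appear bare.

```python
import xml.etree.ElementTree as ET

# Table 2 as data: element name -> names it must not contain at any depth
PROHIBITED = {
    "a": {"a"},
    "pre": {"img", "object", "big", "small", "sub", "sup"},
    "button": {"input", "select", "textarea", "label", "button",
               "form", "fieldset", "iframe", "isindex"},
    "label": {"label"},
    "form": {"form"},
}


def find_violations(fragment):
    """Return (ancestor, descendant) tag pairs that break the prohibitions."""
    root = ET.fromstring(fragment)
    violations = []
    for ancestor in root.iter():
        banned = PROHIBITED.get(ancestor.tag)
        if not banned:
            continue
        for descendant in ancestor.iter():
            if descendant is not ancestor and descendant.tag in banned:
                violations.append((ancestor.tag, descendant.tag))
    return violations


# Made-up fragment with a nested link and an image inside <pre>
fragment = (
    '<div><a href="x">go <b><a href="y">nested</a></b></a>'
    '<pre>text <img src="i.png" alt="" /></pre></div>'
)
print(find_violations(fragment))
```

Checks of this kind matter because a DTD content model cannot express exclusions at arbitrary depth, so a fragment can be well-formed and still break one of these rules.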

There are a lot of benefits to tightening up the markup code in a Web page. The parsing engines in browsers will be able to be much trimmer. Parsers now have way too much fat from having to deal with sloppy HTML code, defining how a particular browser will handle undefined situations. Best of all, either an XHTML document will work or it won't, and you'll know why. You may lose some of the tricks you've learned to force HTML into submission, but you'll also be a far more productive and precise developer.

Moving to XHTML

As with any time an irresistible new technology comes along, a Web author has to decide whether to migrate pages from HTML or start over from scratch and take full advantage of XHTML. There are a number of benefits to upgrading as well as some major pitfalls.

Because HTML is a pervasive standard and XML is becoming one, users can view carefully crafted XHTML documents in current versions of many browsers. In fact, a strictly conforming XHTML page is almost a joy to a browser because there isn't all the messy ambiguity that it finds in most Web pages built with HTML. Earlier browsers may choke on new HTML 4.01 tags, but that isn't XHTML's fault.

XHTML supports three main media types supported by most browsers, text/html, text/xml, and application/xml. Any scripting code that uses the HTML or XML document object models will work just fine in the new format.

The biggest time sinks in migrating HTML pages to XHTML will be converting tags and attributes to lower case, and adding quotes to attribute values. The cleaner the HTML code, the quicker that you'll be able to convert it to XHTML.
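The mechanical part of that work can be sketched in a few lines of Python with the standard library's html.parser module, which already lower-cases tag and attribute names as it parses. The class and function names below are my own, and this is a toy under stated limits, not a replacement for a real tool: it drops comments and DOCTYPE declarations, does not add missing end tags, does not escape stray & or < characters in text, and does not wrap script content in CDATA.

```python
from html import escape
from html.parser import HTMLParser

# Elements declared EMPTY in the HTML 4.01 DTDs
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                  "img", "input", "isindex", "link", "meta", "param"}


class XhtmlRewriter(HTMLParser):
    """Hypothetical helper: re-serializes HTML tags in XHTML syntax."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _attributes(self, attrs):
        out = []
        for name, value in attrs:
            if value is None:      # minimized attribute such as CHECKED
                value = name
            out.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        end = " />" if tag in EMPTY_ELEMENTS else ">"
        self.parts.append(f"<{tag}{self._attributes(attrs)}{end}")

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:  # empty elements were already closed
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def to_xhtml(html_source):
    rewriter = XhtmlRewriter()
    rewriter.feed(html_source)
    rewriter.close()
    return "".join(rewriter.parts)


old_html = "<P ALIGN=center>Pick one: <INPUT TYPE=checkbox CHECKED><BR>done</P>"
converted = to_xhtml(old_html)
print(converted)
```

Lower-casing, quoting, and expanding minimized attributes all fall out of the parser's own normalization, which is why this part of a migration is so mechanical. Repairing missing or overlapping end tags is the harder job.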

As this new standard sees wider adoption, new and existing Web editors are supporting XHTML and some will automatically convert existing pages. Code translators have long been the holy grail of computer science, but there is a reasonable chance that HTML to XHTML tools will actually work reliably. This is because most of the work is pretty mechanical: straightening out non-nested tags, embedding script in CDATA, including the DOCTYPE directive, etc. But some sloppy HTML code, acceptable to many old browsers, will translate poorly.

There are various tools listed on the W3C's XHTML Web site, but my favorites so far are HTML Kit and HTML Tidy working together (see the list of references at the end of this article). Figure 4 shows the HTML Kit freeware editor with the XMLDeveloper Web site tidied up for XHTML on the right.

Figure 4: The page at http://www.thethirdsector.com/, shown in HTML Kit, is easily and mechanically modified to comply with the XHTML standard. The biggest problem on this page is missing closing tags.

The XHTML standard has some rather rigid requirements for user agents, W3C-speak for browsers. Table 3 provides a summary of the requirements. These are generally only of interest to developers who are writing an XHTML browser, but understanding the required actions will help you as an XHTML developer understand how your content will be rendered, especially if there are any errors in the code. The W3C has various documents with guidelines for building user agents, if you want more information.

Table 3: Summary of XHTML requirements for user agent conformance.

  • In order to be consistent with the spec, a browser has to parse and evaluate an XHTML document for well-formedness, and if it claims to be a validating user agent, it must also validate documents against their referenced DTDs.
  • When a user agent processes an XHTML document as generic XML, it shall only recognize attributes of type ID as fragment identifiers. Fragment identifiers delineate portions of a document.
  • If a user agent encounters an element it does not recognize, it must render the element's content.
  • If a user agent encounters an attribute it does not recognize, it must ignore the entire attribute specification, including both the attribute and its value.
  • If a user agent encounters an attribute value it doesn't recognize, it must use the default attribute value.
  • If a user agent encounters an entity reference for which it has processed no declaration, the entity reference should be rendered as the characters that make up the entity reference.
  • When rendering content, user agents that encounter characters or character entity references that are recognized but not renderable should display the document in such a way that it is obvious to the user that normal rendering has not taken place.
  • The following characters are defined in [XML] as whitespace characters: space (&#x0020;), tab (&#x0009;), carriage return (&#x000D;), and line feed (&#x000A;). The user agent must comply with XHTML rules for whitespace elimination.

Roping the Wild, Wild Web

Despite the best efforts of the W3C, HTML has evolved in a less than orderly fashion. Because HTML is itself not extensible, browser vendors have rather haphazardly added tags. HTML has evolved at a pace far greater than any standards body could possibly keep pace with, so that the HTML standard is mostly a codification of existing practice rather than a source of innovation itself. As a result, any given HTML authoring tool supports at best a snapshot of HTML tags at a given time, no matter how fast the author runs to keep up.

Unfortunately, that means that your favorite HTML editor today may not be your tool of choice tomorrow, when XHTML becomes the norm. Unless, of course, Windows Notepad is still your editor of choice; then you're in fine shape to write new code. XHTML is too new for any of the major players to have made any commitment to support it. But with the rapid spread of support for XML, I'd be rather surprised if all of the major editors didn't rush to implement support.

During the transition to XHTML, validating code will be one of the biggest challenges. Validation is a process that verifies documents against the associated DTD, checking to make sure that the structure, elements, and attributes are consistent with the definitions in the DTD. Validating an XHTML 1.0 document involves verifying its markup against one of the three XHTML DTDs.

The W3C has an HTML Validation Service that is based on an SGML parser, with options such as including Weblint results and displaying the parse tree. The good news is that when the HTML Compatibility Guidelines are followed, XHTML 1.0 documents can be rendered on HTML 4.0-compliant browsers. One way to use the W3C validator is to place a link to http://validator.w3.org/check/referrer on your Web page. Clicking the link with your page loaded validates your page.

XHTML 1.1 is already under development, and will serve to make this next stage of Web technologies even more flexible. XML and HTML have a lot to offer each other. XML is not the "HTML-killer" it was touted in its early days, but when teamed up with its alleged victim, it promises to take over the Web.

References:

Resources and Recommended Standards

Clean up your Web pages with HTML Tidy: http://www.w3.org/People/Raggett/tidy/ 

HTML Kit Web editor, with support for HTML Tidy: http://www.chami.com/html-kit/ 

XHTML 1.0: The Extensible HyperText Markup Language W3C proposed recommendation: http://www.w3.org/TR/xhtml1/ 

XHTML.ORG, a Web site with news and information: http://www.xhtml.org/ 

Public list about HTML, hosted and archived by W3C: http://lists.w3.org/Archives/Public/ 

HTML 4.01: http://www.w3.org/TR/html401/ 

W3C Working Drafts for XHTML

XHTML Basic, a subset of XHTML for handheld devices: http://www.w3.org/TR/xhtml-basic/ 

XHTML 1.1, a module-based version of XHTML: http://www.w3.org/TR/xhtml11/ 

Modularization of XHTML: http://www.w3.org/TR/xhtml-modularization/ 

XHTML Events Module, updated events syntax for XML-based markup languages: http://www.w3.org/TR/xhtml-events/ 

XHTML Document Profile Requirements, a basis for interoperability guarantees: http://www.w3.org/TR/xhtml-prof-req/ 

Building XHTML Modules: http://www.w3.org/TR/xhtml-building/ 

  
