Translation from HTML to XHTML is truly a breeze. Instead of
thinking of it as a full-blown conversion process, think of it as
cleaning up your HTML document. XHTML uses the same elements; the only
differences are in syntax and rules. We break them down for you in the
following sections.
XML Syntax Rules
All XML documents have something in common: syntax. There are a
few easy rules to remember when you work with XML documents. Here
they are, one by one:
• All elements must be contained by a root element (also called a
document element). For XHTML, the html element is the root element:
<html>...</html> contains all other elements
• If your document adheres to a Document Type Definition (DTD), the
document must include a document type declaration. For XHTML, the
document type declaration is formed as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The DOCTYPE declaration is not an element; it's a declaration and has
its own syntax. All DOCTYPE declarations begin with an exclamation
point followed by the uppercase keyword DOCTYPE (for example,
<!DOCTYPE>). This is not negotiable. If you're new to the term DTD,
please read Chapter 2.
• All element and attribute names must be lowercase (for example,
<head> not <HEAD>).
• All nonempty elements must have a closing tag (for example,
<p>...</p>).
• All empty elements must use the following syntax: <img />.
(Although it's not required, be sure to include a space before the
closing /> for backward compatibility with older browsers.)
• Elements must be nested correctly (for example,
<p><b>This is correct</b></p> is nested correctly, but
<p><b>This is not correct</p></b> is not).
• All attributes must have values, and those values must be contained
by single or double quotation marks. This means that the standalone
attributes used in HTML (such as <td nowrap>) are no longer valid. The
correct form is <td nowrap="nowrap">.
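Several of the rules above are well-formedness rules, which any XML processor enforces. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module stands in for an XML processor, the following checks a few fragments. The helper name is_well_formed and the file name a.gif are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Pairs of fragments: one that follows a rule and one that breaks it.
samples = {
    "nested correctly": "<p><b>This is correct</b></p>",
    "nested incorrectly": "<p><b>This is not correct</p></b>",
    "empty element closed": '<p><img src="a.gif" alt="" /></p>',
    "empty element left open": '<p><img src="a.gif" alt=""></p>',
    "attribute with quoted value": '<td nowrap="nowrap">x</td>',
    "standalone attribute": "<td nowrap>x</td>",
    "missing closing tags": "<p>one<p>two",
}

for label, markup in samples.items():
    print(f"{label}: {is_well_formed(markup)}")
```

The parser rejects the incorrectly nested, unclosed, and standalone-attribute fragments. Note that a well-formedness check like this one does not catch uppercase names or a missing DOCTYPE declaration; those rules come from the XHTML DTD and need a validator.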
The main syntax rules end there. However, there are a few
XHTML-specific rules you have to follow as you convert your
documents.
XHTML-Specific Rules
To become familiar with the XHTML-specific rules, let's go through
an HTML document from top to bottom and identify the necessary
changes and additions to required elements.
The first item on an HTML page is normally the html element. In
XHTML, this is no longer the case. There are a few pieces of markup
that must come at the beginning of any XHTML document.
First, you must include a DOCTYPE declaration (also known as a
document type declaration). You may also include an XML declaration,
which carries XML version and encoding information, right before the
DOCTYPE declaration. The XML declaration is optional, but we strongly
recommend that you include it. For more on the XML declaration, see
Chapter 2. The correct markup takes the form:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The next item on the list is the html element. In XHTML, this
element is called the root element. The root element contains all
other elements. To read more on including a root element, see Chapter
3, "Overview of Element Structures." Not only is the html element
required, but it must also contain a predefined attribute-value pair.
This required attribute, xmlns, declares an XML namespace. XML
namespaces are commonly used in XML documents. The specific namespace
used for XHTML documents is often referred to as the XHTML namespace
and is a way to uniquely identify the set of elements as XHTML
elements. The correct XHTML namespace declaration is as follows:
<html xmlns="http://www.w3.org/1999/xhtml">
Imagine that you're in the world of XML and you want to combine
your XHTML document with some elements you made from scratch. In the
list of elements you named, you decided to use a title element type
to represent the title of a book. This means that your combined
document now has two title element types with two completely
different meanings.
How do you tell the XML processor (a browser is one type of
processor) that the XHTML title element is the title of the document
and your title element represents the title of a book? You use a
namespace to uniquely separate the two.
Namespaces are like surnames. Defining a namespace in the root
element means that all elements contained by the html element belong
to the XHTML element set. Any elements that are not contained by the
html element or that have their own namespaces attached will belong
to a different set of elements. The namespace debate is not yet
settled at the W3C; for a while, there was an argument about how and
when to use namespaces. To learn more about namespaces, see Chapter 2
or check out the W3C site at www.w3.org/TR/REC-xml-names/.
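To see how a namespace keeps the two title element types apart, here is a minimal sketch that again assumes Python's standard xml.etree.ElementTree module as the XML processor. The XHTML namespace name is the real one (http://www.w3.org/1999/xhtml); the book namespace, the book prefix, and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
BOOK_NS = "http://example.com/ns/book"  # hypothetical namespace for the homemade elements

# An XHTML document that also carries a homemade title element.
document = f"""
<html xmlns="{XHTML_NS}">
  <head><title>My reading list</title></head>
  <body>
    <book:title xmlns:book="{BOOK_NS}">Moby-Dick</book:title>
  </body>
</html>
"""

root = ET.fromstring(document)

# The parser reports every element name together with its namespace,
# written as {namespace}name, so the two titles never collide.
titles = [el.tag for el in root.iter() if el.tag.endswith("}title")]
for tag in titles:
    print(tag)

# Looking up the XHTML title means asking for it by its full name.
page_title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(page_title.text)
```

Because the default namespace is declared on the html element, the unprefixed title inherits the XHTML namespace, while the prefixed book:title belongs to the other set of elements.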
Next, you must include the head, title, and body structural
elements, which are required in XHTML. The one exception is if you're
creating a frameset document. In this case, you have to replace the
body element with the frameset element.
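Putting the XHTML-specific rules together, the sketch below holds a small XHTML document in a string and runs a rough checklist over it. It assumes Python's standard xml.etree.ElementTree module; the function name missing_structure and the sample page content are hypothetical. It is a rough structural check under those assumptions, not a substitute for validating against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small document with the XML declaration, DOCTYPE declaration,
# namespaced root element, and the required structural elements.
MINIMAL_XHTML = """\
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Sample page</title>
  </head>
  <body>
    <p>Hello, XHTML.</p>
  </body>
</html>
"""


def missing_structure(markup: str) -> list:
    """List the required pieces that the document lacks (empty if none)."""
    problems = []
    if "<!DOCTYPE html PUBLIC" not in markup:
        problems.append("DOCTYPE declaration")
    root = ET.fromstring(markup)
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("html root element in the XHTML namespace")
        return problems  # nothing else can be looked up reliably
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        problems.append("head")
    elif head.find(f"{{{XHTML_NS}}}title") is None:
        problems.append("title")
    # A frameset document uses frameset in place of body.
    has_body = root.find(f"{{{XHTML_NS}}}body") is not None
    has_frameset = root.find(f"{{{XHTML_NS}}}frameset") is not None
    if not (has_body or has_frameset):
        problems.append("body (or frameset)")
    return problems


print(missing_structure(MINIMAL_XHTML))
```

An HTML-style page with no DOCTYPE declaration and no xmlns attribute on the html element is reported as missing both, which are typically the first two things to add during conversion.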
Now that you know the differences between XHTML and HTML, it's
time to convert an HTML document to an XHTML document.