We know the answer to all of your prayers is automation. If you
have only a couple of HTML pages to your credit, manually converting
your documents is a snap. However, if you're a veteran of the
Web-design field and went public with your pages years ago (or even
months ago), you probably have many HTML documents to convert. There
must be an easier way to tackle converting them, and there is.
If you're familiar with the W3C site, you might be familiar with
Dave Raggett. He's the Tom Cruise of Web design (well, one of them
anyway) and has been hard at work to make your life easier. For anyone
searching for a good tool to convert those HTML pages to XHTML, Dave
Raggett has created one with your interests in mind: HTML Tidy. You
can work with HTML Tidy through three different interfaces: a
DOS-based interface, a Web-based one, and a graphical user interface
(GUI). It gets better: all three versions of this tool are free!
HTML Tidy from a Command Line
In the past, Windows users could use HTML Tidy only from the
command line (and you still can). Therefore, if you used computers in
the 80s and early 90s, you're familiar with the command line, and you
really want to use it, read this section.
Before Windows and the ease of drag and drop, PCs were run from a
DOS prompt. From the DOS prompt, a PC user could do most of what you
do today in a Windows environment, except that the user had to type
commands to tell the computer what to do. Some diehards still work
from the DOS prompt, but we make the assumption that most of you out
there have adopted the wonderful world of Windows. So, to start with,
download this version of HTML Tidy from
www.w3.org/People/Raggett/tidy/.
Next, you need to see the DOS prompt in action. Go to Start,
Programs, MS-DOS Prompt. A window with a black background opens with
a prompt at the folder C:\Windows.
HTML Tidy for the Mac
If you use a Mac, a GUI version has been available for some time. Mac
users can find out more at
www.geocities.com/SiliconValley/1057/tidy.html.
Now you can experiment with HTML Tidy. To get started, open the DOS
prompt as demonstrated previously: Go to Start, Programs, MS-DOS
Prompt. Then find the folder that contains HTML Tidy. For this
example, the folder is C:\tidy. To move to this folder, we typed the
following at the command line:
cd ..\tidy
The cd command changes the current folder. The ..\ part moves up one
level, which from C:\Windows takes you back to the root folder. The
last part, tidy, names the folder to open from there. Be sure to
include a space between cd and ..\ but no space between ..\ and tidy.
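If you want to see where a relative path such as ..\tidy lands before you type it, you can work it out with a few lines of Python. This is only a sketch: the function name resolve_cd is our own, and it uses Python's ntpath module, which applies Windows path rules on any operating system.

```python
import ntpath  # Windows-style path handling, usable on any platform


def resolve_cd(current_folder, argument):
    """Return the folder that `cd argument` would reach from current_folder."""
    return ntpath.normpath(ntpath.join(current_folder, argument))


# From the folder where the MS-DOS Prompt opens:
print(resolve_cd(r"C:\Windows", r"..\tidy"))         # C:\tidy
# ..\ climbs only one level, so the starting folder matters:
print(resolve_cd(r"C:\Windows\System", r"..\tidy"))  # C:\Windows\tidy
```

The second call shows why the command in this section works only when the prompt starts in a folder directly under the root.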
After you do that, you're ready to use HTML Tidy. If you decided
that you're brave enough to tackle HTML Tidy from the DOS prompt, the
next step is to convert the nasty HTML document shown in Example 4.3
to clean XHTML.
Example 4.3 This Document Needs to Be Converted to Clean XHTML, Which You Do with HTML Tidy
<HTML>
<Title>Sloppy Code at Play</Title>
<H1>HTML Document with Mistakes<h1>
<P>This document does not adhere to XHTML rules.
<p>Take a second to see all the mistakes.
<ul>
<li>All element names are not lowercase.
<li>Many elements do not contain the required closing
tags.
<li>The XHTML namespace is not used.
<li>The document is missing some required elements.
<li>The document lacks the required DOCTYPE declaration.
</UL>
</html>
On our machine, the previous HTML file is located in the following
directory:
c:\XHTML\tidy.html
To convert this document on your computer, you need to do a few
things first:
• Create a folder on your hard drive called XHTML. (The name is not
important, but if you decide to choose your own folder name, be sure
to change the name in all the right places.)
• Find Example 4.3 on the CD in the Chapter 4 examples folder and
save it in the XHTML folder on your hard drive with the name
tidy.html.
Now, back to the example. To clean up this file with HTML Tidy,
enter the following code at the command prompt:
tidy -asxml -m c:\XHTML\tidy.html
The spaces are important and so is every character. What does all
that code mean? First, tidy identifies the program to use. -asxml
instructs Tidy to convert the HTML document to XHTML. -m tells the
program to modify the document in its current location, and
c:\XHTML\tidy.html is the location of the messy document to be
converted.
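If you would rather launch Tidy from a script than type the command by hand, the same pieces can be assembled in Python. The following is a minimal sketch and not part of HTML Tidy itself: the function names build_tidy_command and run_tidy are our own, and the only options used are -asxml and -m, as described above. It assumes the file was saved earlier as tidy.html.

```python
import shutil
import subprocess


def build_tidy_command(path):
    """Assemble the pieces of the command: program, options, then the file."""
    return ["tidy", "-asxml", "-m", path]


def run_tidy(path):
    """Run Tidy on path; return its exit status, or None if Tidy is not installed."""
    if shutil.which("tidy") is None:
        return None
    # A nonzero status means Tidy reported warnings or errors,
    # so the status is returned instead of raising an exception.
    return subprocess.run(build_tidy_command(path)).returncode


print(" ".join(build_tidy_command(r"c:\XHTML\tidy.html")))
```

Passing the command as a list keeps each piece separate, so a folder name that contains spaces does not need extra quoting.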
After you enter this line and press Enter, the next time you open
your document (tidy.html), you'll find an XHTML document instead. The
resulting code is shown in Example 4.4.
Example 4.4 The Result of Running the Code in Example 4.3 Through HTML Tidy
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org"
/>
<title>Sloppy Code at Play</title>
</head>
<body>
<h1> HTML Document with Mistakes</h1>
<p>This document does not adhere to XHTML
rules.</p>
<p>Take a second to see all the mistakes.</p>
<ul>
<li>All element names are not
lowercase.</li>
<li>Many elements do not contain the required closing
tags.</li>
<li>The XHTML namespace is not used.</li>
<li>The document is missing some required
elements.</li>
<li>The document lacks the required DOCTYPE
declaration.</li>
</ul>
</body>
</html>
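One quick way to convince yourself that the conversion did its job is to feed the markup to an XML parser, which rejects anything that is not well formed. The sketch below uses Python's standard xml.etree.ElementTree module. The function name is_well_formed is our own, and the two strings are shortened stand-ins for the sloppy document and the tidied result, not the full listings. Keep in mind that well formed is not the same as valid XHTML; this check catches only problems such as missing or mismatched closing tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Shortened stand-in for the sloppy HTML: mixed case, missing closing tags.
sloppy = "<HTML><H1>HTML Document with Mistakes<h1><P>No closing tags.</HTML>"

# Shortened stand-in for the tidied result.
tidied = (
    '<?xml version="1.0"?>'
    "<html><head><title>Sloppy Code at Play</title></head>"
    "<body><h1>HTML Document with Mistakes</h1>"
    "<p>No closing tags.</p></body></html>"
)

print(is_well_formed(sloppy))  # False
print(is_well_formed(tidied))  # True
```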
If you want to, save the XHTML document in a separate location to
keep the HTML document intact.
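A simple way to keep the original intact is to copy the file first and then point Tidy at the copy. The sketch below shows the idea in Python. The function name prepare_copy and the -xhtml ending added to the filename are our own choices for illustration, and the demonstration runs in a temporary folder so that nothing on your hard drive is touched.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_copy(original, ending="-xhtml"):
    """Copy original next to itself (tidy.html -> tidy-xhtml.html) and return the copy."""
    copy = original.with_name(original.stem + ending + original.suffix)
    shutil.copyfile(original, copy)
    return copy


# Demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "tidy.html"
    original.write_text("<HTML><Title>Sloppy Code at Play</Title></HTML>")

    copy = prepare_copy(original)
    # Tidy would then be run on the copy, leaving the original HTML alone.
    command = ["tidy", "-asxml", "-m", str(copy)]
    print(copy.name)  # tidy-xhtml.html
```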
You can also customize the conversion and apply only one
conversion rule (for example, convert all element names to
lowercase), leaving the other XHTML rules alone. Both of these cases,
as well as many more, have been taken into consideration. All it
takes to perform these tasks is a different command-line option or
two. For a complete listing of options to work with, see
www.w3.org/People/Raggett/tidy/#help.
HTML Tidy Online
If you're like many people, you might want to avoid the whole DOS
experience. For those of us who are addicted to the Web, WebReview
has published a Web-friendly version of HTML Tidy that has an
easy-to-use interface. All you have to do is type the URL for the
HTML Web page that you want converted and click a button. The new
XHTML page is displayed for you online. Keep in mind that when the
new page is displayed, you have to select File, Save to save it to
your computer. To view the code before saving it, you can always view
the source code (View, Source). You can find the Web-friendly front
end for HTML Tidy at www.webreview.com/1999/07/16/feature/xhtml.cgi.
TidyGUI
For those of us who don't enjoy a DOS-dominated interface and feel
more comfortable doing the conversion on our own machines,
André Blavier has created a Windows interface for HTML Tidy.
TidyGUI provides all the options of HTML Tidy, with all the ease of a
Windows interface that you can customize. For a snapshot of this new
Tidy facelift, see Figure 4.2. Enter the file that you want to
convert in the Source File field, and then click the Tidy button.
If you want to customize the conversion process, select the
Configuration button and make any changes you like. It is that
easy!