Page 6 of 7

 


Converting HTML to XHTML, cont.

Working with HTML Tidy

We know the answer to all of your prayers is automation. If you have only a couple of HTML pages to your credit, manually converting your documents is a snap. However, if you're a veteran of the Web-design field and went public with your pages years ago (or even months ago), you probably have many HTML documents to convert. There must be an easier way to tackle converting them, and there is.

If you're familiar with the W3C site, you might be familiar with Dave Raggett. He's the Tom Cruise of Web design (well, one of them anyway) and has been hard at work to make your life easier. For anyone searching for a good tool to convert HTML pages to XHTML, Dave Raggett has created one with your interests in mind: HTML Tidy. You can work with HTML Tidy through three different interfaces: a DOS-based command line, a Web-based front end, and a graphical user interface (GUI). It gets better: all three versions of this tool are free!

HTML Tidy from a Command Line

In the past, Windows users could use HTML Tidy only from the command line (and you still can). If you used computers in the 80s and early 90s, you're familiar with the command line, and you really want to use it, read this section.

Before Windows and the ease of drag and drop, PCs were run from a DOS prompt. From the DOS prompt, a PC user could do most of what you do today in a Windows environment, except that the user had to type commands to tell the computer what to do. Some diehards still work from the DOS prompt, but we make the assumption that most of you out there have adopted the wonderful world of Windows. So, to start with, download the command-line version of HTML Tidy from www.w3.org/People/Raggett/tidy/.

Next, you need to see the DOS prompt in action. Go to Start, Programs, MS-DOS Prompt. A window with a black background opens with a prompt at the folder C:\Windows.

HTML Tidy for the Mac

If you use a Mac, a GUI version of HTML Tidy has long been available. Mac users can find out more at www.geocities.com/SiliconValley/1057/tidy.html.

Now you can experiment with HTML Tidy. To get started, open the DOS prompt as described previously: go to Start, Programs, MS-DOS Prompt. Then find the folder that contains HTML Tidy. For this example, the folder is C:\tidy. To move to this folder, we typed the following at the command line:

cd ..\tidy

The cd command changes the current folder. The ..\ part moves up one level, which from C:\Windows takes you to the root folder (C:\). The last part, tidy, names the folder to open from there. Be sure to include a space between cd and ..\ but no space between ..\ and tidy.
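If you want to double-check where a relative path such as ..\tidy ends up, the following is a minimal Python sketch (not part of HTML Tidy). The function name resolve_cd is our own invention for illustration; it relies on Python's standard ntpath module, which applies Windows path rules on any operating system.

```python
import ntpath  # Windows path rules, usable on any operating system


def resolve_cd(current_folder, argument):
    """Return the folder that `cd argument` would lead to from current_folder.

    Hypothetical helper for illustration only; it handles relative
    arguments such as ..\\tidy and does not emulate every DOS rule.
    """
    return ntpath.normpath(ntpath.join(current_folder, argument))


# Starting from the folder the DOS prompt opens in:
print(resolve_cd(r"C:\Windows", r"..\tidy"))  # prints C:\tidy
```

Note that ..\ means "one level up", not "the root folder". It only reaches the root here because C:\Windows sits directly under C:\.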

After you do that, you're ready to use HTML Tidy. If you decided that you're brave enough to tackle HTML Tidy from the DOS prompt, the next step is to convert the nasty HTML document shown in Example 4.3 to clean XHTML.

Example 4.3  This Document Needs to Be Converted to Clean XHTML, Which You Do with HTML Tidy

<HTML>

<Title>Sloppy Code at Play</Title>

<H1>HTML Document with Mistakes<h1>

<P>This document does not adhere to XHTML rules.

<p>Take a second to see all the mistakes.

<ul>

<li>All element names are not lowercase.

<li>Many elements do not contain the required closing tags.

<li>The XHTML namespace is not used.

<li>The document is missing some required elements.

<li>The document lacks the required DOCTYPE declaration.

</UL>

</html>

On our machine, the previous HTML file is located in the following directory:

c:\XHTML\tidy.html

To convert this document on your computer, you need to do a few things first:

      •  Create a folder on your hard drive called XHTML. (The name is not important, but if you choose your own folder name, be sure to change the name in all the right places.)

      •  Find Example 4.3 on the CD in the Chapter 4 examples folder and save it in the XHTML folder on your hard drive with the name tidy.html.

Now, back to the example. To clean up this file with HTML Tidy, enter the following code at the command prompt:

tidy -asxml -m c:\XHTML\tidy.html

The spaces are important and so is every character. What does all that code mean? First, tidy identifies the program to use. -asxml instructs Tidy to convert the HTML document to XHTML. -m tells the program to modify the document in its current location, and c:\XHTML\tidy.html is the location of the messy document to be converted.
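If you have many documents to convert, you might prefer to assemble the same command in a script. Here is a minimal Python sketch under a few assumptions: the function names build_tidy_command and run_tidy are hypothetical, the tidy program is assumed to be reachable on your PATH, and only the two options described above (-asxml and -m) are used.

```python
import shutil
import subprocess


def build_tidy_command(path, in_place=True):
    """Assemble the command line described above as a list of arguments."""
    command = ["tidy", "-asxml"]  # program name, then "convert to XHTML"
    if in_place:
        command.append("-m")      # modify the document where it is
    command.append(path)          # the messy document to convert
    return command


def run_tidy(path):
    """Run the command if tidy is installed; return its exit code or None."""
    if shutil.which("tidy") is None:
        return None               # assumption: tidy must be on the PATH
    return subprocess.run(build_tidy_command(path), check=False).returncode


print(" ".join(build_tidy_command(r"c:\XHTML\tidy.html")))
# prints: tidy -asxml -m c:\XHTML\tidy.html
```

Passing the arguments as a list rather than one long string sidesteps the spacing mistakes that are so easy to make at the prompt.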

After you enter this line and press Enter, the next time you open your document (tidy.html), you'll find an XHTML document instead. The resulting code is shown in Example 4.4.

Example 4.4  The Result of Running the Code in Example 4.3 Through HTML Tidy

<?xml version="1.0"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta name="generator" content="HTML Tidy, see www.w3.org" />

<title>Sloppy Code at Play</title>

</head>

<body>

<h1>HTML Document with Mistakes</h1>

<p>This document does not adhere to XHTML rules.</p>

<p>Take a second to see all the mistakes.</p>

<ul>

  <li>All element names are not lowercase.</li>

  <li>Many elements do not contain the required closing tags.</li>

  <li>The XHTML namespace is not used.</li>

  <li>The document is missing some required elements.</li>

  <li>The document lacks the required DOCTYPE declaration.</li>

</ul>

 

</body>

</html>
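One quick way to see the difference between the two listings is to feed them to an XML parser, because XHTML must be well-formed XML. The following is a rough Python sketch using the standard xml.etree.ElementTree module; the helper name is_well_formed and the shortened sample strings are ours. Keep in mind that it checks well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Shortened stand-ins for the sloppy and the tidied documents.
sloppy = (
    "<HTML><Title>Sloppy Code at Play</Title>"
    "<H1>HTML Document with Mistakes<h1></HTML>"
)
clean = (
    "<html><head><title>Sloppy Code at Play</title></head>"
    "<body><h1>HTML Document with Mistakes</h1></body></html>"
)

print(is_well_formed(sloppy))  # prints False (the tags do not match up)
print(is_well_formed(clean))   # prints True
```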

Remember that the -m option overwrites the original file. If you want to keep the HTML document intact, save a copy of it in a separate location before you run the conversion.
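One way to keep the HTML document intact is to copy it before Tidy touches it. This is a small Python sketch of that idea, not a feature of HTML Tidy; the function name keep_original and the .orig suffix are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def keep_original(html_path, suffix=".orig"):
    """Copy html_path to a sibling file (for example tidy.html.orig).

    Returns the path of the copy. Call this before running Tidy with -m
    so that the untouched HTML survives the in-place conversion.
    """
    source = Path(html_path)
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)  # copies the contents and the timestamps
    return backup
```

For example, keep_original(r"c:\XHTML\tidy.html") would leave a copy named tidy.html.orig in the same folder.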

You can also personalize the experience and apply only one conversion rule (for example, convert all element names to lowercase), leaving the other XHTML rules alone. These cases, as well as many more, have been taken into consideration. All it takes to perform these tasks is a different command-line option or two. For a complete listing of options to work with, see www.w3.org/People/Raggett/tidy/#help.
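To get a feel for what a single conversion rule does, here is a rough Python sketch of the "lowercase all element names" rule on its own. It is not how HTML Tidy works internally, and the names TAG_NAME and lowercase_tag_names are hypothetical. The simple pattern leaves attribute names alone and would also rewrite anything that merely looks like a tag inside comments or scripts, so treat it as an illustration rather than a converter.

```python
import re

# An opening "<" or closing "</" followed by an element name.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tag_names(markup):
    """Lowercase element names only; text and attributes stay untouched."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), markup)


print(lowercase_tag_names("<H1>HTML Document with Mistakes</H1>"))
# prints: <h1>HTML Document with Mistakes</h1>
```

Notice that the word HTML in the heading text keeps its capitals, because the rule applies only to names that directly follow an angle bracket.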

HTML Tidy Online

If you're like many people, you might want to avoid the whole DOS experience. For those of us who are addicted to the Web, WebReview has published a Web-friendly version of HTML Tidy that has an easy-to-use interface. All you have to do is type the URL for the HTML Web page that you want converted and click a button. The new XHTML page is displayed for you online. Keep in mind that when the new page is displayed, you have to select File, Save to save it to your computer. To view the code before saving it, you can always view the source code (View, Source). View the HTML Tidy, Web-friendly front end at www.webreview.com/1999/07/16/feature/xhtml.cgi.

TidyGUI

For those of us who don't enjoy a DOS-dominated interface and feel more comfortable doing the conversion on our own machines, André Blavier has created a Windows interface for HTML Tidy. TidyGUI provides all the options of HTML Tidy, with all the ease of a Windows interface that you can customize. For a snapshot of this new Tidy facelift, see Figure 4.2. Enter the file to be converted in the Source File field, and then select the Tidy button. If you want to customize the conversion process, select the Configuration button and make any changes you like. It is that easy!

 
