First posted: 04/18/2000
On-request Conversion of HTML to WML
James Britt is an Internet application developer in Phoenix, Arizona. He is the co-author (with Teun Duynstee) of Professional Visual Basic 6 XML (from Wrox Press). He has been a long-standing member of TopXML.
by James Britt
What is WAP?
WAP is the Wireless Application Protocol. It is based on familiar Internet technologies, and was developed to deliver web content and services to wireless clients, such as mobile phones. Since, at present, such devices have severe memory, power, and bandwidth restrictions, web pages are formatted using WML (the Wireless Markup Language). It is an application of XML that is much like a restricted version of HTML. Further, there are special WAP servers that compile WML into byte code before sending the page to the device, thereby reducing the page size. However, WML can still be served "straight up" from any web server.
Introduction to converting HTML to WML
Recently (meaning around March of 2000), the TopXML mailing list had a discussion thread about WML, or the Wireless Markup Language. Someone asked about developing for WML, and I replied that I had done a little playing around with this. Many months prior I had put together a test web page that would take a URL as a query string parameter, fetch the document, convert it to WML, and return the converted document with a WML MIME type. The idea was to let a WAP-phone user view any HTML page on the Internet, not just those specifically formatted in WML. For example, if you wanted to see the XML Hack home page (www.xmlhack.com), you would, from your WAP phone, go to a WML page that presented a simple form. The desired URL would be entered, and the form submitted. The web server would receive the submission, fetch the requested page, convert it to WML, and send back the results.
The owners of the TopXML web site asked me if I would consider writing an article about this, and I agreed. The only problem, though, was that the code I wrote was in Perl, and ran on an Apache web server. Further, it did not use any XSL. The transformation from HTML to WML was done using string replacement and regular expressions; at the time, there was no decent XSLT Perl module. Still, the concepts would apply to any platform, and I thought that building a similar app for IIS using VB would be interesting.
The problem breaks down into a set of tasks (steps):
1. Provide a WML page where a user can enter a URL into a form field
2. Receive the web page form request
3. Pull out the desired URL from the query string
4. Fetch the desired page
5. Convert it to well-formed XML (e.g. XHTML)
6. Transform the XML into WML
7. Write the new document back to the WAP device
This article will go through these steps, providing code for each task; there are references at the end for more information on some of the more complex areas.
I should point out right here that the implementation, while functional, has some flaws.
Steps 1, 2 and 3 are quite simple. However, step 4 presents a problem: Microsoft does not provide a nice component for fetching web pages from a web server. Yes, there are objects like the Internet Transfer Control, but these use URLMON or WININET components under the hood. Microsoft has a Knowledge Base article explaining that these objects were not designed to be used from a server, and that such calls should be done using Winsock. (You may have run across this if you've investigated using the XMLHttpRequest object to fetch web pages from inside an Active Server Page.) What's odd is that there are also KB articles explaining how to use the Internet Transfer Control from ASP. Further, the specific problems are ill-defined, and I've had personal success using URLMON/WININET-based objects on the server.
To prepare this article, I tried some third-party controls, but wasn't terribly happy with them. So, I've taken the easy route and written a component that uses the Internet Transfer Control. It works just fine on Windows 98 and Windows 2000 Professional, though it may give you a problem on an NT Server; I haven't tested it. The code is wrapped in a stand-alone object, and I use interface-based code to load and call the object, so changing it should be no problem if you decide to use something else. I may go back and write a proper ActiveX DLL using Winsock, but that's a project for another day.
Step 5 is also a problem. Turning arbitrary HTML into well-formed XML is not a trivial task. I'm aware of only one tool that does this reasonably well: tidy.exe, available from the W3.org site. Unfortunately, it's a command-line executable. I used the Linux version of tidy for my Apache/Perl experiment, and had to use a system call and temporary disk files to convert the HTML. The code here will do the same thing. Yes, this is very ugly; I've put this functionality into a separate object, so that if a better tool comes along it can be swapped in without breaking too much code.
Step 6 is straightforward in principle, but in practice writing XSLT to transform arbitrary HTML requires some decent knowledge of the WML spec. I must confess that I am not a big fan of WML. My gut feeling is that, given the rate technology advances, the display capabilities of wireless devices will rapidly approach that of today's hand-held or laptop computers. I believe that most users would opt for viewing web pages as is (or close to it), rather than seeing a much sparser WML version. (My prediction: wireless devices will understand XHTML, and WML will go away.) So, be warned that the WML code presented is simplistic at best; my hope is that this code will be of interest to other TopXML readers, who will take the basic framework and expand it.
Finally, step 7 is simple once we have the content we want to write back to the user.
Submitting a Request for a Web Page
The web site for all this is ASP code running on IIS. I've been using IIS 5, though I don't see why it shouldn't work the same on IIS 4. The site has two web pages, along with a global.asa file, a configuration file, and an XSL file.
Global.asa looks like this:
<SCRIPT LANGUAGE=VBScript RUNAT=Server>
Option Explicit

' Reads the file at sServerPath into the Application variable sVarName
Sub LoadConfigData(sServerPath, sVarName)
    Dim sFname
    Dim fso
    Dim tso
    sFname = Server.MapPath(sServerPath)
    Set fso = Server.CreateObject("Scripting.FileSystemObject")
    If fso.FileExists(sFname) Then
        ' 1 = ForReading; do not create the file if it is missing
        Set tso = fso.OpenTextFile(sFname, 1, False)
        Application(sVarName) = tso.ReadAll()
        tso.Close
    End If
End Sub
</SCRIPT>
When the application starts, Application variables are set to hold the configuration data (used to set up the VB objects used) and the XSLT for transforming the HTML into WML. LoadConfigData simply opens a file and reads it into an Application variable. The configuration data is an XML file kept in a file called Html2WmlConfig.xml, in the Config directory; the XSL is in a file called html2wml.xsl, kept in the Style directory. I'll discuss these when we get to the code that uses them.
The site also has a default.asp page:
<%@ Language=VBScript %>
<%
Response.ContentType = "text/vnd.wap.wml"
%>
The page starts off by declaring a WML MIME type, "text/vnd.wap.wml", which is required for WAP devices to recognize the page. I've found it helpful to omit this while debugging so that I could simply call the page through a regular browser. It then provides the actual page content:
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" "http://www.wapforum.org/DTD/wml_1.2.xml">
This should render a WML page with a simple form containing a field where the user can enter a URL. When the form is submitted, it will send a request to fetch.asp, passing the desired URL on the query string.
Here's fetch.asp:
<%@ Language=VBScript%>
<%
Response.ContentType = "text/vnd.wap.wml"
The page begins by setting the MIME type. It then gets a reference to an MTS object called Html2Wml.Conversion. This class will use another object to request the desired web page, and yet another object to convert that page to well-formed XML.
Dim oCvrt
Set oCvrt = Server.CreateObject("Html2Wml.Conversion")
The code checks that the object was instantiated correctly, and if so, begins to emit WML. Otherwise, it will send back an error message:
If (oCvrt Is Nothing) Then
Response.Write "<?xml version='1.0'?>"
Response.Write "<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN' "
Response.Write " 'http://www.wapforum.org/DTD/wml_1.2.xml'>"
Response.Write "<wml><card id='card2' title='Convert error'>"
Response.Write "<p>Server error.</p>"
Response.Write "</card></wml>"
Else
The code also checks that it was able to configure the object before using it:
If oCvrt.Configure(sXML) Then
sWML = oCvrt.FetchAndConvert(sURL, sXSL)
Response.Write sWML
Else
Response.Write "<?xml version='1.0'?>"
Response.Write "<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN' "
Response.Write " 'http://www.wapforum.org/DTD/wml_1.2.xml'>"
Response.Write "<wml><card id='card2' title='Convert error'>"
Response.Write "<p>Conversion error.</p>"
Response.Write "</card></wml>"
End If
End If
%>
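Both error branches in fetch.asp emit the same small WML deck with a different message. Purely as an illustration, and in Python rather than the article's VBScript, the sketch below shows how such a deck could be produced by one helper. The function name wml_error_deck is hypothetical; the DOCTYPE, card id, and title are copied from the listing. Escaping the message is an addition that the ASP code does not perform.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Prolog copied from the fetch.asp listing.
WML_PROLOG = (
    "<?xml version='1.0'?>"
    "<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN' "
    "'http://www.wapforum.org/DTD/wml_1.2.xml'>"
)


def wml_error_deck(message):
    """Return a one-card WML deck carrying an XML-escaped error message."""
    return (
        WML_PROLOG
        + "<wml><card id='card2' title='Convert error'>"
        + "<p>" + escape(message) + "</p>"
        + "</card></wml>"
    )


if __name__ == "__main__":
    deck = wml_error_deck("Conversion error.")
    # Parsing the result confirms that the deck is well-formed XML.
    root = ET.fromstring(deck)
    print(root.tag, root.find("card/p").text)
```

Keeping the deck in one place means the two branches cannot drift apart, and a message containing "<" or "&" cannot break the markup sent to the phone.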
Making HTTP Requests from the Server
The code for Html2Wml.Conversion is not too complex, but it uses two other classes, so I'll describe those first. First, to make a request for a web page, we need an object that can perform an HTTP GET. I decided to knock out a basic ActiveX DLL that wraps the Inet control shipped with VB 6. However, I need to make a slight digression here. Because my DLL is less than ideal, I decided that any code that uses it should really be written against an interface class. This would allow the code to use early binding during compilation and execution, but the actual object used could be decided at run-time. So, I've defined another VB class that serves to provide a type library for the actual class used. The project is an ActiveX DLL called ISimpleHTTP, with a single class called IRequest:
'**************************************************************
' ISimpleHTTP.IRequest
' Interface class for fetching web pages
'**************************************************************
Option Explicit
Public Function Configure(sXML As String) As Boolean
End Function
Public Function OpenURL(strURL As String) As String
End Function
It does nothing more than define method signatures. The first method is a generic configuration function that would be used to set any parameters the actual code would use. The second method just takes a String specifying a URL, and returns a string with the retrieved HTML.
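The same idea, coding against an interface and choosing the concrete class by name at run-time, can be sketched outside VB. The following Python illustration is not part of the article's code: the abstract class mirrors the two IRequest methods, while CannedRequest, REGISTRY, and create_request are hypothetical names standing in for a real implementation and for the CreateObject lookup.

```python
from abc import ABC, abstractmethod


class IRequest(ABC):
    """Mirror of the ISimpleHTTP.IRequest method signatures."""

    @abstractmethod
    def configure(self, xml: str) -> bool:
        """Accept configuration XML; return True on success."""

    @abstractmethod
    def open_url(self, url: str) -> str:
        """Return the document found at url."""


class CannedRequest(IRequest):
    """Hypothetical stand-in that returns fixed HTML instead of using the network."""

    def configure(self, xml: str) -> bool:
        return True

    def open_url(self, url: str) -> str:
        return "<html><body>" + url + "</body></html>"


# Hypothetical name-to-class table, playing the role of a ProgID lookup.
REGISTRY = {"Canned.HTTP": CannedRequest}


def create_request(class_name: str) -> IRequest:
    """Instantiate the implementation named in configuration."""
    return REGISTRY[class_name]()


if __name__ == "__main__":
    fetcher = create_request("Canned.HTTP")
    if fetcher.configure("<InetConfig/>"):
        print(fetcher.open_url("http://www.xmlhack.com/"))
```

The calling code only ever sees IRequest, so a better fetcher can be registered under a new name without touching the caller.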
Compiling the class will give us a type library that we can use in other projects. I then created the real code for this in an ActiveX DLL project called InetWrapper. It contains a single class (called HTTP), and a form (called frmMain). The project uses a reference to the Inet control, which is placed on the form. The control is then manipulated from the class by getting a reference to the form. The code also requires a reference to the Microsoft XML parser; I've used version 3.
'**************************************************************
' Class InetWrapper.HTTP
' Wraps the INet control in an ActiveX DLL
' Copyright (C) 2000 James Britt
' james@logicmilestone.com '
' This program is free software; you can redistribute it and/or
' modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation; either version 2
' of the License, or (at your option) any later version.
'**************************************************************
Option Explicit
Implements ISimpleHTTP.IRequest
Private objForm As Object
The class has a reference to the ISimpleHTTP.IRequest type library just created. The Implements keyword is used to indicate that this class will implement the methods defined by ISimpleHTTP.IRequest. IRequest_Configure takes an XML string that specifies how many seconds the Inet control should wait before a web page request has a time-out:
'**************************************************************
' Private Function IRequest_Configure(sXML As String) As Boolean
' Set parameters on the Inet object
' <InetConfig>
' <RequestTimeOut>300</RequestTimeOut>
' </InetConfig>
'**************************************************************
Private Function IRequest_Configure(sXML As String) As Boolean
Dim oDOM As DOMDocument30
Dim oEl As IXMLDOMElement
Set oDOM = New DOMDocument30
If oDOM.loadXML(sXML) Then
Set oEl = oDOM.getElementsByTagName("RequestTimeOut").Item(0)
If Not oEl Is Nothing Then
objForm.Inet1.RequestTimeout = oEl.Text
IRequest_Configure = True
Set oDOM = Nothing
Else
Err.Raise -1, "InetWrapper.HTTP.Configure", _
"Can not find RequestTimeOut."
IRequest_Configure = False
End If
Else
Err.Raise -1, "InetWrapper.HTTP.Configure", _
"Can not parse XML: " & oDOM.parseError.reason
IRequest_Configure = False
End If
End Function
The code just sets a reference to the DOM object, loads the XML, and tries to pull out the RequestTimeOut value.
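For readers who want to see the lookup in isolation, here is a rough Python equivalent using the standard library's ElementTree. It is a sketch, not a translation of the component: the function name read_request_timeout is hypothetical, and raising ValueError stands in for the Err.Raise calls. The element name RequestTimeOut comes from the listing.

```python
import xml.etree.ElementTree as ET


def read_request_timeout(config_xml):
    """Return the RequestTimeOut value (in seconds) from a configuration string."""
    try:
        root = ET.fromstring(config_xml)
    except ET.ParseError as exc:
        raise ValueError("Can not parse XML: %s" % exc)
    # iter() walks the whole tree, much like getElementsByTagName,
    # so the element may sit anywhere inside a larger document.
    for element in root.iter("RequestTimeOut"):
        return int((element.text or "").strip())
    raise ValueError("Can not find RequestTimeOut.")


if __name__ == "__main__":
    sample = "<InetConfig><RequestTimeOut>300</RequestTimeOut></InetConfig>"
    print(read_request_timeout(sample))
```

Because the search is by element name rather than by fixed path, the same string can carry settings for several objects, which is how the article's single configuration file is used.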
IRequest_OpenURL takes a URL, and uses the Inet control to retrieve it, returning the HTML:
'**************************************************************
' Private Function IRequest_OpenURL(strURL As String) As String
' Go get a web page and return it.
'**************************************************************
Private Function IRequest_OpenURL(strURL As String) As String
' Assumed minimal body: call the Inet control's OpenURL method
IRequest_OpenURL = objForm.Inet1.OpenURL(strURL)
End Function
The class also uses the Initialize and Terminate events to get a reference to the form and to clean up:
Private Sub Class_Initialize()
Set objForm = New frmMain
End Sub
Private Sub Class_Terminate()
Set objForm = Nothing
End Sub
Cleaning Up HTML
Once we've got some HTML, we need to clean it up before we try to transform it to WML. I use the tidy utility (available from www.w3.org) to do this. This is available for several platforms; the Windows version is a command-line program. To use this from VB, I needed to have the code execute the program with some parameters, then read in the resulting file. I had originally used the Shell function, but found it troublesome to track when the shell process had finished. Poking around on the MSDN site (msdn.microsoft.com) I found some example code that used a few Win32 API calls to do this. Once again, because this is a sub-optimal solution, I created an interface class so that swapping out the actual tidy code would be easy. (I present it as a challenge to the reader to write an in-memory DLL that does this. The C source code for tidy is also available from w3.org.)
The interface class is called IHttpUtils.ITidy. It is an ActiveX DLL project with a single class:
'**************************************************************
' IHttpUtils.ITidy
' Interface class for cleaning up HTML
'**************************************************************
Option Explicit
Public Function HtmlToXML(ByVal sHTML As String) As String
End Function
Public Function Configure(ByVal sConfigXml As String) As Boolean
End Function
As with the previous interface class, this just defines method signatures, which will be implemented in another class called HTMLUtils.Convert. This is part of an ActiveX DLL project. It requires references to the Scripting runtime library, the Microsoft XML parser (version 3), and the interface class just defined.
The project has two classes and a module. The module is called Declares.bas, and contains Win32 API declarations:
' Code taken from MSDN example
Option Explicit
Public Declare Function GetModuleUsage% Lib "Kernel" _
(ByVal hModule As Long)
Public Declare Function GetModuleHandle Lib "kernel32" _
Alias "GetModuleHandleA" _
(ByVal lpModuleName As String) As Long
Public Declare Function FindWindow Lib "user32" _
Alias "FindWindowA" _
(ByVal lpClassName As String, _
ByVal lpWindowName As String) As Long
Public Declare Function IsWindow Lib "user32" _
(ByVal hwnd As Long) As Long
'Constants used by the API functions
Public Const WM_CLOSE = &H10
Public Const INFINITE = &HFFFFFFFF
Public Type STARTUPINFO
cb As Long
lpReserved As String
lpDesktop As String
lpTitle As String
dwX As Long
dwY As Long
dwXSize As Long
dwYSize As Long
dwXCountChars As Long
dwYCountChars As Long
dwFillAttribute As Long
dwFlags As Long
wShowWindow As Integer
cbReserved2 As Integer
lpReserved2 As Long
hStdInput As Long
hStdOutput As Long
hStdError As Long
End Type
Public Type PROCESS_INFORMATION
hProcess As Long
hThread As Long
dwProcessID As Long
dwThreadID As Long
End Type
Public Declare Function WaitForSingleObject Lib "kernel32" _
    (ByVal hHandle As Long, ByVal dwMilliseconds As Long) _
    As Long
Public Declare Function CreateProcessA Lib "kernel32" _
    (ByVal lpApplicationName As Long, _
    ByVal lpCommandLine As String, _
    ByVal lpProcessAttributes As Long, _
    ByVal lpThreadAttributes As Long, _
    ByVal bInheritHandles As Long, _
    ByVal dwCreationFlags As Long, _
    ByVal lpEnvironment As Long, _
    ByVal lpCurrentDirectory As Long, _
    lpStartupInfo As STARTUPINFO, _
    lpProcessInformation As PROCESS_INFORMATION) As Long
Public Declare Function CloseHandle Lib "kernel32" _
(ByVal hObject As Long) As Long
Public Declare Function GetExitCodeProcess Lib "kernel32" _
(ByVal hProcess As Long, lpExitCode As Long) As Long
Public Const NORMAL_PRIORITY_CLASS = &H20&
A class called MinorUtils is used to handle the system call for running tidy.
'**************************************************************
' HTMLUtils.MinorUtils
' Some code from the MSDN web site for
' running shell commands and checking that the called process
' has finished
'**************************************************************
Option Explicit
There is a function for modifying the return code from a Shell call:
'**************************************************************
' Private Function FixValue(ByVal lVal As Long) As Integer
' This function is necessary since the value returned by Shell
' is an unsigned integer and may exceed the limits
' of a VB integer.
' Taken from MSDN.
'**************************************************************
Private Function FixValue(ByVal lVal As Long) As Integer
If (lVal And &H8000&) = 0 Then
FixValue = lVal And &HFFFF&
Else
FixValue = &H8000 Or (lVal And &H7FFF&)
End If
End Function
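What FixValue does is reinterpret the low 16 bits of a value as a signed, two's-complement integer, so that values above 32767 wrap into the negative range instead of overflowing a VB Integer. The Python sketch below illustrates the same arithmetic; the function name fix_value is just a transliteration, and Python is used here only because it makes the bit manipulation easy to try out.

```python
def fix_value(value):
    """Reinterpret the low 16 bits of value as a signed 16-bit integer."""
    low = value & 0xFFFF          # keep only the low word
    if low & 0x8000:              # sign bit set: wrap into the negative range
        return low - 0x10000
    return low


if __name__ == "__main__":
    for sample in (0, 0x7FFF, 0x8000, 0xFFFF):
        print(hex(sample), fix_value(sample))
```

So 0x7FFF stays 32767, 0x8000 becomes -32768, and 0xFFFF becomes -1, which matches what the VB expression &H8000 Or (lVal And &H7FFF&) produces when the sign bit is set.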
There is another function for actually executing a shell command and waiting for the process to finish before returning:
'**************************************************************
' Friend Function ExecCmd(cmdline$)
' Executes a command-line call and waits until the job is done
' before returning.
' Taken from MSDN site.
'**************************************************************
Friend Function ExecCmd(cmdline$)
Dim proc As PROCESS_INFORMATION
Dim start As STARTUPINFO
Dim ret&
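' Initialize the STARTUPINFO structure:
start.cb = Len(start)
' Start the shelled application:
ret& = CreateProcessA(0&, cmdline$, 0&, 0&, 1&, _
    NORMAL_PRIORITY_CLASS, 0&, 0&, start, proc)
' Wait for the shelled application to finish:
ret& = WaitForSingleObject(proc.hProcess, INFINITE)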
Call GetExitCodeProcess(proc.hProcess, ret&)
Call CloseHandle(proc.hThread)
Call CloseHandle(proc.hProcess)
ExecCmd = ret&
End Function
Private Sub Class_Initialize()
End Sub
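The pattern in ExecCmd (start a process, block until it exits, hand back its exit code) is the part that the plain Shell function could not provide. As a point of comparison only, this is how the same behavior looks in Python with the standard subprocess module; exec_cmd is a hypothetical name, and the child command used in the demonstration is simply the Python interpreter itself.

```python
import subprocess
import sys


def exec_cmd(args):
    """Run a command, wait for it to finish, and return its exit code."""
    # subprocess.run blocks until the child process has exited.
    completed = subprocess.run(args)
    return completed.returncode


if __name__ == "__main__":
    # A child process that exits with status 3.
    code = exec_cmd([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(code)  # prints 3
```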
The main class, Convert, implements the IHttpUtils.ITidy interface:
'**************************************************************
' Class HTMLUtils.Convert
' Implements IHttpUtils.ITidy, which defines methods
' for class configuration and cleaning up HTML into
' well-formed XML
' Copyright (C) 2000 James Britt
' james@logicmilestone.com '
' This program is free software; you can redistribute it and/or
' modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation; either version 2
' of the License, or (at your option) any later version.
'**************************************************************
Option Explicit
Implements IHttpUtils.ITidy
Private g_oDOM As DOMDocument30
Private g_oUtils As MinorUtils
Private m_Configured As Boolean
Private m_sPathToTidyBAT As String
Private m_TidyDir As String
Private m_TidyBatchName As String
Private m_TidyInputFileName As String
Private m_sConfigXML As String
To run tidy with the correct arguments, a batch file is created using CreateTidyBatch. This takes strings specifying the directory where tidy.exe lives, the name of the batch file to be created, and the name of the input file to be processed:
'**************************************************************
' Private Function CreateTidyBatch(ByVal sTidyDir As String,
' ByVal sFileName As String,
' ByVal sInputFile As String) As Boolean
' Creates a batch file for calling tidy.exe, which cleans up
' the raw HTML
'**************************************************************
Private Function CreateTidyBatch(ByVal sTidyDir As String, _
ByVal sFileName As String, _
ByVal sInputFile As String) As Boolean
Dim oFso As Scripting.FileSystemObject
Dim oTso As Scripting.TextStream
Set oFso = New FileSystemObject
If (m_TidyDir = "") Then
CreateTidyBatch = False
Exit Function
End If
(For a list of available tidy arguments, and what they mean, run tidy with the -help argument.)
To configure the class, an XML string is passed to ITidy_Configure, specifying the path to the tidy directory, the name of the batch file to create for running tidy, and the name of the input file tidy will parse. Note that these elements may be part of a larger XML document; no validation is done on the document, which allows us to use a single configuration file (stored in the ASP application) for multiple objects.
'**************************************************************
' Private Function ITidy_Configure(ByVal sConfigXml As String) As Boolean
' Takes an XML string that defines parameters for the conversion tool.
' We happen to be using tidy.exe, which needs some file path and file name info.
' <WMLConvertConfig>
' <TidyDir></TidyDir>
' <BatchFileName></BatchFileName>
' <InputFileName></InputFileName>
' </WMLConvertConfig>
'**************************************************************
Private Function ITidy_Configure(ByVal sConfigXml As String) As Boolean
If g_oDOM.loadXML(sConfigXml) Then
m_TidyDir = g_oDOM.selectSingleNode("//TidyDir").Text
m_TidyBatchName = g_oDOM.selectSingleNode("//BatchFileName").Text
m_TidyInputFileName = g_oDOM.selectSingleNode("//InputFileName").Text
The code attempts to load the XML, and if successful, tries to parse for the configuration data. If found, the text is placed into private class members, and CreateTidyBatch is called.
If (Len(m_TidyDir) > 0) And _
(Len(m_TidyBatchName) > 0) And _
(Len(m_TidyInputFileName) > 0) Then
' Create the batch file
If CreateTidyBatch(m_TidyDir, m_TidyBatchName, _
m_TidyInputFileName) Then
ITidy_Configure = True
m_Configured = True
m_sConfigXML = sConfigXml
Else
ITidy_Configure = False
m_Configured = False
End If
Else
ITidy_Configure = False
m_Configured = False
End If
Else
ITidy_Configure = False
End If
End Function
If there are any problems getting the configuration data, the method returns False.
ITidy_HtmlToXML is used to pass in some HTML and get back the cleaned-up version.
'**************************************************************
' Private Function ITidy_HtmlToXML(ByVal sHTML As String)
' As String
' Takes HTML and cleans it up into XML.
' Need to take the HTML, write it to a temp file, and pass the
' temp file name, the name of an output temp file, and tidy
' params to ExecCmd(). Then read in the output file and
' return it.
'**************************************************************
Private Function ITidy_HtmlToXML(ByVal sHTML As String) _
    As String
Dim oFso As FileSystemObject
Dim oTso As TextStream
Set oFso = New FileSystemObject
A FileSystemObject is declared and initialized, and the code checks that the tidy directory exists. If it does, the code then writes the HTML to a file in that directory:
If oFso.FolderExists(m_TidyDir) Then
Set oTso = oFso.CreateTextFile(m_TidyDir & "/" & _
m_TidyInputFileName, True, False)
oTso.Write sHTML
oTso.Close
Else
ITidy_HtmlToXML = "<ERROR/>"
Exit Function
End If
The tidy batch file is then called:
g_oUtils.ExecCmd m_sPathToTidyBAT
The results are then read in to a local variable, and returned:
Set oTso = oFso.OpenTextFile(m_TidyDir & "/" & _
m_TidyInputFileName, ForReading, False)
sHTML = oTso.ReadAll()
ITidy_HtmlToXML = sHTML
End Function
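The write-to-file, run-the-tool, read-it-back round trip is independent of tidy itself. The Python sketch below illustrates just that round trip under stated assumptions: clean_via_file is a hypothetical name, the external tool is passed in as a command list and is expected to rewrite the file in place, and the tool's exit code is ignored, as it is in the VB listing. The demonstration uses a tiny Python one-liner as the "tool" so that the example runs without tidy installed.

```python
import os
import subprocess
import sys
import tempfile


def clean_via_file(html, command):
    """Write html to a temp file, run command on it, and return the file's new contents.

    command is a list such as ["sometool", "--flag"]; the temp file path is
    appended as the last argument, and the tool must modify the file in place.
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        try:
            subprocess.run(command + [path])  # blocks until the tool exits
        except OSError:
            return "<ERROR/>"                 # the tool could not be started
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)                       # always clean up the temp file


if __name__ == "__main__":
    # Stand-in "tool": upper-cases the file it is given.
    script = "import sys; p = sys.argv[1]; s = open(p).read(); open(p, 'w').write(s.upper())"
    print(clean_via_file("<p>hello</p>", [sys.executable, "-c", script]))
```

Using a uniquely named temporary file, rather than one fixed input file name, also avoids two simultaneous requests overwriting each other's HTML.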
The usual class events are used for preparing some objects and cleaning up:
Private Sub Class_Initialize()
Set g_oDOM = New DOMDocument30
Set g_oUtils = New MinorUtils
m_Configured = False
m_sConfigXML = ""
End Sub
Private Sub Class_Terminate()
Set g_oDOM = Nothing
Set g_oUtils = Nothing
End Sub
Converting to WML
So, we now have some objects for retrieving an HTML page and for cleaning up the HTML. We can now use these in an MTS object, which will be called from the ASP code. I created an ActiveX DLL project called Html2Wml, with a single class named Conversion. It uses references to MSXML version 3, the ISimpleHTTP.IRequest interface class, the IHttpUtils.ITidy interface class, the Microsoft Transaction Server type library, and the Scripting runtime library.
'**************************************************************
' Class Html2Wml.Conversion
' Exposes methods for fetching a web page and transforming
' the HTML into WML.
'
' Copyright (C) 2000 James Britt
' james@logicmilestone.com '
' This program is free software; you can redistribute it and/or
' modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation; either version
' 2 of the License, or (at your option) any later version.
'
' This program is distributed in the hope that it will be useful
'**************************************************************
Option Explicit
Implements ObjectControl
Private g_oXmlDom As DOMDocument30
Private g_oXslDom As DOMDocument30
Private g_oHTTP As ISimpleHTTP.IRequest
Private g_oTidy As IHttpUtils.ITidy
Private m_sXmlConfig As String
Since this class uses MTS, it needs to implement ObjectControl (see the references at the end of the article for links to more information about creating objects for MTS). The code also declares some private global objects to be used in the code. Note that references are set to the interface classes defined earlier. This allows the code to use early binding, since it has an explicit type library available. The code will set these objects to specific classes based on the information in the configuration XML. When better implementations of these objects are built, they can be swapped in without having to change any code in this class, provided they implement the defined interfaces.
My code disables validation when loading the cleaned-up HTML, because I was seeing problems resolving the DTD. However, this meant that various entity references that are common in HTML (such as &nbsp;) would cause much grief for the parser. I decided to do some additional processing on the HTML by replacing these entities with character references. The search-and-replace strings are stored in the child elements of the Replacement element in the configuration XML. CDATA tags are used to keep the parser from trying to resolve the contents.
Configuration is done by passing an XML string to the Configure method:
'**************************************************************
' Public Function Configure(sXML As String) As Boolean
' Receives an XML for configuring objects used by this class.
'**************************************************************
Public Function Configure(ByVal sXML As String) As Boolean
Dim oEL As IXMLDOMElement
Dim sTidyClass As String
Dim sSimpleHttpClass As String
Local variables are declared for the object class Ids, and the configuration XML is given to the XML parser. If the XML is loaded, the code tries to pull out the names of the objects to use:
If g_oXmlDom.loadXML(sXML) Then
If (g_oXmlDom.getElementsByTagName("TidyClass").length > 0) Then
Set oEL = g_oXmlDom.getElementsByTagName("TidyClass").Item(0)
sTidyClass = oEL.Text
If the class ID for the tidy wrapper is found, then the code tries to create an instance of it, and then configure it by handing off the configuration XML:
Set g_oTidy = CreateObject(sTidyClass)
If Not g_oTidy Is Nothing Then
Configure = g_oTidy.Configure(sXML)
m_sXmlConfig = sXML
Else
Err.Raise -1, "Html2Wml.Conversion.Configure", _
"Failed to create " & sTidyClass
Configure = False
Exit Function
End If
Else
Err.Raise -1, "Html2Wml.Conversion.Configure", _
"Failed to find TidyClass element"
Configure = False
Exit Function
End If
Likewise, once the Inet wrapper class is found, it too is created and configured:
If (g_oXmlDom.getElementsByTagName("SimpleHTTPClass").length > 0) Then
Set oEL = g_oXmlDom.getElementsByTagName("SimpleHTTPClass").Item(0)
sSimpleHttpClass = oEL.Text
Set g_oHTTP = CreateObject(sSimpleHttpClass)
If Not g_oHTTP Is Nothing Then
If Not g_oHTTP.Configure(sXML) Then
Err.Raise -1, "Html2Wml.Conversion.Configure", _
"Failed to config g_oHTTP"
Configure = False
Exit Function
Else
Configure = True
End If
Else
Err.Raise -1, "Html2Wml.Conversion.Configure", _
"Failed to create " & sSimpleHttpClass
Configure = False
Exit Function
End If
Otherwise, an error is raised:
Else
Err.Raise -1, "Html2Wml.Conversion.Configure", _
"Failed to find SimpleHTTPClass element."
Configure = False
Exit Function
End If
If the XML cannot be loaded, then an error is raised:
Else
Err.Raise -1, "Html2Wml.Conversion.Configure", _
"Failed to parse config XML: " & _
g_oXmlDom.parseError.reason
Configure = False
End If
End Function
To handle the retrieval and conversion of the requested URL, FetchAndConvert takes the URL and uses the other objects to fetch it and convert it.
'**************************************************************
' Public Function FetchAndConvert(ByVal sURI As String, _
' ByVal sXSL As String) As String
' Retrieves an HTML file using the given URI and converts it
' to WML, returning the WML string
'**************************************************************
Public Function FetchAndConvert(ByVal sURI As String, _
ByVal sXSL As String) As String
On Error GoTo ErrHand
Dim sHTML As String
Dim sXML As String
Dim sWML As String
The code calls into the Inet wrapper class to get the HTML, then passes it on to the tidy wrapper:
sHTML = g_oHTTP.OpenURL(sURI)
sXML = g_oTidy.HtmlToXML(sHTML)
' now transform ...
g_oXmlDom.validateOnParse = False
g_oXmlDom.resolveExternals = False
Potentially bothersome characters are replaced by calling ReplaceStrings, a private method of the class:
sXML = ReplaceStrings(sXML, m_sXmlConfig)
' The error info should perhaps be WML, unless the calling app
' will handle that part
If Not g_oXmlDom.loadXML(sXML) Then
FetchAndConvert = "<ERROR location='FetchAndConvert' type='XML parse'>" & _
"<Reason><domParseError>" & g_oXmlDom.parseError.reason & _
"</domParseError>" & "<filePos>" & _
g_oXmlDom.parseError.filepos & _
"</filePos></Reason><Source>" & sXML & "</Source></ERROR>"
Exit Function
End If
If the XML cannot be loaded into the parser, then error information is returned. Otherwise, the XSL is loaded:
If Not g_oXslDom.loadXML(sXSL) Then
FetchAndConvert = "<ERROR location='FetchAndConvert' " & _
" type='XSL parse'><Reason>" & _
g_oXslDom.parseError.reason & _
"</Reason><Source>" & sXSL & "</Source></ERROR>"
Exit Function
End If
If the XSL was correctly loaded, it is used to transform the HTML into WML:
sWML = g_oXmlDom.transformNode(g_oXslDom)
ErrHand:
If Err.Number <> 0 Then
sWML = "<ERROR type='COM'><Reason>" & Err.Description & _
"</Reason></ERROR>"
End If
If any COM errors occurred, the code returns the error information. Otherwise, it returns the converted web page.
FetchAndConvert = sWML
End Function
Replacing text in the HTML is a matter of pulling the search and replace text from the configuration XML and running through the document using the VB Replace function:
'**************************************************************
' Private Function ReplaceStrings(sHTML As String) As String
' Goes through the converted document and replaces text,
' defined in the config XML
'**************************************************************
Private Function ReplaceStrings(sHTML As String, sXMLConfig) As String
Dim oNodelist As IXMLDOMNodeList
Dim oEL As IXMLDOMElement
Dim nIdx As Integer
Dim sFind As String
Dim sReplace As String
If g_oXmlDom.loadXML(sXMLConfig) Then
Set oNodelist = g_oXmlDom.getElementsByTagName("Replacement")
If oNodelist.length > 0 Then
For nIdx = 0 To oNodelist.length - 1
Set oEL = oNodelist.Item(nIdx)
sFind = Trim(oEL.getElementsByTagName("Find").Item(0).childNodes(0).Text)
sReplace = Trim(oEL.getElementsByTagName("Replace").Item(0).childNodes(0).Text)
sHTML = Replace(sHTML, sFind, sReplace)
Next
End If
End If
ReplaceStrings = sHTML
End Function
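To make the purpose of this step concrete, the Python sketch below shows why the replacement matters: a non-validating parser rejects an undeclared named entity such as &nbsp;, but accepts the equivalent numeric character reference. The element names Replacement, Find, and Replace come from the listing; the root element name Config, the sample configuration, and the function name replace_strings are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; CDATA keeps the parser from resolving the contents.
CONFIG = """<Config>
  <Replacement>
    <Find><![CDATA[&nbsp;]]></Find>
    <Replace><![CDATA[&#160;]]></Replace>
  </Replacement>
</Config>"""


def replace_strings(document, config_xml):
    """Apply every Find/Replace pair from the configuration to the document text."""
    root = ET.fromstring(config_xml)
    for replacement in root.iter("Replacement"):
        find = (replacement.findtext("Find") or "").strip()
        replace = (replacement.findtext("Replace") or "").strip()
        if find:  # skip empty search strings
            document = document.replace(find, replace)
    return document


if __name__ == "__main__":
    raw = "<p>a&nbsp;b</p>"
    try:
        ET.fromstring(raw)
    except ET.ParseError as exc:
        print("before:", exc)          # the named entity is not declared
    fixed = replace_strings(raw, CONFIG)
    print("after:", repr(ET.fromstring(fixed).text))
```

Since the pairs live in configuration, further entities (&copy;, &eacute;, and so on) can be added without recompiling anything.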
ObjectControl events are used to prepare some objects and to clean up:
Private Sub ObjectControl_Activate()
Set g_oXmlDom = New DOMDocument30
Set g_oXslDom = New DOMDocument30
End Sub
Private Function ObjectControl_CanBePooled() As Boolean
ObjectControl_CanBePooled = False
End Function
Private Sub ObjectControl_Deactivate()
Set g_oHTTP = Nothing
Set g_oTidy = Nothing
Set g_oXmlDom = Nothing
Set g_oXslDom = Nothing
End Sub
The XSLT Part
Now, if we revisit fetch.asp, we see that it creates an instance of Html2Wml.Conversion, configures it with the XML configuration data, and calls FetchAndConvert with the requested URL and the HTML-to-WML XSL. The XSL (or, more precisely, the XSLT) is admittedly sparse:
It by no means manages to convert all HTML into proper WML; it does manage to create wml and card elements, as well as sticking in the occasional p element. Further, it produces the correct DOCTYPE for a WML XML document. However, since the XSL is loaded from a file on the web server, changing it to something more complete is not an issue, and I would be interested in seeing a better implementation.
Summary
We've seen an IIS/ASP/VB implementation of a site that allows a WAP device user to request any HTML page on the Internet and have it sent back as WML. There are a number of problems with the specific code: the object used to fetch web pages from the server may not work under all circumstances; the clean-up of HTML is done using a single-threaded command-line executable; the XSLT for transforming the HTML to WML is incomplete. However, these problems are isolated in dynamically loaded objects, so further enhancement can go on without any disruption to the main code. I would suggest that, if anybody finds this code useful, they take the time to create a more robust version before making heavy use of it. This would include reconsidering the interface classes and the methods they define. Expanding the code is much easier if it permits polymorphism; offhand, I can think of a few more methods that the ISimpleHTTP.IRequest interface could define, such as a method for setting POST data, or methods for adding additional headers.