
By: Mark Wilson
I am the creator of TopXML. I am available for international and local (Australia) contracts. I am a Solution Architect/Business Analyst. I have worked in IT in several countries (NZ, Australia, South Africa, UK) building and training teams for government and very large non-governmental organizations. I am ex-Microsoft Consulting Services. I wrote the first book on Microsoft XML published in 2000 called XML Programming with VB and ASP. Most recently I have been building tools for the SEO industry. Ask me for a 37 point SEO health-checkup for your website.
First posted: 04/18/2000

 

On-request Conversion of HTML to WML

James Britt is an Internet application developer in Phoenix, Arizona. He is the co-author (with Teun Duynstee) of Professional Visual Basic 6 XML (from Wrox Press). He has been a long-standing member of TopXML.

by James Britt

What is WAP?

WAP is the Wireless Application Protocol. It is based on familiar Internet technologies, and was developed to deliver web content and services to wireless clients such as mobile phones. Because such devices currently have severe memory, power, and bandwidth restrictions, pages for them are written in WML (the Wireless Markup Language), an application of XML that is much like a restricted version of HTML. Further, WAP gateways compile WML into a compact binary form before sending the page to the device, thereby reducing the page size. However, WML can still be served "straight up" from any web server.

Introduction to converting HTML to WML

Recently (meaning around March of 2000), the TopXML mailing list had a discussion thread about WML, or the Wireless Markup Language. Someone asked about developing for WML, and I replied that I had done a little playing around with this. Many months prior I had put together a test web page that would take a URL as a query string parameter, fetch the document, convert it to WML, and return the converted document with a WML MIME type. The idea was to allow a WAP-phone user to be able to view any HTML page on the Internet, not just those specifically formatted in WML. For example, if you wanted to see the “XML Hack” home page (www.xmlhack.com), you would, from your WAP phone, go to a WML page that presented a simple form. The desired URL would be entered, and the form submitted. The web server would receive the submission, fetch the requested page, convert it to WML, and send back the results.

The owners of the TopXML web site asked me if I would consider writing an article about this, and I agreed. The only problem was that the code I wrote was in Perl and ran on an Apache web server. Further, it did not use any XSL: the transformation from HTML to WML was done using string replacement and regular expressions, since at the time there was no decent XSLT Perl module. Still, the concepts apply to any platform, and I thought that building a similar app for IIS using VB would be interesting.
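
For comparison with that Perl approach, here is a minimal sketch, in Python purely for illustration, of regex-and-string-replacement conversion. The function name html_to_wml_regex and the tag rules (keep b and i, turn p and br into <br/>, drop every other tag) are assumptions, not the original Perl code. It also shows the weakness of the approach: an entity such as &nbsp; or badly nested markup passes straight through and can still produce an invalid WML deck.

```python
import re

def html_to_wml_regex(html: str, title: str = "Page") -> str:
    """Crude regex-based HTML-to-WML conversion (illustrative only).

    The tag rules are assumptions; regular expressions cannot cope with
    arbitrary, often malformed HTML.
    """
    # Remove script and style blocks, including their contents.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    # Remove HTML comments.
    html = re.sub(r"(?s)<!--.*?-->", "", html)
    # Paragraph starts and line breaks become WML <br/> elements.
    html = re.sub(r"(?i)<(p|br)\b[^>]*>", "<br/>", html)
    # Keep bold and italic tags, lower-cased and without attributes.
    html = re.sub(r"(?i)<(/?)(b|i)\b[^>]*>",
                  lambda m: "<%s%s>" % (m.group(1), m.group(2).lower()), html)
    # Drop every other tag, including closing </p> tags.
    html = re.sub(r"<(?!br/>|/?[bi]>)[^>]*>", "", html)
    # Collapse runs of whitespace.
    body = re.sub(r"\s+", " ", html).strip()
    return ('<?xml version="1.0"?>\n'
            '<wml><card id="card1" title="%s"><p>%s</p></card></wml>'
            % (title, body))
```
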

The problem breaks down into a set of steps:

  1. Provide a WML page where a user can enter a URL into a form field
  2. Receive the web page form request
  3. Pull out the desired URL from the query string
  4. Fetch the desired page
  5. Convert it to well-formed XML (e.g. XHTML)
  6. Transform the XML into WML
  7. Write the new document back to the WAP device
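
The seven steps map naturally onto a chain of swappable functions. The following Python sketch is illustrative only (convert_page and its parameter names are hypothetical); it covers steps 4 to 7, with the fetcher, the cleaner, and the transformer passed in so that any one of them can be replaced without touching the others, the same idea behind the interface classes used in the VB code.

```python
from typing import Callable, Tuple

WML_CONTENT_TYPE = "text/vnd.wap.wml"

def convert_page(url: str,
                 fetch: Callable[[str], str],
                 tidy: Callable[[str], str],
                 transform: Callable[[str], str]) -> Tuple[str, str]:
    """Steps 4-7 of the pipeline, each supplied by the caller."""
    html = fetch(url)        # step 4: fetch the desired page
    xhtml = tidy(html)       # step 5: convert it to well-formed XML
    wml = transform(xhtml)   # step 6: transform the XML into WML
    # step 7: hand back the MIME type and body for the caller to write
    return WML_CONTENT_TYPE, wml
```
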

This article will go through these steps, providing code for each task; there are references at the end for more information on some of the more complex areas.

I should point out right here that the implementation, while functional, has some flaws.

Steps 1, 2, and 3 are quite simple. Step 4, however, presents a problem: Microsoft does not provide a nice component for fetching web pages from a web server. Yes, there are objects like the Internet Transfer Control, but these use the URLMON or WININET components under the hood. Microsoft has a Knowledge Base article explaining that these objects were not designed to be used from a server, and that such calls should be made using Winsock. (You may have run across this if you’ve investigated using the XMLHttpRequest object to fetch web pages from inside an Active Server Page.) What’s odd is that there are also KB articles explaining how to use the Internet Transfer Control from ASP. Further, the specific problems are ill-defined, and I’ve had personal success using URLMON/WININET-based objects on the server.

To prepare this article, I tried some third-party controls, but wasn’t terribly happy with them. So, I’ve taken the easy route and written a component that uses the Internet Transfer control. It works just fine on Windows 98 and Windows 2000 Professional, though it may give you a problem on an NT Server; I haven’t tested it. The code is wrapped in a stand-alone object, and I use interface-based code to load and call the object, so changing it should be no problem if you decide to use something else. I may go back and write a proper ActiveX DLL using Winsock, but that’s a project for another day.

Step 5 is also a problem. Turning arbitrary HTML into well-formed XML is not a trivial task. I’m aware of only one tool that does this reasonably well: tidy.exe, available from the W3.org site. Unfortunately, it’s a command-line executable. I used the Linux version of tidy for my Apache/Perl experiment, and had to use a system call and temporary disk files to convert the HTML. The code here will do the same thing. Yes, this is very ugly; I’ve put this functionality into a separate object, so that if a better tool comes along it can be swapped in without breaking too much code.

Step 6 is straightforward in principle, but in practice writing XSLT to transform arbitrary HTML requires some decent knowledge of the WML spec. I must confess that I am not a big fan of WML. My gut feeling is that, given the rate at which technology advances, the display capabilities of wireless devices will rapidly approach those of today’s hand-held or laptop computers. I believe that most users would opt for viewing web pages “as is” (or close to it), rather than seeing a much sparser WML version. (My prediction: wireless devices will understand XHTML, and WML will go away.) So, be warned that the WML code presented is simplistic at best; my hope is that this code will be of interest to other TopXML readers, who will take the basic framework and expand it.
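
Stock Python has no XSLT processor, so as a stand-in for the XSLT in step 6, here is a hedged sketch of the same idea written as a tree walk over well-formed XHTML with xml.etree.ElementTree. The mapping (b/strong to b, i/em to i, block elements followed by <br/>, links kept, script/style/head dropped) is an assumption about what a simple converter might do, not the stylesheet used in this article. It only works because step 5 has already produced well-formed XML, and it doubles literal $ characters because WML treats $ as the start of a variable reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

INLINE = {"b": "b", "strong": "b", "i": "i", "em": "i"}
BLOCK = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}
SKIP = {"script", "style", "head"}

def _local(tag: str) -> str:
    """Strip any namespace: '{http://www.w3.org/1999/xhtml}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1].lower()

def _text(s: str) -> str:
    # Escape XML specials; WML also reserves '$' for variables.
    return escape(s).replace("$", "$$")

def _attr(s: str) -> str:
    return quoteattr(s).replace("$", "$$")

def _render(el: ET.Element) -> str:
    parts = [_text(el.text or "")]
    for child in el:
        name = _local(child.tag)
        if name in SKIP:
            pass  # drop the element and everything inside it
        elif name == "br":
            parts.append("<br/>")
        elif name in INLINE:
            tag = INLINE[name]
            parts.append("<%s>%s</%s>" % (tag, _render(child), tag))
        elif name == "a" and child.get("href"):
            parts.append("<a href=%s>%s</a>"
                         % (_attr(child.get("href")), _render(child)))
        elif name in BLOCK:
            parts.append(_render(child) + "<br/>")
        else:
            parts.append(_render(child))  # unwrap unknown elements
        parts.append(_text(child.tail or ""))
    return "".join(parts)

def xhtml_to_wml(xhtml: str, title: str = "Page") -> str:
    """Convert well-formed XHTML to a one-card WML deck."""
    root = ET.fromstring(xhtml)
    body = next((el for el in root.iter() if _local(el.tag) == "body"), root)
    return ('<?xml version="1.0"?>\n'
            '<wml><card id="card1" title=%s><p>%s</p></card></wml>'
            % (_attr(title), _render(body).strip()))
```
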

Finally, step 7 is simple once we have the content we want to write back to the user.

Submitting a Request for a Web Page

The web site for all this is ASP code running on IIS. I’ve been using IIS 5, though I don’t see why it shouldn’t work the same on IIS 4. The site has two web pages, along with a global.asa file, a configuration file, and an XSL file.

Global.asa looks like this:

<SCRIPT LANGUAGE=VBScript RUNAT=Server>

Option Explicit

' Reads a file (given as a virtual path) into an Application
' variable, or stores "ERROR" if the file does not exist
Sub LoadConfigData(sServerPath, sVarName)

  Dim sFname
  Dim fso
  Dim tso

  sFname = Server.MapPath(sServerPath)
  Set fso = Server.CreateObject("Scripting.FileSystemObject")
  If fso.FileExists(sFname) Then
    ' 1 = ForReading; False = do not create the file
    Set tso = fso.OpenTextFile(sFname, 1, False)
    Application(sVarName) = tso.ReadAll()
    tso.Close
  Else
    Application(sVarName) = "ERROR"
  End If

  Set tso = Nothing
  Set fso = Nothing

End Sub

Sub Application_OnStart

  LoadConfigData "Config/Html2WmlConfig.xml", "ConfigXML"
  LoadConfigData "Style/html2wml.xsl", "Html2WmlXSL"

End Sub

</SCRIPT>

When the application starts, Application variables are set to hold the configuration data (used to set up the VB objects used) and the XSLT for transforming the HTML into WML. LoadConfigData simply opens a file and reads it into an Application variable. The configuration data is an XML file kept in a file called Html2WmlConfig.xml, in the Config directory; the XSL is in a file called html2wml.xsl, kept in the Style directory. I’ll discuss these when we get to the code that uses them.

The site also has a default.asp page:


<%@ Language=VBScript %>
<%
  Response.ContentType = "text/vnd.wap.wml"
%>

The page starts off by declaring a WML MIME type, "text/vnd.wap.wml", which is required for WAP devices to recognize the page. I’ve found it helpful to omit this while debugging so that I could simply call the page through a regular browser. It then provides the actual page content:

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" "http://www.wapforum.org/DTD/wml_1.2.xml">

<wml>
  <card id="card1" title="Convert">
    <do type="accept" label="Submit">
      <go href="http://www.SomeServer.com/wml/fetch.asp?url=$(url)"/>
    </do>
    <p>
      URL: <input name="url" value=""/><br/>
    </p>
  </card>
</wml>

This should render a WML page with a simple form containing a field where the user can enter a URL. When the form is submitted, it will send a request to fetch.asp, passing the desired URL on the query string.
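
On the server side, steps 2 and 3 amount to reading the url parameter from the query string. A minimal Python sketch follows; requested_url is a hypothetical name, and the two policies it applies (prefixing http:// when the user omits a scheme, and refusing anything other than http or https so the server cannot be asked to read local file: URLs) are suggestions rather than part of the original code.

```python
from urllib.parse import parse_qs, urlsplit

def requested_url(query_string: str) -> str:
    """Return the URL in the 'url' parameter, or '' if it is unusable."""
    values = parse_qs(query_string).get("url", [])
    url = values[0].strip() if values else ""
    if not url:
        return ""
    # Assumption: phone users often type "www.example.com" with no scheme.
    if "://" not in url:
        url = "http://" + url
    # Refuse file:, ftp:, and anything else the server should not fetch.
    if urlsplit(url).scheme.lower() not in ("http", "https"):
        return ""
    return url
```
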

Here’s fetch.asp:

<%@ Language=VBScript%>

<%
Response.ContentType = "text/vnd.wap.wml"


The page begins by setting the MIME type. It then gets a reference to an MTS object called Html2Wml.Conversion. This class will use another object to request the desired web page, and yet another object to convert that page to well-formed XML.

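Dim oCvrt

' Server.CreateObject raises an error rather than returning Nothing,
' so trap the error to let the Is Nothing check work
Set oCvrt = Nothing
On Error Resume Next
Set oCvrt = Server.CreateObject("Html2Wml.Conversion")
On Error GoTo 0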

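The code also declares a few more variables, then reads the requested URL from the query string, along with the configuration XML and the XSLT that global.asa cached in Application variables:

Dim sURL
Dim sXML, sWML
Dim sXSL

sURL = Request.Item("url")
sXML = Application("ConfigXML")
sXSL = Application("Html2WmlXSL")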

The code checks that the object was instantiated correctly, and if so, begins to emit WML. Otherwise, it will send back an error message:

If (oCvrt Is Nothing) Then
  Response.Write "<?xml version='1.0'?>"
  Response.Write "<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN' "
  Response.Write "'http://www.wapforum.org/DTD/wml_1.2.xml'>"
  Response.Write "<wml><card id='card2' title='Convert error'>"
  Response.Write "<p>Server error.</p>"
  Response.Write "</card></wml>"
Else

The code also checks that it was able to configure the object before using it:

  If oCvrt.Configure(sXML) Then
    sWML = oCvrt.FetchAndConvert(sURL, sXSL)
    Response.Write sWML
  Else
    Response.Write "<?xml version='1.0'?>"
    Response.Write "<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN' "
    Response.Write "'http://www.wapforum.org/DTD/wml_1.2.xml'>"
    Response.Write "<wml><card id='card2' title='Convert error'>"
    Response.Write "<p>Conversion error.</p>"
    Response.Write "</card></wml>"
  End If
End If

Set oCvrt = Nothing
%>
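
Because fetch.asp writes the same error deck in two places, it may be worth factoring that markup into a single helper. Here is a Python sketch of such a helper (error_deck is a hypothetical name); it escapes the message so that characters such as < or & cannot make the deck malformed, and doubles $ for WML.

```python
from xml.sax.saxutils import escape, quoteattr

WML_PROLOG = ('<?xml version="1.0"?>\n'
              '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" '
              '"http://www.wapforum.org/DTD/wml_1.2.xml">\n')

def error_deck(message: str, title: str = "Convert error") -> str:
    """Return a complete one-card WML deck reporting an error.

    The title and message are XML-escaped, and '$' is doubled because
    WML uses '$' to introduce variable references.
    """
    return (WML_PROLOG
            + '<wml><card id="card2" title=%s>' % quoteattr(title).replace("$", "$$")
            + "<p>%s</p></card></wml>" % escape(message).replace("$", "$$"))
```
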

Making HTTP Requests from the Server

The code for Html2Wml.Conversion is not too complex, but it uses two other classes, so I’ll describe those first. First, to make a request for a web page, we need an object that can perform an HTTP GET. I decided to knock out a basic ActiveX DLL that wraps the Inet control shipped with VB 6. However, I need to make a slight digression here. Because my DLL is less than ideal, I decided that any code that uses it should really be written against an interface class. This allows the code to use early binding during compilation and execution, while the actual object used can be decided at run-time. So, I’ve defined another VB class that serves to provide a type library for the actual class used. The project is an ActiveX DLL called ISimpleHTTP, with a single class called IRequest:

'**************************************************************
' ISimpleHTTP.IRequest
' Interface class for fetching web pages
'**************************************************************

Option Explicit

Public Function Configure(sXML As String) As Boolean

End Function

Public Function OpenURL(strURL As String) As String

End Function

It does nothing more than define method signatures. The first method is a generic configuration function that would be used to set any parameters the actual code would use. The second method just takes a String specifying a URL, and returns a string with the retrieved HTML.
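
The same interface-then-implementation idea can be expressed in Python with typing.Protocol. This sketch is illustrative only: IRequest mirrors the VB interface class, while CannedRequest and fetch_with are hypothetical names. The calling code depends only on the two method signatures, so any class that provides them can be substituted.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class IRequest(Protocol):
    """Python analogue of the ISimpleHTTP.IRequest interface class."""
    def configure(self, xml: str) -> bool: ...
    def open_url(self, url: str) -> str: ...

class CannedRequest:
    """Hypothetical stand-in implementation that returns fixed HTML."""
    def __init__(self) -> None:
        self.configured = False

    def configure(self, xml: str) -> bool:
        self.configured = True
        return True

    def open_url(self, url: str) -> str:
        return "<html><body><p>%s</p></body></html>" % url

def fetch_with(requester: IRequest, config_xml: str, url: str) -> str:
    # Only the interface is used here, never the concrete class.
    if not requester.configure(config_xml):
        raise RuntimeError("requester could not be configured")
    return requester.open_url(url)
```
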

Compiling the class will give us a type library that we can use in other projects. I then created the real code for this in an ActiveX DLL project called InetWrapper. It contains a single class (called HTTP), and a form (called frmMain). The project uses a reference to the Inet control, which is placed on the form. The control is then manipulated from the class by getting a reference to the form. The code also requires a reference to the Microsoft XML parser; I’ve used version 3.

'**************************************************************
' Class InetWrapper.HTTP
' Wraps the INet control in an ActiveX DLL
' Copyright (C) 2000 James Britt
' james@logicmilestone.com
'
' This program is free software; you can redistribute it and/or
' modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation; either version 2
' of the License, or (at your option) any later version.
'**************************************************************

Option Explicit

Implements ISimpleHTTP.IRequest

Private objForm As Object


The project references the ISimpleHTTP type library just created, and the Implements keyword indicates that this class will implement the methods defined by ISimpleHTTP.IRequest. IRequest_Configure takes an XML string that specifies how many seconds the Inet control should wait before a web page request times out:

'**************************************************************
' Private Function IRequest_Configure(sXML As String) As Boolean
' Set parameters on the Inet object, e.g.
' <InetConfig>
'   <RequestTimeOut>300</RequestTimeOut>
' </InetConfig>
'**************************************************************

Private Function IRequest_Configure(sXML As String) As Boolean

  Dim oDOM As DOMDocument30
  Dim oEl As IXMLDOMElement

  Set oDOM = New DOMDocument30

  If oDOM.loadXML(sXML) Then
    ' Item(0) returns Nothing if the element is absent
    Set oEl = oDOM.getElementsByTagName("RequestTimeOut").Item(0)
    If Not oEl Is Nothing Then
      objForm.Inet1.RequestTimeout = CLng(oEl.Text)
      IRequest_Configure = True
      Set oDOM = Nothing
    Else
      IRequest_Configure = False
      Err.Raise -1, "InetWrapper.HTTP.Configure", _
        "Can not find RequestTimeOut."
    End If
  Else
    IRequest_Configure = False
    Err.Raise -1, "InetWrapper.HTTP.Configure", _
      "Can not parse XML: " & oDOM.parseError.reason
  End If

End Function

The code just creates a DOM object, loads the XML, and tries to pull out the RequestTimeOut value, raising an error if the XML cannot be parsed or the element is missing.
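
For comparison, here is the same lookup as a Python sketch using xml.etree.ElementTree (read_request_timeout is a hypothetical name). Like getElementsByTagName, iter() searches the whole tree, so the RequestTimeOut element can sit anywhere inside a larger configuration document.

```python
import xml.etree.ElementTree as ET

def read_request_timeout(config_xml: str) -> int:
    """Return the RequestTimeOut value (seconds) from a config document."""
    try:
        root = ET.fromstring(config_xml)
    except ET.ParseError as exc:
        raise ValueError("Can not parse XML: %s" % exc) from exc
    # iter() walks the whole tree, including the root element itself.
    el = next(root.iter("RequestTimeOut"), None)
    if el is None or not (el.text or "").strip():
        raise ValueError("Can not find RequestTimeOut.")
    return int(el.text.strip())
```
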

IRequest_OpenURL takes a URL, and uses the Inet control to retrieve it, returning the HTML:

'**************************************************************
' Private Function IRequest_OpenURL(strURL As String) As String
' Go get a web page and return it.
'**************************************************************

Private Function IRequest_OpenURL(strURL As String) As String

Dim sHTML As String

  sHTML = objForm.Inet1.OpenURL(strURL)
  Debug.Print sHTML
  IRequest_OpenURL = sHTML

End Function
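
The Python standard library can do the same server-side GET with urllib.request, passing the timeout per call instead of setting it on a control. In this sketch (open_url is a hypothetical name), the 300-second default simply mirrors the RequestTimeOut value in the configuration XML, and falling back to ISO-8859-1 when no charset is declared follows the old HTTP/1.1 default for text types.

```python
import urllib.request

def open_url(url: str, timeout: float = 300.0) -> str:
    """Fetch a URL and return its body as text (an OpenURL analogue)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset from the Content-Type header when present.
        charset = resp.headers.get_content_charset() or "iso-8859-1"
    return raw.decode(charset, errors="replace")
```
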

The class also uses the Initialize and Terminate events to get a reference to the form and to clean up:

Private Sub Class_Initialize()

  Set objForm = New frmMain

End Sub

Private Sub Class_Terminate()

  Set objForm = Nothing

End Sub

Cleaning Up HTML

Once we’ve got some HTML, we need to clean it up before we try to transform it to WML. I use the tidy utility (available from www.w3.org) to do this. This is available for several platforms; the Windows version is a command-line program. To use this from VB, I needed to have the code execute the program with some parameters, then read in the resulting file. I had originally used the Shell function, but found it troublesome to track when the shell process had finished. Poking around on the MSDN site (msdn.microsoft.com) I found some example code that used a few Win32 API calls to do this. Once again, because this is a sub-optimal solution, I created an interface class so that swapping out the actual tidy code would be easy. (I present it as a challenge to the reader to write an in-memory DLL that does this. The C source code for tidy is also available from w3.org.)

The interface class is called IHttpUtils.ITidy. It is an ActiveX DLL project with a single class:

'**************************************************************
' IHttpUtils.ITidy
' Interface class for cleaning up HTML
'**************************************************************

Option Explicit

Public Function HtmlToXML(ByVal sHTML As String) As String
End Function

Public Function Configure(ByVal sConfigXml As String) As Boolean
End Function

As with the previous interface class, this just defines method signatures, which will be implemented in another class called HTMLUtils.Convert. This is part of an ActiveX DLL project that requires references to the Scripting runtime library, the Microsoft XML parser (version 3), and the interface class just defined.

The project has two classes and a module. The module is called Declares.bas, and contains Win32 API declarations:

' Code taken from MSDN example
Option Explicit

Public Declare Function GetModuleUsage% Lib "Kernel" _
  (ByVal hModule As Long)

Public Declare Function GetModuleHandle Lib "kernel32" _
  Alias "GetModuleHandleA" _
  (ByVal lpModuleName As String) As Long

Public Declare Function FindWindow Lib "user32" _
  Alias "FindWindowA" _
  (ByVal lpClassName As String, _
  ByVal lpWindowName As String) As Long

Public Declare Function IsWindow Lib "user32" _
  (ByVal hwnd As Long) As Long

'Constants used by the API functions
Public Const WM_CLOSE = &H10
Public Const INFINITE = &HFFFFFFFF
Public Type STARTUPINFO
  cb As Long
  lpReserved As String
  lpDesktop As String
  lpTitle As String
  dwX As Long
  dwY As Long
  dwXSize As Long
  dwYSize As Long
  dwXCountChars As Long
  dwYCountChars As Long
  dwFillAttribute As Long
  dwFlags As Long
  wShowWindow As Integer
  cbReserved2 As Integer
  lpReserved2 As Long
  hStdInput As Long
  hStdOutput As Long
  hStdError As Long
End Type

Public Type PROCESS_INFORMATION
  hProcess As Long
  hThread As Long
  dwProcessID As Long
  dwThreadID As Long
End Type

Public Declare Function WaitForSingleObject Lib "kernel32" _
  (ByVal hHandle As Long, ByVal dwMilliseconds As Long) _
  As Long

Public Declare Function CreateProcessA Lib "kernel32" _
  (ByVal lpApplicationName As Long, _
  ByVal lpCommandLine As String, _
  ByVal lpProcessAttributes As Long, _
  ByVal lpThreadAttributes As Long, _
  ByVal bInheritHandles As Long, _
  ByVal dwCreationFlags As Long, _
  ByVal lpEnvironment As Long, _
  ByVal lpCurrentDirectory As Long, _
  lpStartupInfo As STARTUPINFO, _
  lpProcessInformation As PROCESS_INFORMATION) As Long

Public Declare Function CloseHandle Lib "kernel32" _
  (ByVal hObject As Long) As Long

Public Declare Function GetExitCodeProcess Lib "kernel32" _
  (ByVal hProcess As Long, lpExitCode As Long) As Long

Public Const NORMAL_PRIORITY_CLASS = &H20&

A class called MinorUtils is used to handle the system call for running tidy.

'**************************************************************
' HTMLUtils.MinorUtils
' Some code from the MSDN web site for
' running shell commands and checking that the called process
' has finished
'**************************************************************

Option Explicit

There is a function for converting the value returned by a Shell call:

'**************************************************************
' Private Function FixValue(ByVal lVal As Long) As Integer
' This function is necessary since the value returned by Shell
' is an unsigned integer and may exceed the limits 
' of a VB integer.
' Taken from MSDN.
'**************************************************************

Private Function FixValue(ByVal lVal As Long) As Integer

  If (lVal And &H8000&) = 0 Then
    FixValue = lVal And &HFFFF&
  Else
    FixValue = &H8000 Or (lVal And &H7FFF&)
  End If

End Function
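
FixValue reinterprets the low 16 bits of a value as a signed 16-bit integer. Here is a Python worked example of the same arithmetic (fix_value is a hypothetical name): 0x7FFF stays 32767, while 0x8000 becomes -32768 and 0xFFFF becomes -1, because any value with bit 15 set has 0x10000 subtracted.

```python
def fix_value(value: int) -> int:
    """Reinterpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    # Bit 15 is the sign bit of a 16-bit integer.
    return value - 0x10000 if value & 0x8000 else value
```
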

There is another function for actually executing a shell command and waiting for the process to finish before returning:

'**************************************************************
' Friend Function ExecCmd(cmdline$)
' Executes a command-line call and waits until the job is done
' before returning.
' Taken from MSDN site.
'**************************************************************

Friend Function ExecCmd(cmdline$) As Long

  Dim proc As PROCESS_INFORMATION
  Dim start As STARTUPINFO
  Dim ret&

  ' Initialize the STARTUPINFO structure:
  start.cb = Len(start)

  ' Start the shelled application:
  ret& = CreateProcessA(0&, cmdline$, 0&, 0&, 1&, _
    NORMAL_PRIORITY_CLASS, 0&, 0&, start, proc)
  If ret& = 0 Then
    ' CreateProcess failed, so there is no process to wait on
    ExecCmd = -1
    Exit Function
  End If

  ' Wait for the shelled application to finish:
  ret& = WaitForSingleObject(proc.hProcess, INFINITE)
  ' Read the exit code, then release both handles
  Call GetExitCodeProcess(proc.hProcess, ret&)
  Call CloseHandle(proc.hThread)
  Call CloseHandle(proc.hProcess)
  ExecCmd = ret&

End Function

Private Sub Class_Initialize()
End Sub
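
For comparison, the start-a-process-and-wait logic of ExecCmd collapses to a single call in Python's subprocess module. This is a sketch, not a drop-in replacement (exec_cmd is a hypothetical name), and returning -1 when the process cannot be started is an assumed convention.

```python
import subprocess
import sys
from typing import Optional, Sequence

def exec_cmd(args: Sequence[str], timeout: Optional[float] = None) -> int:
    """Run a command, block until it exits, and return its exit code.

    Returns -1 if the process could not be started. If the timeout
    (in seconds) expires, subprocess.run kills the child and raises
    subprocess.TimeoutExpired.
    """
    try:
        completed = subprocess.run(list(args), timeout=timeout)
    except OSError:
        return -1  # e.g. the executable was not found
    return completed.returncode

if __name__ == "__main__":
    # Demo: run the current interpreter and report its exit code.
    print(exec_cmd([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

On Windows, Microsoft's CreateProcess documentation says a batch file should be run through the command interpreter, so a call for the tidy batch file would look something like exec_cmd(["cmd.exe", "/c", path_to_bat]), where path_to_bat is a placeholder.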

The main class, Convert, implements the IHttpUtils.ITidy interface:

'**************************************************************
' Class HTMLUtils.Convert
' Implements IHttpUtils.ITidy, which defines methods
' for class configuration and cleaning up HTML into
' well-formed XML
' Copyright (C) 2000 James Britt
' james@logicmilestone.com
'
' This program is free software; you can redistribute it and/or
' modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation; either version 2
' of the License, or (at your option) any later version.
'**************************************************************

Option Explicit

Implements IHttpUtils.ITidy

Private g_oDOM As DOMDocument30
Private g_oUtils As MinorUtils
Private m_Configured As Boolean
Private m_sPathToTidyBAT As String
Private m_TidyDir As String
Private m_TidyBatchName As String
Private m_TidyInputFileName As String
Private m_sConfigXML As String


To run tidy with the correct arguments, a batch file is created using CreateTidyBatch. This takes strings specifying the directory where tidy.exe lives, the name of the batch file to be created, and the name of the input file to be processed:

'**************************************************************
' Private Function CreateTidyBatch(ByVal sTidyDir As String,
' ByVal sFileName As String,
' ByVal sInputFile As String) As Boolean
' Creates a batch file for calling tidy.exe, which cleans up
' the raw HTML
'**************************************************************

Private Function CreateTidyBatch(ByVal sTidyDir As String, _
  ByVal sFileName As String, _
  ByVal sInputFile As String) As Boolean

  Dim oFso As Scripting.FileSystemObject
  Dim oTso As Scripting.TextStream

  Set oFso = New FileSystemObject

  If (sTidyDir = "") Then
    CreateTidyBatch = False
    Exit Function
  End If

  If oFso.FolderExists(sTidyDir) Then
    ' Write a two-line batch file: change to the tidy directory,
    ' then run tidy on the input file (-m rewrites it in place)
    Set oTso = oFso.CreateTextFile(sTidyDir & "\" & _
      sFileName & ".bat", True, False)
    oTso.Write "cd " & sTidyDir & vbCrLf
    oTso.Write "tidy.exe -f tidy.err -asxml -ascii -im " & sInputFile
    oTso.Close
    m_sPathToTidyBAT = sTidyDir & "\" & sFileName & ".bat"
    m_TidyBatchName = sFileName
    CreateTidyBatch = True
  Else
    CreateTidyBatch = False
  End If

End Function

(For a list of available tidy arguments, and what they mean, run tidy with the “-help” argument.)

To configure the class, an XML string is passed to ITidy_Configure, specifying the path to the tidy directory, the name of the batch file to create for running tidy, and the name of the input file tidy will parse. Note that these elements may be part of a larger XML document; no validation is done on the document, which allows us to use a single configuration file (stored in the ASP application) for multiple objects.

'**************************************************************
' Private Function ITidy_Configure(ByVal sConfigXml As String) As Boolean
' Takes an XML string that defines parameters for the conversion tool.
' We happen to be using tidy.exe, which needs some file path and file name info.
' <WMLConvertConfig>
' <TidyDir></TidyDir>
' <BatchFileName></BatchFileName>
' <InputFileName></InputFileName>
' </WMLConvertConfig>
'**************************************************************

Private Function ITidy_Configure(ByVal sConfigXml As String) As Boolean

  If g_oDOM.loadXML(sConfigXml) Then
    m_TidyDir = g_oDOM.selectSingleNode("//TidyDir").Text
    m_TidyBatchName = g_oDOM.selectSingleNode("//BatchFileName").Text
    m_TidyInputFileName = g_oDOM.selectSingleNode("//InputFileName").Text

The code attempts to load the XML, and if successful, tries to parse for the configuration data. If found, the text is placed into private class members, and CreateTidyBatch is called.

 

    If (Len(m_TidyDir) > 0) And (Len(m_TidyBatchName) > 0) And _
      (Len(m_TidyInputFileName) > 0) Then
      ' Create the batch file
      If CreateTidyBatch(m_TidyDir, m_TidyBatchName, _
        m_TidyInputFileName) Then
        ITidy_Configure = True
        m_Configured = True
        m_sConfigXML = sConfigXml
      Else
        ITidy_Configure = False
        m_Configured = False
      End If
    Else
      ITidy_Configure = False
      m_Configured = False
    End If
  Else
    ITidy_Configure = False
  End If
End Function

If there are any problems getting the configuration data, the method returns False.
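
To illustrate the point about a shared configuration document, here is a Python sketch (read_tidy_settings is a hypothetical name) that picks out just the three tidy settings and ignores everything else. Unlike the VB code, it treats a missing element the same as an empty one, rather than raising an error when selectSingleNode finds nothing.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def read_tidy_settings(config_xml: str) -> Optional[Dict[str, str]]:
    """Pick the tidy settings out of a (possibly larger) config document.

    Returns None if the XML cannot be parsed or any setting is missing
    or empty, mirroring ITidy_Configure returning False.
    """
    try:
        root = ET.fromstring(config_xml)
    except ET.ParseError:
        return None
    settings = {}
    for name in ("TidyDir", "BatchFileName", "InputFileName"):
        # ".//Name" finds the element at any depth, like XPath "//Name".
        el = root.find(".//" + name)
        text = (el.text or "").strip() if el is not None else ""
        if not text:
            return None
        settings[name] = text
    return settings
```
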

ITidy_HtmlToXML is used to pass in some HTML and get back the cleaned-up version.

'**************************************************************
' Private Function ITidy_HtmlToXML(ByVal sHTML As String) 
'   As String
' Takes HTML and cleans it up into XML.
' Need to take the HTML, write it to a temp file, and pass the
' temp file name, the name of an output temp file, and tidy 
' params to ExecCmd(). Then read in the output file and 
' return it.
'**************************************************************

Private Function ITidy_HtmlToXML(ByVal sHTML As String) _
 As String

  Dim oFso As FileSystemObject
  Dim oTso As TextStream

  Set oFso = New FileSystemObject

A FileSystemObject is declared and initialized, and the code checks that the tidy directory exists. If it does, the code writes the HTML to a file in that directory:

  If oFso.FolderExists(m_TidyDir) Then
    Set oTso = oFso.CreateTextFile(m_TidyDir & "/" & _
      m_TidyInputFileName, True, False)
    oTso.Write sHTML
    oTso.Close
  Else
    ITidy_HtmlToXML = "<ERROR/>"
    Exit Function
  End If

The tidy batch file is then called:

  g_oUtils.ExecCmd m_sPathToTidyBAT

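Because the batch file runs tidy with -m, tidy rewrites the input file in place, so the results are then read back from that file into a local variable and returned: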

 

  Set oTso = oFso.OpenTextFile(m_TidyDir & "/" & _
    m_TidyInputFileName, ForReading, False)
  sHTML = oTso.ReadAll()
  oTso.Close
  ITidy_HtmlToXML = sHTML
End Function
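
One weakness of this approach is that every request writes the same input file, so two simultaneous requests could overwrite each other's HTML. The following Python sketch (clean_via_file is a hypothetical name) shows the same write, run, read-back pattern using a unique temporary file per call. The command is supplied by the caller; for tidy it might mirror the batch file in CreateTidyBatch, for example lambda p: ["tidy.exe", "-f", "tidy.err", "-asxml", "-ascii", "-im", p]. Exit codes 0 and 1 are accepted by default on the assumption that, as with HTML Tidy, 1 signals warnings only.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Sequence

def clean_via_file(html: str,
                   make_command: Callable[[str], List[str]],
                   ok_codes: Sequence[int] = (0, 1)) -> str:
    """Write html to a unique temp file, run a command that rewrites the
    file in place, then read the result back and delete the file."""
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        # The file is closed before the child runs (required on Windows).
        result = subprocess.run(make_command(path))
        if result.returncode not in ok_codes:
            raise RuntimeError("cleanup command failed with exit code %d"
                               % result.returncode)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)

if __name__ == "__main__":
    # Demo with a stand-in "cleaner" that upper-cases the file in place.
    upper = ("import sys; p = sys.argv[1]; s = open(p).read(); "
             "open(p, 'w').write(s.upper())")
    print(clean_via_file("<p>hi</p>", lambda p: [sys.executable, "-c", upper, p]))
```
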

The usual class events are used for preparing some objects and cleaning up:

Private Sub Class_Initialize()
  Set g_oDOM = New DOMDocument30
  Set g_oUtils = New MinorUtils
  m_Configured = False
  m_sConfigXML = ""
End Sub

Private Sub Class_Terminate()
  Set g_oDOM = Nothing
  Set g_oUtils = Nothing
End Sub

Converting to WML

So, we now have some objects for retrieving an HTML page and for cleaning up the HTML. We can now use these in an MTS object, which will be called from the ASP code. I created an ActiveX DLL project called Html2Wml, with a single class named Conversion. It uses references to MSXML version 3, the ISimpleHTTP.IRequest interface class, the IHttpUtils.ITidy interface class, the Microsoft Transaction Server type library, and the Scripting runtime library.

'**************************************************************
' Class Html2Wml.Conversion
' Exposes methods for fetching a web page and transforming
' the HTML into WML.
'
' Copyright (C) 2000 James Britt
' james@logicmilestone.com
'
' This program is free software; you can redistribute it and/or
' modify it under the terms of the GNU General Public License
' as published by the Free Software Foundation; either version 
' 2 of the License, or (at your option) any later version.
'
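' This program is distributed in the hope that it will be useful,
' but WITHOUT ANY WARRANTY; without even the implied warranty of
' MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.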
'**************************************************************

Option Explicit
Implements ObjectControl

Private g_oXmlDom As DOMDocument30
Private g_oXslDom As DOMDocument30
Private g_oHTTP As ISimpleHTTP.IRequest
Private g_oTidy As IHttpUtils.ITidy
Private m_sXmlConfig As String

Since this class uses MTS, it needs to implement ObjectControl (see the references at the end of the article for links to more information about creating objects for MTS). The code also declares some private module-level objects. Note that g_oHTTP and g_oTidy are declared as the interface classes defined earlier. This allows the code to use early binding, since it has an explicit type library available. The code will set these objects to specific classes based on the information in the configuration XML. When better implementations of these objects are built, they can be swapped in without having to change any code in this class, provided they implement the defined interfaces.
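
The run-time swapping works because CreateObject looks up a class by name. Here is a Python sketch of the same pattern with an explicit registry; REGISTRY, register, create_object, and the Example.WinsockHTTP name are hypothetical stand-ins for COM's ProgID lookup, and only InetWrapper.HTTP comes from the article's configuration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical registry playing the role of COM's ProgID lookup.
REGISTRY: Dict[str, Callable[[], object]] = {}

def register(prog_id: str):
    """Class decorator that records a class under a ProgID-style name."""
    def decorator(cls):
        REGISTRY[prog_id] = cls
        return cls
    return decorator

def create_object(prog_id: str) -> object:
    """Instantiate the class registered under prog_id (like CreateObject)."""
    factory = REGISTRY.get(prog_id)
    if factory is None:
        raise LookupError("Failed to create " + prog_id)
    return factory()

@register("InetWrapper.HTTP")
class InetHTTP:
    """Stand-in for the Inet-control wrapper."""

@register("Example.WinsockHTTP")
class WinsockHTTP:
    """Hypothetical replacement built on Winsock."""

def http_component_from_config(config_xml: str) -> object:
    # Read the class name from the shared configuration document.
    name = ET.fromstring(config_xml).findtext(".//SimpleHTTPClass", "").strip()
    return create_object(name)
```

Swapping implementations then means editing one element in the configuration file, with no change to the calling code.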

Configuration is done by passing an XML string to the Configure method. The XML would look like this:

<WMLConvertConfig>
  <TidyDir>c:\inetpub\wwwroot\cgi-bin</TidyDir>
  <TidyClass>HTMLUtils.Convert</TidyClass>

These two elements indicate the directory where tidy.exe resides, and the ProgID of the object that implements IHttpUtils.ITidy.

We also need to specify the ProgID of the object implementing ISimpleHTTP.IRequest, which will retrieve the web page for us:

<SimpleHTTPClass>InetWrapper.HTTP</SimpleHTTPClass>

We also need to tell the class the HTTP timeout value:

<InetConfig>
  <RequestTimeOut>300</RequestTimeOut>
</InetConfig>

For the tidy wrapper class, we indicate some additional details (as described previously):

<BatchFileName>TIDYRUN</BatchFileName>
<InputFileName>input.html</InputFileName>

My code disables validation when loading the cleaned-up HTML, because I was seeing problems resolving the DTD. However, this meant that entity references common in HTML (such as &nbsp;) would cause much grief for the parser, since without the DTD they are undefined. I decided to do some additional processing on the HTML, replacing these entities with numeric character references. The search-and-replace strings are stored in the Find and Replace child elements of each Replacement element. CDATA sections are used to keep the parser from trying to resolve the contents:

  <Replacements>
    <Replacement>
      <Find><![CDATA[&nbsp;]]></Find>
      <Replace><![CDATA[&#160;]]></Replace>
    </Replacement>
    <Replacement>
      <Find><![CDATA[&copy;]]></Find>
      <Replace><![CDATA[&#169;]]></Replace>
    </Replacement>
  </Replacements>
</WMLConvertConfig>
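
Applying the replacement list is plain string substitution. A Python sketch follows (load_replacements and apply_replacements are hypothetical names). Surrounding whitespace in each Find and Replace value is stripped, an assumption that keeps the lookup working even if the CDATA content is padded with spaces. Once &nbsp; has become &#160;, an XML parser can read the text without a DTD, because numeric character references need no entity declarations.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def load_replacements(config_xml: str) -> List[Tuple[str, str]]:
    """Read (find, replace) pairs from the Replacements section."""
    root = ET.fromstring(config_xml)
    pairs = []
    for rep in root.iter("Replacement"):
        # CDATA content arrives as ordinary element text.
        find = (rep.findtext("Find") or "").strip()
        repl = (rep.findtext("Replace") or "").strip()
        if find:
            pairs.append((find, repl))
    return pairs

def apply_replacements(html: str, pairs: List[Tuple[str, str]]) -> str:
    """Apply each literal search-and-replace pair in order."""
    for find, repl in pairs:
        html = html.replace(find, repl)
    return html
```
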

So, the Configure method looks like this:

'**************************************************************
' Public Function Configure(ByVal sXML As String) As Boolean
' Receives an XML string for configuring objects used by this class.
'**************************************************************

Public Function Configure(ByVal sXML As String) As Boolean
  Dim oEL As IXMLDOMElement
  Dim sTidyClass As String
  Dim sSimpleHttpClass As String

Local variables are declared for the object ProgIDs, and the configuration XML is handed to the XML parser. If the XML loads, the code tries to pull out the names of the objects to use:

  If g_oXmlDom.loadXML(sXML) Then
    If (g_oXmlDom.getElementsByTagName("TidyClass").length > 0) Then
      Set oEL = g_oXmlDom.getElementsByTagName("TidyClass").Item(0)
      sTidyClass = oEL.Text

If the ProgID for the tidy wrapper is found, the code tries to create an instance of it and then configures it by handing off the configuration XML:

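      Set g_oTidy = CreateObject(sTidyClass)
      If Not g_oTidy Is Nothing Then
        ' Fail fast if the tidy wrapper rejects the configuration,
        ' so a later success cannot overwrite this result
        If Not g_oTidy.Configure(sXML) Then
          Err.Raise -1, "Html2Wml.Conversion.Configure", _
            "Failed to config g_oTidy"
          Configure = False
          Exit Function
        End If
        m_sXmlConfig = sXML
      Else
        Err.Raise -1, "Html2Wml.Conversion.Configure", _
          "Failed to create " & sTidyClass
        Configure = False
        Exit Function
      End If
    Else
      Err.Raise -1, "Html2Wml.Conversion.Configure", _
        "Failed to find TidyClass element"
      Configure = False
      Exit Function
    End If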

Likewise, once the Inet wrapper class is found, it too is created and configured:

    If (g_oXmlDom.getElementsByTagName("SimpleHTTPClass").length > 0) Then
      Set oEL = g_oXmlDom.getElementsByTagName("SimpleHTTPClass").Item(0)
      sSimpleHttpClass = oEL.Text
      Set g_oHTTP = CreateObject(sSimpleHttpClass)
      If Not g_oHTTP Is Nothing Then
        If Not g_oHTTP.Configure(sXML) Then
          Err.Raise -1, "Html2Wml.Conversion.Configure", _
            "Failed to config g_oHTTP"
          Configure = False
          Exit Function
        Else
          Configure = True
        End If
      Else
        Err.Raise -1, "Html2Wml.Conversion.Configure", _
          "Failed to create " & sSimpleHttpClass
        Configure = False
        Exit Function
      End If

Otherwise, an error is raised:

    Else
      Err.Raise -1, "Html2Wml.Conversion.Configure", _
        "Failed to find SimpleHTTPClass element."
      Configure = False
      Exit Function
    End If

If the XML cannot be loaded, an error is raised:

  Else
    Err.Raise -1, "Html2Wml.Conversion.Configure", _
      "Failed to parse config XML: " & _
    g_oXmlDom.parseError.reason
    Configure = False
  End If

End Function
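The control flow of Configure boils down to: parse the XML, find each helper's ProgID, create the object, hand it the same XML, and stop at the first problem. A Python restatement of that flow follows as a sketch under stated assumptions: the factory argument is a hypothetical stand-in for CreateObject, exceptions take the place of Err.Raise, and a helper whose configure method returns False is treated as fatal, so a failed tidy configuration cannot be masked by a later success.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Protocol


class Configurable(Protocol):
    """Anything with a Configure-style method, like the two helpers."""

    def configure(self, xml_text: str) -> bool: ...


class ConfigError(Exception):
    """Raised where the VB code calls Err.Raise."""


def configure(xml_text: str,
              factory: Callable[[str], Configurable]) -> Dict[str, Configurable]:
    """Create and configure each helper named in the config XML.

    `factory` is a hypothetical stand-in for CreateObject: it takes a
    ProgID-style name and returns an object (or raises if it cannot).
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ConfigError(f"Failed to parse config XML: {exc}") from exc

    helpers: Dict[str, Configurable] = {}
    for tag in ("TidyClass", "SimpleHTTPClass"):
        node = next(root.iter(tag), None)
        prog_id = (node.text or "").strip() if node is not None else ""
        if not prog_id:
            raise ConfigError(f"Failed to find {tag} element")
        helper = factory(prog_id)
        # Every helper must accept the configuration; one failure is fatal.
        if not helper.configure(xml_text):
            raise ConfigError(f"Failed to config {prog_id}")
        helpers[tag] = helper
    return helpers
```

Using exceptions alone keeps one failure path; the VB version both raises and sets the return value, and the assignment after Err.Raise is never reached unless the method has its own error handler.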

FetchAndConvert handles retrieval and conversion: it takes the requested URL and uses the helper objects to fetch the page and convert it.

'**************************************************************
' Public Function FetchAndConvert(ByVal sURI As String, _
' ByVal sXSL As String) As String
' Retrieves an HTML file using the given URI and converts it
' to WML, returning the WML string
'**************************************************************

Public Function FetchAndConvert(ByVal sURI As String, _
  ByVal sXSL As String) As String

  On Error GoTo ErrHand

  Dim sHTML As String
  Dim sXML As String
  Dim sWML As String

The code calls into the Inet wrapper class to get the HTML, then passes it on to the tidy wrapper:

  sHTML = g_oHTTP.OpenURL(sURI)
  sXML = g_oTidy.HtmlToXML(sHTML)

The code then prepares the parser to load the tidied HTML without validating it or resolving external resources:

  ' now transform ...
  g_oXmlDom.validateOnParse = False
  g_oXmlDom.resolveExternals = False

Potentially bothersome characters are replaced by calling ReplaceStrings, a private method of the class:

  sXML = ReplaceStrings(sXML, m_sXmlConfig)

  ' The error info should perhaps be WML, unless the calling app
  ' will handle that part
  If Not g_oXmlDom.loadXML(sXML) Then
    FetchAndConvert = "<ERROR location='FetchAndConvert' type='XML parse'>" & _
      "<Reason><domParseError>" & g_oXmlDom.parseError.reason & _
      "</domParseError>" & "<filePos>" & _
      g_oXmlDom.parseError.filepos & _
      "</filePos></Reason><Source>" & sXML & "</Source></ERROR>"
    Exit Function
  End If

If the XML cannot be loaded into the parser, then error information is returned. Otherwise, the XSL is loaded:

  If Not g_oXslDom.loadXML(sXSL) Then
    FetchAndConvert = "<ERROR location='FetchAndConvert'" & _
      " type='XSL parse'><Reason>" & _
      g_oXslDom.parseError.reason & _
      "</Reason><Source>" & sXSL & "</Source></ERROR>"
    Exit Function
  End If

If the XSL was correctly loaded, it is used to transform the HTML into WML:

  sWML = g_oXmlDom.transformNode(g_oXslDom)

ErrHand:
  If Err.Number <> 0 Then
    sWML = "<ERROR type='COM'><Reason>" & Err.Description & _
      "</Reason></ERROR>"
  End If

If any COM errors occurred, the code returns the error information. Otherwise, it returns the converted web page.

  FetchAndConvert = sWML

End Function
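Stripped of COM details, FetchAndConvert is a pipeline (fetch, tidy, replace entities, parse, transform) with two kinds of failure report. The Python sketch below models only that control flow; every stage is passed in as a function, so the stage names are hypothetical stand-ins for the helper objects. One deliberate difference: the error documents are built with ElementTree, which escapes the reason and source text, whereas the string concatenation in the VB code can produce malformed XML when the failing source itself contains markup.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def error_doc(location: str, kind: str, reason: str, source: str = "") -> str:
    """Build an <ERROR> document; ElementTree escapes <, > and & itself,
    so the result stays well-formed even if `source` is broken markup."""
    err = ET.Element("ERROR", location=location, type=kind)
    ET.SubElement(err, "Reason").text = reason
    if source:
        ET.SubElement(err, "Source").text = source
    return ET.tostring(err, encoding="unicode")


def fetch_and_convert(url: str,
                      fetch: Callable[[str], str],
                      tidy: Callable[[str], str],
                      clean: Callable[[str], str],
                      transform: Callable[[ET.Element], str]) -> str:
    """Fetch, tidy, replace entities, parse, then transform.

    Every stage is passed in, so this only models the control flow;
    the stage functions are hypothetical stand-ins for the COM helpers.
    """
    try:
        xml_text = clean(tidy(fetch(url)))
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError as exc:
            return error_doc("FetchAndConvert", "XML parse", str(exc), xml_text)
        return transform(root)
    except Exception as exc:  # plays the role of the ErrHand branch
        return error_doc("FetchAndConvert", "COM", str(exc))
```

Passing the stages in makes the pipeline easy to exercise with fakes, which is roughly what the interface classes allow in the VB design.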

Replacing text in the HTML is a matter of pulling the search and replace text from the configuration XML and running through the document using the VB Replace function:

'**************************************************************
' Private Function ReplaceStrings(ByVal sHTML As String, _
'   ByVal sXMLConfig As String) As String
' Goes through the converted document and replaces text
' defined in the config XML
'**************************************************************

Private Function ReplaceStrings(ByVal sHTML As String, _
  ByVal sXMLConfig As String) As String

  Dim oNodelist As IXMLDOMNodeList
  Dim oEL As IXMLDOMElement
  Dim nIdx As Integer
  Dim sFind As String
  Dim sReplace As String

  If g_oXmlDom.loadXML(sXMLConfig) Then
    Set oNodelist = g_oXmlDom.getElementsByTagName("Replacement")
    If oNodelist.length > 0 Then
      For nIdx = 0 To oNodelist.length - 1
        Set oEL = oNodelist.Item(nIdx)
        sFind = Trim(oEL.getElementsByTagName("Find").Item(0).childNodes(0).Text)
        sReplace = Trim(oEL.getElementsByTagName("Replace").Item(0).childNodes(0).Text)
        sHTML = Replace(sHTML, sFind, sReplace)
      Next
    End If
  End If

  ReplaceStrings = sHTML
End Function
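Here is the same find-and-replace idea as a Python sketch, together with a check that shows why it matters: with no DTD available, a parser rejects &nbsp; as an undefined entity but accepts the numeric character reference &#160;. The function names and the hard-coded ENTITY_FIXES list are illustrative; the component reads its pairs from the configuration XML instead.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Illustrative pairs; the component reads these from <Replacements>.
ENTITY_FIXES: List[Tuple[str, str]] = [("&nbsp;", "&#160;"), ("&copy;", "&#169;")]


def replace_strings(html: str, replacements: List[Tuple[str, str]]) -> str:
    """Apply each (find, replace) pair in order, like the VB Replace loop."""
    for find, replace in replacements:
        html = html.replace(find, replace)
    return html


def parses(xml_text: str) -> bool:
    """True if the text is well-formed XML when no DTD is available."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Like the VB Replace loop, this is a plain text substitution, so it rewrites matches wherever they occur, including inside comments or script blocks; that is harmless for entity names but worth remembering before adding broader patterns to the configuration.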

The ObjectControl methods prepare the parser objects on activation, report that the object cannot be pooled, and clean up on deactivation:

Private Sub ObjectControl_Activate()
  Set g_oXmlDom = New DOMDocument30
  Set g_oXslDom = New DOMDocument30
End Sub

Private Function ObjectControl_CanBePooled() As Boolean
  ObjectControl_CanBePooled = False
End Function

Private Sub ObjectControl_Deactivate()
  Set g_oHTTP = Nothing
  Set g_oTidy = Nothing
  Set g_oXmlDom = Nothing
  Set g_oXslDom = Nothing
End Sub
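The lifecycle these three methods implement (create per-activation state on the way in, release everything on the way out, never go back to a pool) maps naturally onto a Python context manager. This is only an analogy, assuming that one activation brackets the calls made on the object; MTS itself decides when Activate and Deactivate run, and the Converter class here is hypothetical.

```python
from contextlib import contextmanager
from xml.dom.minidom import Document


class Converter:
    """Hypothetical stand-in for the state held by Html2Wml.Conversion."""

    def __init__(self) -> None:
        self.xml_dom = None
        self.xsl_dom = None
        self.http = None
        self.tidy = None

    def activate(self) -> None:
        # Like ObjectControl_Activate: fresh parser objects per activation.
        self.xml_dom = Document()
        self.xsl_dom = Document()

    def deactivate(self) -> None:
        # Like ObjectControl_Deactivate: release every reference.
        self.http = None
        self.tidy = None
        self.xml_dom = None
        self.xsl_dom = None

    def can_be_pooled(self) -> bool:
        # Like ObjectControl_CanBePooled returning False.
        return False


@contextmanager
def activation(obj: Converter):
    """Bracket a unit of work with activate/deactivate, even on errors."""
    obj.activate()
    try:
        yield obj
    finally:
        obj.deactivate()
```

The sketch makes one consequence visible: anything set up by Configure survives only until the next deactivation, so if the object were deactivated between the Configure and FetchAndConvert calls, the helper objects would be gone.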

The XSLT Part

Now, if we revisit fetch.asp, we see that it creates an instance of Html2Wml.Conversion, configures it with the XML configuration data, and calls FetchAndConvert with the requested URL and the HTML-to-WML XSL. The XSL (or, more precisely, the XSLT) is admittedly sparse:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" />
  <xsl:template match="*|/"><xsl:apply-templates/></xsl:template>
  <xsl:template match="text()|@*"><xsl:value-of select="."/></xsl:template>

  <xsl:template match="html">
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" "http://www.wapforum.org/DTD/wml12.dtd"&gt;</xsl:text>

    <wml>
      <xsl:element name="card">
        <xsl:attribute name="id">Results</xsl:attribute>
        <xsl:attribute name="title"><xsl:value-of select="head/title"/></xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </wml>
  </xsl:template>

  <xsl:template match="head"/>

  <xsl:template match="body"><p>
    <xsl:apply-templates/></p>
  </xsl:template>
  <xsl:template match="h1">
    <b>
      <xsl:apply-templates/>
    </b>
    <br/>
  </xsl:template>

  <xsl:template match="p">
    <xsl:apply-templates/>
    <br/>
  </xsl:template>
</xsl:stylesheet>

It by no means converts all HTML into proper WML, but it does create the wml and card elements and sticks in the occasional p element. It also emits the DOCTYPE for a WML 1.2 document. Since the XSL is loaded from a file on the web server, replacing it with something more complete is easy, and I would be interested in seeing a better implementation.
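For comparison, the same mapping can be written as an explicit tree walk. This Python sketch mirrors the templates shown above (head suppressed, body wrapped in p, unmatched elements reduced to their text), with two choices of its own: h1 becomes b followed by br, since WML 1.2 defines no h1 element, and each HTML p becomes its content followed by an empty br element. It expects well-formed XHTML with an html root in no namespace, which is an assumption about the tidy output; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

WML_DOCTYPE = ('<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" '
               '"http://www.wapforum.org/DTD/wml12.dtd">')


def convert_children(el: ET.Element) -> str:
    """Like xsl:apply-templates: convert child nodes, copying text."""
    parts = [escape(el.text or "")]
    for child in el:
        parts.append(convert_element(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def convert_element(el: ET.Element) -> str:
    """One branch per template; unmatched elements keep only their content."""
    if el.tag == "head":
        return ""  # suppressed entirely
    if el.tag == "body":
        return "<p>" + convert_children(el) + "</p>"
    if el.tag == "h1":
        return "<b>" + convert_children(el) + "</b><br/>"
    if el.tag == "p":
        return convert_children(el) + "<br/>"
    return convert_children(el)  # default rule: strip the tag, keep text


def html_to_wml(xhtml: str) -> str:
    """Wrap the converted document in a single card titled from head/title."""
    root = ET.fromstring(xhtml)  # expects an <html> root element
    title = root.findtext("head/title") or ""
    card = ('<card id="Results" title=' + quoteattr(title) + '>'
            + convert_children(root) + '</card>')
    return WML_DOCTYPE + "<wml>" + card + "</wml>"
```

The XSLT route remains the more flexible one for deployment, since the stylesheet can be edited on the server; the tree walk is mainly useful for reasoning about what each template contributes.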

Summary

We’ve seen an IIS/ASP/VB implementation of a site that allows a WAP device user to request any HTML page on the Internet and have it sent back as WML. The specific code has a number of problems: the object used to fetch web pages may not work under all circumstances; the HTML clean-up is done by a single-threaded command-line executable; and the XSLT for transforming HTML to WML is incomplete. However, these problems are isolated in dynamically loaded objects, so further enhancement can go on without disrupting the main code. If anybody finds this code useful, I would suggest using it as the basis for a more robust version. That would include reconsidering the interface classes and the methods they define. Expanding the code is much easier if it permits polymorphism; offhand, I can think of a few more methods that the ISimpleHTTP.IRequest interface could define, such as a method for setting POST data, or methods for adding headers.
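To make that suggestion concrete, here is a Python sketch of what a richer request interface might look like, with methods for POST data and extra headers. The method names are hypothetical. The implementation relies on the standard library's urllib.request.Request, which switches to POST when a request body is supplied, and it separates build() from open_url() so a request can be inspected without touching the network. The 300-second default timeout simply borrows the RequestTimeOut value from the sample configuration and treats it as seconds, which is an assumption.

```python
import urllib.request
from abc import ABC, abstractmethod
from typing import Dict, Optional


class IRequest(ABC):
    """Hypothetical richer version of the ISimpleHTTP.IRequest interface."""

    @abstractmethod
    def set_post_data(self, data: bytes) -> None:
        """Supply a request body; its presence implies POST."""

    @abstractmethod
    def add_header(self, name: str, value: str) -> None:
        """Add an extra HTTP request header."""

    @abstractmethod
    def open_url(self, url: str) -> str:
        """Fetch url and return the response body as text."""


class UrllibRequest(IRequest):
    """One possible implementation, built on urllib.request."""

    def __init__(self, timeout: float = 300) -> None:
        self.timeout = timeout  # seconds, as urlopen expects
        self._data: Optional[bytes] = None
        self._headers: Dict[str, str] = {}

    def set_post_data(self, data: bytes) -> None:
        self._data = data

    def add_header(self, name: str, value: str) -> None:
        self._headers[name] = value

    def build(self, url: str) -> urllib.request.Request:
        # Request switches from GET to POST when data is not None.
        return urllib.request.Request(url, data=self._data,
                                      headers=self._headers)

    def open_url(self, url: str) -> str:
        with urllib.request.urlopen(self.build(url),
                                    timeout=self.timeout) as resp:
            charset = resp.headers.get_content_charset() or "iso-8859-1"
            return resp.read().decode(charset, errors="replace")
```

Any class implementing IRequest could then be named in the configuration, so callers gain POST and header support without knowing which HTTP library sits underneath.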

In any event, I am always interested in comments, and can be reached at james@logicmilestone.com.

References

Clean up your Web pages with HTML TIDY

PRB: Loading Remote XML or Sending XML HTTP Requests from Server Is Not Supported (Microsoft Knowledge Base article)

HOWTO: Use Internet Transfer Control in ASP or in WSH Script (Microsoft Knowledge Base article)

HOWTO: 32-Bit App Can Determine When a Shelled Process Ends (Microsoft Knowledge Base article)

The Independent WAP/WML FAQ

Developing a Visual Basic Component for IIS/MTS (MSDN Web Workshop article)

How Visual Basic Provides Polymorphism (MSDN Library)

Creating and Implementing an Interface (MSDN Library)

Creating Interfaces for Use With the Implements Statement (MSDN Library)

And, of course:

Professional Visual Basic 6 XML, James Britt and Teun Duynstee, Wrox Press Ltd

