
Converting HTML to WML

5. Implementation

The following section breaks down the languages, problems, changes and methods behind the implementation, which had two parts. We first discuss the design and implementation of the HTML to XHTML transformation, and then the XHTML to WML translation.

Specification

Languages:

XSLT v1.0

BATCH commands

Programs:

M3Gate WAP Emulator v.0.5

Instant SAXON v.5.5.1

Tidy v.4.8.00

PFE v.1.0.1

Hardware:

No special requirements

5.1 HTML to XHTML

 

W3C's Tidy was used as the first step in processing input. The settings given below were supplied to the program as input; a few are worth discussing because they addressed some important design and input problems.

1. tidy-mark: no

2. wrap: 99

3. output-xml: yes

4. output-xhtml: yes

5. doctype: omit

6. char-encoding: utf8

7. numeric-entities: yes

8. quote-marks: yes

9. quote-nbsp: yes

10. quote-ampersand: yes

11. logical-emphasis: yes

12. enclose-text: yes

13. alt-text: empty

14. write-back: yes

15. quiet: yes
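For reference, the option list above can be written out as the config.txt file that Tidy reads. The sketch below is a hypothetical Python helper, not part of the original software: it renders the options in Tidy's `name: value` configuration syntax and builds a command line, assuming the standard `tidy -config <file>` invocation.

```python
# Hypothetical helper: the fifteen Tidy options above, in the same order.
TIDY_OPTIONS = [
    ("tidy-mark", "no"),
    ("wrap", "99"),
    ("output-xml", "yes"),
    ("output-xhtml", "yes"),
    ("doctype", "omit"),
    ("char-encoding", "utf8"),
    ("numeric-entities", "yes"),
    ("quote-marks", "yes"),
    ("quote-nbsp", "yes"),
    ("quote-ampersand", "yes"),
    ("logical-emphasis", "yes"),
    ("enclose-text", "yes"),
    ("alt-text", "empty"),
    ("write-back", "yes"),
    ("quiet", "yes"),
]


def render_config(options):
    """Return the options as Tidy config text, one 'name: value' pair per line."""
    return "".join(f"{name}: {value}\n" for name, value in options)


def tidy_command(config_path, html_path):
    """Argument list for running Tidy with a config file (not executed here)."""
    return ["tidy", "-config", config_path, html_path]
```

Writing `render_config(TIDY_OPTIONS)` to config.txt and running the returned command would reproduce the set-up described here, provided the installed Tidy accepts these option names.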

Option 9 (quote-nbsp), together with numeric-entities, wrote any occurrence of &nbsp; in the input as its numeric Unicode equivalent, &#160;. This was essential because &nbsp; is not one of XML's predefined entities, so the SAXON processor did not recognise it and would crash during operation. A paragraph containing only &nbsp; is essentially empty, yet such paragraphs occur frequently in HTML generated by packages such as Dreamweaver and MS FrontPage, or converted from MS Word documents. Handling them radically increased the proportion of documents we could accept, and the problem was solved in collaboration with other XSL experts on the XSL discussion forum. As a result, the XSLT had to include a template for handling 'empty' paragraphs such as

<p>&#160;</p>

which cannot be displayed by a WAP emulator, and also had to allow for the entity occurring inside other tags such as <div>. The related quote-ampersand option made sure that any bare ampersand characters in the document were likewise converted to the well-formed &amp; equivalent.
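Tidy performs these substitutions itself. Purely as an illustration of their combined effect, the regular-expression sketch below approximates them in Python; it is not how Tidy works internally, the function name is hypothetical, and other named entities are left untouched.

```python
import re

# "&nbsp" with or without its semicolon becomes the numeric reference &#160;.
NBSP = re.compile(r"&nbsp;?")
# An ampersand that does NOT already begin a named or numeric reference.
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def normalise_entities(html):
    """Approximate quote-nbsp + numeric-entities, then quote-ampersand."""
    html = NBSP.sub("&#160;", html)
    return BARE_AMP.sub("&amp;", html)
```

For example, `normalise_entities("Fish & Chips&nbsp;Ltd")` gives `"Fish &amp; Chips&#160;Ltd"`, which an XML parser such as SAXON can read.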

Another solution was also developed: declaring the entity at the head of the XML/WML document and writing this declaration into the document as it was processed. This implementation was not used, however, because Tidy and XSLT could instead remove such strings (which an emulator did not display, whether or not the entity was declared). That kept to the original design of having XSLT do all the processing, rather than writing declarations and commands directly into the output.

logical-emphasis was a time-saving flag that converted presentational HTML tags to a single logical equivalent. For example, it converted all <i> and <em> tags to <em> (and <b> to <strong>), which meant fewer templates needed writing, with all the benefits that brings.
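The renaming that logical-emphasis performs can be pictured with a few lines of Python over an ElementTree. This is an illustrative sketch with a hypothetical function name, not Tidy's code; it assumes the input has already been parsed as XHTML.

```python
import xml.etree.ElementTree as ET

# Presentational tag -> logical equivalent, as described for logical-emphasis.
LOGICAL = {"i": "em", "b": "strong"}


def apply_logical_emphasis(root):
    """Rename <i>/<b> in place, so one template per logical style suffices."""
    for element in root.iter():
        element.tag = LOGICAL.get(element.tag, element.tag)
    return root
```

After this pass, a stylesheet only needs templates for `em` and `strong`.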

HTML's loose conformance rules meant that text could be written and displayed regardless of whether it was enclosed in <p> or <font> tags. WML, however, requires such text to be nested in <p> tags, and the enclose-text flag forced this during the production of XHTML; the output-xml and output-xhtml options do not enforce it on their own.
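A minimal Python sketch of the enclose-text idea follows, assuming the document is already parsed with ElementTree. It wraps only text lying directly inside <body> (in body.text or in the tail of a direct child), and it trims surrounding whitespace; the function name is hypothetical and Tidy's own behaviour may differ in detail.

```python
import xml.etree.ElementTree as ET


def enclose_text(body):
    """Wrap bare text that sits directly inside <body> in <p> elements."""
    children = list(body)
    rebuilt = []

    def wrap(text):
        # Only non-blank runs of text become paragraphs (whitespace is trimmed).
        if text and text.strip():
            p = ET.Element("p")
            p.text = text.strip()
            rebuilt.append(p)

    wrap(body.text)
    body.text = None
    for child in children:
        tail, child.tail = child.tail, None  # detach the text after this child
        rebuilt.append(child)
        wrap(tail)
    for child in children:
        body.remove(child)
    body.extend(rebuilt)
    return body
```

Text nested deeper (for example inside a <div>) is left alone, which is why the XSLT templates still needed their own nesting checks.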

Tidy was set (option 14, write-back) to replace the original HTML file with the XHTML version. This file was then passed to the Saxon processor along with the XSLT stylesheet.

 

5.2 XHTML to WML

In place of pages, WML supports hyperlinks to other cards or decks. Linking and displaying hypertext references, which is such an integral part of HTML, represented a major challenge for the implementation. Having introduced this notion in the previous chapter, we now discuss its implementation using the example below:

<p>
    <strong>
      <a href="http://www.fo.it">
        Fo's Link
      </a>
    </strong>
  </p>

The first challenge was to identify an <a> tag, strip out the relevant URL and pass it into WML.

As the example above shows, this was complicated by the possibility of nesting. In WML, <a> tags can only be displayed with parent::p, i.e. inside a single level of <p> tags, which makes WML's document tree much flatter. HTML, by contrast, can have an <a> as an unnested root node that is not even inside <html> tags, as a third-level node as shown above, or as a node at a potentially unlimited depth.

This meant that the location of <a> nodes in our XHTML tree had to be determined so that the appropriate style could be applied to display them in WML.

The <a> tag is also usually one of the deepest levels in the tree and often has only one child, the <img> tag, which can be instantiated only when <a> is its parent.

To avoid nesting <p> tags inside one another, the <a> template checks whether a parent node has already taken us down a branch that applied a <p> tag.

Having checked the nesting structure, the template creates a variable to hold the URL value, which is stripped out of the document and displayed as follows:


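      <!-- Inside the <a> template: the context node is the current <a> -->
      <xsl:for-each select=".">
        <a href="#card2">
          <xsl:value-of select="@href"/>
          <xsl:apply-templates/>
        </a>
        <!-- An XSLT variable is scoped to this for-each, so chosenURL
             cannot be read once the loop ends -->
        <xsl:variable name="chosenURL">
          <xsl:value-of select="@href"/>
        </xsl:variable>
      </xsl:for-each>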

This is repeated for every occurrence of an <a> node.
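The effect visible later in Figure 6.8, where each link points to #card2 and the original URL survives only as a comment, can be approximated outside XSLT. The Python sketch below is an illustration based on that figure, not the author's stylesheet; the function name is hypothetical, it assumes the root element is not itself an <a>, and a URL containing "--" would produce an invalid comment.

```python
import xml.etree.ElementTree as ET


def relink(root):
    """Point every <a> at #card2 and keep its old href in a following comment."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for a in list(root.iter("a")):
        url = a.get("href", "")
        a.set("href", "#card2")
        parent = parents[a]
        comment = ET.Comment(url)
        # Move the link's trailing text after the comment so word order holds.
        comment.tail, a.tail = a.tail, None
        parent.insert(list(parent).index(a) + 1, comment)
    return root
```

Applied to `<p>Please <a href="mailto:dan@theupperdeck.com">email</a> me</p>`, this yields the same shape as the corresponding line of Figure 6.8.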

When a link is selected in the WML document, the user is sent to a second card where they have the option of proceeding to that URL. Ultimately this would then be passed back to the program as the next HTML file.

In practice this proved to be more difficult because of XSLT's limitations in handling variables. The URL variable had to be taken outside of WML so it could be passed around again as an input string.

After some research and experimentation, it was clear that WAP's lightweight WMLScript (based on ECMAScript) was not capable of doing this, and that to make the software more usable it would need to run behind an Apache or IIS gateway. This would allow the entire program to be encapsulated in a proper server-side language such as ASP. More information on this, and a discussion of such an implementation, can be found in later chapters.

Converting the XHTML to WML also involved minor changes to other templates. The testing of these changes, and their details, are documented in appendix 5.1.

The implementation produced the following files:

Tidy.exe, config.txt, Saxon.exe, html2wml.xsl and exec.bat, a batch file that encapsulated all the operations necessary for document processing.

The software has been set up to run from floppy disk and can be found attached at the back of this report.

The evaluation of this implementation was next. This was done on a desktop PC using a selection of emulators and software, all explained in the following chapter.

6. Evaluation

Testing was a two-stage procedure: first development, then evaluation.

In both cases the testing algorithm was:

Save an HTML file as file.htm;

Run: run exec.bat;

if (parsing_error)
    debug file.htm and config.txt;
    if (unsolvable)
        record why;
        end test;
    end if;
    goto Run;
else
    open result.wml with browser;
    if (invalid_WML)
        debug code;
        modify program;
        if (unsolvable)
            record why;
            end test;
        end if;
        goto Run;
    end if;
end if;

end test;
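The same loop can be written without goto. In this Python sketch the three callbacks are hypothetical stand-ins for the manual steps (running exec.bat, checking the output in a browser, and debugging or modifying the program), and a round limit is added so the loop always terminates.

```python
from typing import Callable, Optional, Tuple


def run_test(run_exec: Callable[[], Optional[str]],
             is_valid_wml: Callable[[str], bool],
             fix: Callable[[str], bool],
             max_rounds: int = 10) -> Tuple[str, int]:
    """Structured version of the testing loop; returns (verdict, rounds used).

    run_exec     -- runs the pipeline, returning WML text or None on a parsing error
    is_valid_wml -- decides whether the output displays correctly
    fix          -- debugs input/config or modifies the program; False if unsolvable
    """
    for rounds in range(1, max_rounds + 1):
        wml = run_exec()
        if wml is not None and is_valid_wml(wml):
            return "pass", rounds
        stage = "parsing error" if wml is None else "invalid WML"
        if not fix(stage):
            # "record why" and end the test
            return f"unsolvable: {stage}", rounds
    return "gave up", max_rounds
```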

This section begins by looking at the beta testing that was done prior to user evaluation. The second half then walks through an example of the evaluation process and discusses it, before concluding with comments on its successes and failings.

6.1  Beta Testing

Preliminary software development was done by constructing *.htm files of increasing complexity, which required more templates to be added gradually while building them into deeper tree structures, and then analysing how these had been handled. As the input became more convoluted, the restrictions on what counts as syntactically correct WML output forced modifications to the original design and posed unexpected challenges for the program.

We shall now discuss some instances of this.

<a> tags whose parent is <p> were placed in the output straight after the <p> tag, no matter where they occurred in the paragraph, and the hyperlinks were instantiated twice in succession in the output. This was caused by the bad design of the <p> template shown below:

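<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="p">
      <!-- Flaw: every child <a> is written out here, at the start of the
           paragraph, regardless of where it occurs in the text... -->
      <xsl:for-each select="a">
        <anchor title="Hyperlink">
          <xsl:value-of select="@href"/>
          <go href="#card2('hyperlink')">
            <setvar name="hyperlink" value="card2"/>
          </go>
        </anchor>
        <xsl:text/>
      </xsl:for-each>
      <!-- ...and apply-templates then processes the same <a> children
           again in document order, so each link's content appears twice -->
      <xsl:apply-templates/>
    </xsl:template>
  </xsl:stylesheet>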

This pattern can be traced by following the arrows in Figure 6.1, an extract from the depth-first, left-to-right tree traversal the processor performs as it runs through the above XSLT.

The problem was solved by extracting the <a> check node and creating a separate <a> template that would be called during a template check. It could therefore appear as a branch from any template check node.
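Why a separate <a> rule fixes the ordering can be shown with a small depth-first, left-to-right walk in Python. This is an analogy to XSLT's apply-templates under stated assumptions, not the author's stylesheet: each child is handled at its own position, so a link is emitted once and in place. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_links(node):
    """Depth-first, left-to-right walk: each child is handled where it occurs."""
    parts = [escape(node.text or "")]
    for child in node:
        if child.tag == "a":
            # The <a> rule fires at the link's own position, exactly once.
            parts.append(f'<a href="#card2">{render_links(child)}</a>')
        else:
            parts.append(render_links(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)
```

`render_links(ET.fromstring('<p>See <a href="u">the link</a> here</p>'))` returns `'See <a href="#card2">the link</a> here'`, with the link left in the middle of the sentence rather than hoisted to the start.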

The <a> element's href was stored inside the <a> template using a variable (hyperlink), which was written to the WML file with the idea that it could then be passed back as input to the program should the user wish to follow the hyperlink.

 


 
Figure 6.1 Abstract syntax tree representation of an extract from the original node template.

Unknown during initial development was the problem of handling the input string <p>&nbsp;</p>. It was later realised that Tidy could convert this into <p>&#160;</p>, but this then needed a separate template from the normal <p> template, since the Unicode character reference was not recognised by the WML browser.

The solution was a template that checked for a specific occurrence of content, "p[.='&#160;']".
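In Python terms, the match pattern p[.='&#160;'] selects a <p> whose whole string value is a single non-breaking space. The sketch below removes such paragraphs from an ElementTree and re-attaches any text that followed them; it is an illustration with a hypothetical function name, not the stylesheet used in the project.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # the character that &#160; refers to


def drop_nbsp_paragraphs(root):
    """Remove <p> elements whose string value is one non-breaking space,
    mirroring the pattern p[.='&#160;']; surrounding text is kept."""
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag == "p" and "".join(child.itertext()) == NBSP:
                tail = child.tail or ""
                # Re-attach the text that followed the removed paragraph.
                if previous is not None:
                    previous.tail = (previous.tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root
```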

The idea behind each template was that it would check whether or not it was nested in a <p> tag, since virtually all WML text must have a parent <p>.

This was done using a conditional check on entry to the template. If the current node's ancestry meant it was in a sub-branch of another template, such as the <a> check node shown above, then no formatting was applied; parsed character data was output while the other templates were checked for.
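The entry check can be modelled by passing an "already inside a <p>" flag down a recursive walk. In this Python sketch the set of tags whose templates open a WML <p> is a hypothetical choice for illustration; a template opens a <p> only if no ancestor has already done so, which avoids the <p>-inside-<p> output discussed later in this chapter.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical set of XHTML tags whose templates would open a WML <p>.
BLOCK_TAGS = {"p", "div", "td", "blockquote"}


def to_wml(node, inside_p=False):
    """Open a <p> only if no ancestor template has already opened one."""
    opens_p = node.tag in BLOCK_TAGS and not inside_p
    inner = escape(node.text or "")
    for child in node:
        inner += to_wml(child, inside_p or opens_p)
        inner += escape(child.tail or "")
    return f"<p>{inner}</p>" if opens_p else inner
```

`to_wml(ET.fromstring('<td><p>First Consultants UK Ltd</p></td>'))` returns a single `<p>First Consultants UK Ltd</p>` rather than two nested paragraphs.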

Processing tags with multiple attributes sometimes made effective pattern matching difficult. The result was an error of the type 'required character (found "m") (expected "=")', caused by the <frame…> tag. The solution was found via the XSLT mailing list.

Figure 6.2 Example of an illegal WML branching construct

WML would also not allow certain branching structures. Problems occurred when <strong> or <em> appeared inside <a> tags, as illustrated in Figure 6.2, which represents an illegal WML branch construct.

Checks were added to these templates to skip generating a style (such as <em>) when inside an <a>.
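That check can be sketched in Python by passing an "inside a link" flag down the walk: style tags are dropped (their text kept) under an <a>, but still emitted elsewhere. The tag set and function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

STYLE_TAGS = {"em", "strong", "b", "i"}  # assumed set of style elements


def render_styles(node, inside_a=False):
    """Emit style tags normally, but only their text when under an <a>."""
    inner = escape(node.text or "")
    for child in node:
        inner += render_styles(child, inside_a or node.tag == "a")
        inner += escape(child.tail or "")
    if node.tag in STYLE_TAGS and not inside_a:
        return f"<{node.tag}>{inner}</{node.tag}>"
    if node.tag == "a":
        return f'<a href="#card2">{inner}</a>'
    return inner
```

So `<a><strong>Fo's Link</strong></a>` becomes a plain link, while `<strong><a>…</a></strong>`, as in the example earlier in this chapter, keeps its emphasis.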

Further details of the development and the changes made, including some of the test input and output files, are shown in appendix 6.1.

6.2 Software Evaluation

Evaluation was done using as wide a variety of HTML documents as possible. These were gleaned from live websites and saved to disk. Selection was a mixture of random retrievals through www.random.com/all/ and sites found by following hyperlinks.

The aim of testing was to evaluate the software against as many different structures and tagsets as possible regardless of how common or impractical these may be. Note that similar looking files were avoided as much as possible.

The browsers used for retrieving and saving the input data were alternated between Netscape 4.7, IE5.0 and Opera 3.60, although in theory this would not affect the software's operation.

There were four different environments for output evaluation (checking that the code was valid, well-formed and syntactically correct). The first was the manual method of using a file editor, in which I, as developer, read through the code to make sure it was readable and presentable, for example that lines were wrapped.

The second type of environment was using three WML browser emulators to show output was being displayed correctly.  The three platforms represent a comprehensive selection of different environments specially selected for this evaluation:

- M3gate: Numeric Algorithm Labs claims it is the most advanced emulator.

- Nokia: the most popular SDK.

- OpenWave: an open-source WAP developers' project.

More details on each emulator are given in the Implementation chapter. Over fifty pages were chosen and processed, on the basis that the more testing is done, the more accurate the evaluation will be. The URLs for the evaluation are recorded in appendix 6.2.

Evaluation was done against files that WAP developers see as the most likely to be read using a PDA. These include stock quotes, employee résumés and news headlines. This ensured that if the software were implemented on a server, it could comfortably handle at least the most common WML requests.

Although fifty documents represent a very small fraction of available HTML files, their basic structure is represented by the cross-section of pages evaluated, which encompassed as wide a mix of HTML branching constructs as possible. In addition, running the test data and checking every output in each browser took considerable time, and the sample was felt to be sufficient for an accurate evaluation.

A successful transformation is defined as valid, well-formed WML that displays in all three browsers while maintaining a reasonable level of formatting.
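Part of that definition can be checked mechanically before opening the emulators. The Python sketch below tests well-formedness and the two structural rules discussed in section 6.1 (no <p> inside <p>, no style tags inside <a>). It is a hypothetical helper and far weaker than validating against the WML DTD; display in the three browsers still has to be checked by eye.

```python
import xml.etree.ElementTree as ET

STYLE_TAGS = ("em", "strong", "b", "i")


def wml_problems(text):
    """Return a list of problems; an empty list means these checks passed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != "wml":
        problems.append("root element is not <wml>")
    for p in root.iter("p"):
        if p.find(".//p") is not None:
            problems.append("<p> nested inside <p>")
    for a in root.iter("a"):
        for tag in STYLE_TAGS:
            if a.find(f".//{tag}") is not None:
                problems.append(f"<{tag}> inside <a>")
    return problems
```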

We will now demonstrate the working software, first from a user's point of view, and then offer a lower-level technical insight.

 

Figure6.3 A randomly selected HTML document as seen through a desktop web browser. Note the content of the page, which is what we aim to repeat through WML.

We will now see the same information displayed using my software on both the M3gate and Nokia emulators.

                    

Figure 6.4 The document displayed after translation.
Figure 6.5 Scrolling down the rest of the document.

Here is the same first page seen with Nokia's emulator:

Figure 6.6 Displaying the result on different emulators.

The next two figures show the code from this example, starting with our raw HTML input.

<!-- saved from url=(0036)http://www.theupperdeck.com/petpage/ -->
  <html>
    <head>
      <title>
        Dan's Pet Page
      </title>
      <meta content="text/html; charset=windows-1252"/>
      <meta content="High quality photos of exotic birds." name="DESC"/>
    </head>
    <body background="page_files/pbground.gif" vlink="#0080ff">
      <table border="0" cellpadding="6" width="600">
        <tbody>
          <tr>
            <td width="135">
              <font face="ARIAL" size="2">
                <font size="+1">
                  Important
                </font>
              </font>
              <p>
                <font face="ARIAL" size="2">
                  Your feedback is welcomed. Please fill out the
                  <br/>
                  <a href="http://www.dk.com/visit.html">
                    comment form
                  </a>
                  .
                </font>
              </p>
              <p>
                <font face="ARIAL" size="2">
                  Please
                  <a href="mailto:dan@theupperdeck.com">
                    email
                  </a>
                  me if you would like to add a reciprocal link.
                </font>
              </p>
            </td>
            <td width="465">
              <font color="#000080" face="ARIAL" size="2"/>
              <blockquote>
                <font color="#000080" face="ARIAL" size="2">
                  What is man without the beasts?
                  <br/>
                  If all the beasts were gone,
                  <br/>
                  man would die from a great loneliness of spirit.
                  <br/>
                  For whatever happens to the beasts,
                  <br/>
                  soon happens to man.
                  <br/>
                  All things are connected.
                  <br/>
                  <br/>
                  -- Seattle, Chief of the Duwamish, Suquamish and allied Indian tribes
                </font>
              </blockquote>
            </td>
          </tr>
          <tr>
            <td align="left" valign="top" width="135">
              <a href="http://www.amazon.com/exec/obidos/redirect-home/theupperdeck">
                <font face="ARIAL" size="2">
                  <img alt="In Association with Amazon.com" border="0" height="70" src="page_files/readmore1.gif" width="100"/>
                </font>
              </a>
            </td>
            <td valign="top" width="465">
              <font color="#000080" face="TIMES" size="3"/>
              <font color="#000080" face="TIMES" size="3"/>
              <p>
                <font color="#000080" face="TIMES" size="3">
                  <strong>
                    Who would you like to visit? Pick a pet from the list below.
                  </strong>
                </font>
              </p>
            </td>
          </tr>
        </tbody>
      </table>
    </body>
  </html>
Figure 6.7 HTML source code

Note the text highlighted in bold for clarity. The input above shows the size and complexity of the code, with its attributes and what I call junk scattered around the content; Figure 6.8, which follows, shows the result of passing this code through my program.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN' 'http://www.wapforum.org/DTD/wml_1.2.xml'>
<wml><card id="Results" title="Dan's Pet Page"> <p>
Important
Your feedback is welcomed. Please fill out the
<a href="#card2">comment form</a><!--http://www.theupperdeck.com/petpage/visit.html-->.
Please <a href="#card2">email</a><!--mailto:dan@theupperdeck.com--> me if you
would like to add a reciprocal link.
</p><p>
What is man without the beasts?
</p><p>
If all the beasts were gone,
</p><p>
man would die from a great loneliness of spirit.
</p><p>
For whatever happens to the beasts,
</p><p>
soon happens to man.
</p><p>
All things are connected.
</p><p>
-- Seattle, Chief of the Duwamish, Suquamish and allied Indian tribes
<a href="#card2">
</a><!--http://www.amazon.com/exec/obidos/redirect-home/theupperdeck-->
<b>Who would you like to visit? Pick a pet
from the list below.</b>

      

Figure 6.8 Resulting WML (highlighted for clarity).

The first thing we notice is that the junk HTML at the top of the input file has been stripped out, giving a much smaller file, which is essential for wireless file transfers.

As a point of interest, the visual differences between the two formats are quite marked.  The colour, much formatting and images have all been removed, however notice how the body of the document has been preserved and the hyperlinks referenced in WML.

Note also how the branching ancestry of an <a> node has been cut down during the translation, exactly how it should be when conforming to the WML specification.

Although the hardware limitations of PDAs, such as their limited processing power and memory and their compact interface, force WML to be used, the software presented makes maximum use of its environment to display as much permissible data as possible.

Essentially all the text information contained in the document has been preserved and so we say this test was successful. More examples are discussed in section 6 of the appendix.

We now turn to look at an example where the software failed so that we can make a constructive conclusion on the scope of our product.

The code below shows how problems arose when unexpected tags were encountered. Although there was a template to deal with the <td> tag in Figure 6.9, there was no mechanism for handling it when it took the attributes shown below.

<tr> <td width="883"> <p align="center"> <small> <font face="Arial"> First Consultants UK Ltd </font></small></p> </td> </tr> <…

Figure 6.9 HTML input

The result was confusion when tags like this were parents to others.  Frequently this meant instantiating a <p> node with parent::p in our abstract syntax tree i.e.:

...> </p> <p> <p>First Consultants UK Ltd</p> <...

Figure 6.10 WML result

This did not constitute valid WML.

Most failures occurred when the parsed character data consisted of JavaScript operators and constructs. There was also a bug when processing tag attributes that likewise generated a <p> tag with a child <p>.

Modifying the template design to accommodate more types of ambiguous input would have involved a significant amount of coding and had to be left as a bug in the software.

Having completed testing, the results enabled me to draw conclusions on how the software could be improved, which we will now discuss.

Improvements that I would like to see in future releases include a better way of handling tables. WML does support tabular formatting, albeit in a very restricted form given its hardware interface. It would be interesting to see to what extent this could be used to show HTML tables.

Expanding the XSLT 'dictionary' would also be an easy modification, enabling it to cope with more tags than at present and so improving the scope of the software. Taking this a step further, research could be done into converting graphics into wireless bitmaps (WBMP). Some good conversion programs are already available; Gingco's implementation, I know, can be supported over Java Servlets running on Apache. A good place to start for anyone interested is the Collaborative Computational Project.

Ideas for more major pieces of work, possibly a v1.0 release are discussed in the final chapter.

I now present the results of the evaluation.

TEST DATA (columns 1 to 15)   Failed transformations
I                             X X X X X
II                            X X X
III                           X X
Figure 6.11 Test data evaluation results, with failed transformations shown as an 'X' mark.

This can be represented graphically:

 


 
Figure 6.12 Graphical summary of the test data evaluation results.

As the results show, approximately three-quarters of all the data was transformed successfully. The reasons for some of the failed transformations were quite unexpected and interesting, since they showed the limitations and demonstrated the scope of the software, which helps us draw the conclusions we will now discuss.
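The "approximately three-quarters" figure can be checked from Figure 6.11. The short calculation below assumes each of the three rows covers fifteen documents, as the column headings suggest; this is an assumption, since the text elsewhere mentions over fifty pages.

```python
# Failures per row of Figure 6.11, assuming fifteen documents per row.
failures = {"I": 5, "II": 3, "III": 2}
DOCS_PER_ROW = 15

total = DOCS_PER_ROW * len(failures)
failed = sum(failures.values())
success_rate = (total - failed) / total

print(f"{total - failed}/{total} succeeded = {success_rate:.1%}")  # 35/45 succeeded = 77.8%
```

Under that assumption the rate is 35/45, about 78%, consistent with the "at least 70%" quoted in the conclusion.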

7. Conclusion and Discussion

It is evident that we have produced a successful solution to the problem of static HTML to WML translation.

By producing valid, well-formed WML with a success rate of at least 70%, the software has also fulfilled its objectives in handling many of the complex constructs found in HTML. Most importantly, these include routines for handling hyperlinks and presenting framed pages. This was only achieved through exhaustive research and testing.

XDNL

In discussion it is worth mentioning another similar XML based language that offers a hybrid of XSLT and XPath type controls.  The problem with XSLT is it has no regard for navigation or interaction. In this project we have concentrated on finessing a "look" from HTML to WML for which it worked well.  But if you are delivering output to a family of devices with fundamentally differing form such as a WAP handset and a PC Web Browser, navigation is an issue you might like to consider.

XDNL, The XML Document Navigation Language proposes using XML to encode the basic content and an XDNL document to describe the navigation for different devices.  The principle is to write a navigation description for each family of target devices, which is used at runtime on a server to only serve the appropriate section of the original content document.  Thus certain XHTML elements can be selectively excluded from processing altogether.

I believe, however, that we have come up with a solution that works without the need for XDNL, which I see as more of a tool for website managers. In time I believe WML's badly written protocol, which breaks practically every relevant standard, will be superseded by developments in hardware and networks such as UMTS; for those interested, W3C's XDNL specification is worth reading. There is also currently discussion of a migration from WML to XHTML or CHTML, a low-bandwidth HTML hybrid. This is accompanied by the introduction of HDML (Handheld Device Markup Language), which I discuss in appendix 7.1.

7.1 Conclusion

Although our implementation was not successful at converting all HTML documents, the difficulties overcome and the modifications made to the original design have produced a solid base solution to the problem.

In fact most of the remaining bugs in the software are caused not so much by shortcomings in design but flaws in previous code integrated with this project.

Undoubtedly the XHTML dictionary could be expanded to harbour more templates, and the modular design of this software coupled with the well commented accompanying documentation means this would be easy to implement.

I would also suggest adding a conditional check that the user's input is of type *.htm. This would prevent invalid filename input, or could present the user with an equivalent translation package for that document type, such as the image to wireless bitmap conversion mentioned previously.
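A minimal sketch of that check in Python follows; the function name is hypothetical. It accepts both .htm and .html and ignores case, and routing other types to a different converter is left out.

```python
from pathlib import PurePath

HTML_SUFFIXES = {".htm", ".html"}


def is_html_input(filename):
    """True if the file name has an .htm/.html extension (case-insensitive)."""
    return PurePath(filename).suffix.lower() in HTML_SUFFIXES
```

A front end could call this before running the pipeline and reject, or redirect, anything else.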

Since starting work on this project I have learnt a great deal about the subject and have used this to advance others' knowledge in the area. People working on similar problems frequently contact me about how the concepts I have used can be improved or applied further. Some of this correspondence is shown in the appendices, and discussion with others in the field has culminated in my decision to seek a public licence for the software and to make it and this accompanying documentation publicly available.

7.2 Further work

A topic for further work would be to make the software operate dynamically, producing documents on the fly. As mentioned previously, this would require implementation on a web server, which would give users much more functionality.

Some work has been done into researching how this would be possible, and we will now briefly discuss a possible implementation that could form the basis for a future piece of work.

By wrapping the software up through a WAP gateway, we could add interrogating and interactive scripts that could return and relay results to users offering much more functionality than present.   This could include a script to strip the WML hyperlinks (currently saved as comments) out and into a second WML card that passed the string back as input to the translation program should the user request the said URL.

Unfortunately as explained WML Script does not have this power but using Active Server Pages over an IIS or Apache server it should be possible to perform such operations.

By using a gateway to handle the code transmission, we can extend the functionality of WML without the browser even knowing what was happening.

I present the following ASP file that could be used to get and pass additional URL requests:

<%@ Language=VBScript %><% Response.ContentType = "text/vnd.wap.wml" %><?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" "http://www.wapforum.org/DTD/wml_1.2.xml">
  <wml>
    <card id="card1" title="HTML 2 WML">
      <!-- The XML declaration must be the first output, so the ASP directives
           share its line. On Accept, the URL is escaped and sent to get.asp. -->
      <do type="accept" label="Submit">
        <go href="get.asp?url=$(url:escape)"/>
      </do>
      <p>
        URL:
        <input name="url" value=""/>
        <br/>
      </p>
    </card>
  </wml>

Our get.asp file would then take the URL and find the relevant HTML document before processing it and returning its WML equivalent.
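The flow that get.asp would follow can be sketched as a Python WSGI application rather than ASP. This is an illustration under stated assumptions: `make_app`, `fetch_html` and `html_to_wml` are hypothetical names, with the last two standing for the download step and the Tidy/Saxon pipeline respectively.

```python
from urllib.parse import parse_qs


def make_app(fetch_html, html_to_wml):
    """Build a WSGI app; fetch_html and html_to_wml are supplied by the caller."""
    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        url = query.get("url", [""])[0]
        if not url.startswith(("http://", "https://")):
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"missing or unsupported url parameter"]
        body = html_to_wml(fetch_html(url)).encode("utf-8")
        # WAP browsers expect the WML media type, as set in the ASP page above.
        start_response("200 OK", [("Content-Type", "text/vnd.wap.wml"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app
```

For local experiments the app could be served with `wsgiref.simple_server.make_server("", 8000, app)`, with a WAP gateway or emulator pointed at it.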

I leave this as a little taster for those interested in finding out more.

It might also be an interesting exercise to evaluate whether a similar implementation might be made for handling other types of document such as PostScript. This could be integrated together with similar software to produce a comprehensive package of Web translation tools.

7.3 Summary

In this report we have presented a unique solution to an original problem using established scientific concepts. Following discussion with professionals in the field, the software has been put under the GNU General Public Licence, and at the time of writing a summary paper is awaiting publication. For more information or updates please visit http://www.paulhoward.co.uk or see the following appendices.

Appendix

1.1

The Wireless Application Protocol is a specification for wireless data communications using hand-held devices such as mobile phones and palmtop computers. Use of the WAP specification allows mobile devices to communicate with the Internet or an intranet, providing the users of these devices with mobile data communications capabilities such as web browsing and e-mail.

The WAP Forum, an industry association of wireless device manufacturers, service providers, and software companies, developed the WAP specification. Further information about the WAP Forum can be found at its website at www.wapforum.org.

Essentially the protocol is set to fail. Unlike traditional protocols, WAP is not an engineering construct so much as a business one. Open development and freedom of use are restricted by the strict terms and high monetary cost of entry to its supposedly 'open' forum.

WAP has claimed centre stage not because it fulfils the needs of the industry, but because thus far no viable alternative has been presented. However, alternatives such as LEAP do exist, and it is my opinion, as already mentioned, that improvements in hardware and the like will lead to its demise and to the acceptance of better, more conventional languages made possible by technological advancement.

1.2

http://www.devguru.com/Technologies/wml/quickref/wml_hierarchy.html

There is an excellent WML tree representation at this URL. I would have liked to include it as an attachment, but its size and complicated HTML layout mean it cannot be reproduced here. The document is worth looking at to see WML's simple nesting hierarchy.

4.1

Opening page source code from Figure 4.1:

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN" "http://www.wapforum.org/DTD/wml_1.2.xml">
  <wml>
    <card id="card1" title="HTML 2 WML">
      <do type="accept" label="Submit">
        <go href="page1.wmls#passURL('$(URL)')"/>
      </do>
      <p>
        Enter URL:
        <input type="text" name="URL"/>
      </p>
    </card>
  </wml>

4.2

Template pseudo-code:

 

 

 

 



4.4

XSLT representation of the template design. Note how it has been structured to mirror the anticipated input, with the html and head templates towards the top, followed by the headings and so on.


6.2

Here are the URLs for all the test data used in the evaluation:

http://www.tesarta.com/

http://sports.nfl.com/2000/newsnotes?team=20

http://music.recycler.com/

http://www.thewebsters.com/

http://www.juggling.org/

http://www.treasurenet.com/

http://www.poets.org/

http://www.theupperdeck.com/petpage/

http://www.edithwharton.org/

http://www.kevdo.com/lipbalm/

http://www.geocities.com/thetropics/cabana/5807/

http://www.cs.nott.ac.uk/

http://www.paulhoward.co.uk/cv.htm

http://www.cs.bris.ac.uk/~mm7323/MainBot/

http://www.w3.org/People/Raggett/tidy/

http://www.google.com

http://www.nothing.com

http://www.random.com/full/

http://news.bbc.co.uk/hi/english/sci/tech/default.stm

http://www.handsoffmy.org/

http://www.icann.org/

http://www.techweb.com/encyclopedia/

http://www.bloomberg.com/bbn/windex.html

http://s1.amazon.com/exec/varzea/subst/fx/help/how-we-know.html/058-5282950-4123910

http://www.dictionary.com/wordoftheday/?from=nothing

http://www.gimp.org/the_gimp_about.html

http://www.dtic.mil/armylink/faq/index.html

http://www.helm.com/document.htm

http://www.fedworld.gov/jobs/jobsearch.html

http://www.adobe.com/products/smcoll/main.html

http://www.hqda.army.mil/ogc/eandf.htm

http://www.fcuk.com/first_consultants_uk_ltd.htm

http://www.boselecta.com/

http://shopping.infospacehosting.net/exitpage/index_hm.html

http://www.redhat.com/apps/commerce/order_history.html

http://www.rinsing.com/

http://www.oreilly.com/catalog/javaxml/

http://www.activestate.com/ASPN/Downloads/ActivePython/More

http://www.drumnbassarena.com/djpics/djrisky.html

http://www.drumnbassarena.com/news/

http://www.azuli.com/

http://www.upsideweb.com/resources/designanddev.htm

http://developer.openwave.com/index.html

7.1

I refer to 3G Mobile, Vol. 3, No. 5 (7 March 2001), which discusses how Japan's DoCoMo expects Compact HTML (CHTML) to provide the next generation of XML-compliant meta languages. Meanwhile, Phone.com is the sole controller of the HDML specification. This may hold back the development of HDML, a language that promises few features not already available to WAP developers. In short, by the time either language has had the opportunity to grow on the scale of HTML, hardware devices will no longer be restricted to these low-bandwidth languages.

7.2

Some of the correspondence I have had regarding publication and licensing of the software and my report:

> -----Original Message-----
> From: Paul Howard <prh98c@cs.nott.ac.uk>
> Sent: Tuesday, April 17, 2001 8:59 PM
> To: wap-dev@AnywhereYouGo.com
> Subject: Re: [WAP-dev] Converter HTML TO WML
>
> >Hi,
> >
> > I am a computer science student working on a project designed to translate
> > HTML to WML.
> >
> > My implementation is something like this:
> >
> > HTML>>Tidy.exe>>xHTML>>Saxon.exe + XSLT>>WML
> >
> > If you have any specific questions feel free to email me.
> >
> > Paul.
> >
> > ================================
> > School of Computer Science & IT
> > University of Nottingham
> > Nottingham, UK
> >
> > http://www.paulhoward.co.uk
> > ================================
> >
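The conversion pipeline described in the message above (HTML >> Tidy.exe >> XHTML >> Saxon.exe + XSLT >> WML) is driven by a batch file, exec.bat, whose contents are not reproduced in this report. The JavaScript below is a rough sketch only. It builds the two command lines that such a script might run for foo.htm. The names tidy.cfg and html2wml.xsl are hypothetical. The option syntax assumes Tidy's `-config` option and Saxon's `saxon -o output source stylesheet` command-line form, so check both against the versions actually used.

```javascript
// Hypothetical names: the report does not give the Tidy configuration file
// or stylesheet names used by exec.bat.
const TIDY_CONFIG = "tidy.cfg";
const STYLESHEET = "html2wml.xsl";

// Derive the output file name, e.g. foo.htm or foo.html -> foo.wml.
function wmlName(htmlFile) {
  return htmlFile.replace(/\.html?$/i, "") + ".wml";
}

// Return the two pipeline steps, in order, as [program, ...args] arrays.
function pipelineCommands(htmlFile) {
  return [
    // Step 1: Tidy rewrites the HTML as XHTML in place
    // (assuming write-back: yes is set in TIDY_CONFIG).
    ["tidy", "-config", TIDY_CONFIG, htmlFile],
    // Step 2: Saxon applies the stylesheet to the tidied file and writes WML.
    ["saxon", "-o", wmlName(htmlFile), htmlFile, STYLESHEET],
  ];
}

for (const cmd of pipelineCommands("foo.htm")) {
  console.log(cmd.join(" "));
}
```

Each argument array could be passed to Node's `child_process.execFileSync(cmd[0], cmd.slice(1))` on a machine with both tools on the PATH. The order matters: with write-back enabled, Tidy overwrites foo.htm with well-formed XHTML, which Saxon can then parse.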

----- Original Message -----

From:        nilesh dhande <nilesh_dhande@infy.com>

To:             <prh98c@cs.nott.ac.uk>

Sent:         Wednesday, April 18, 2001 6:07 AM

Subject:    RE: [WAP-dev] Converter HTML TO WML

> Hi Paul,
>

> I am interested in conversion from HTML to XHTML. For this
> conversion, Tidy.exe (which is a converter, I suppose) may be using some rule
> base to convert or filter the contents in HTML. Could you please elaborate
> with some example how u r defining the contents?
>
> Regards,
> Nilesh
>

----- Original Message -----

From:        James <james@logicmilestone.com>

To:             Paul Howard <prh98c@cs.nott.ac.uk>

Sent:         Sunday, April 15, 2001 11:26 PM

Subject:    RE: TopXML HTML>WML


> >
> > Hi there,
>
> Hi.
>
> >
> > I hope you don't mind me writing, but I have been working on a project to
> > generate WML pages on request from HTML pages.
>
> No, not at all.
>
> >
> > I have developed a comprehensive XSLT sheet which in conjunction with Tidy
> > and Saxon can take a wide variety of HTML pages and through a little batch
> > file DOS command generates the WML version, handling frames, tables, text
> > and many of HTML's other bizarre tags, i.e.:
> >
> > D:\Project\html2wml>exec.bat foo.htm
> >
> > Generates foo.wml which is all very nice.
>
> That sounds quite impressive. When I was doing my ASP/HTML/WML
> code I didn't have the wherewithal to create one myself, and
> could not find one out on the web.
>
> >
> >
> > I am really eager to see if I can get this to run on my desktop simply by
> > inputting the html filename into a WML page on my WAP emulator.
> >
> > I'd have hoped the WML page would have passed the filename to and executed
> > exec.bat, then provided a link to the generated WML page, but this is
> > proving tricky given WMLScript's restricted library.
>
> I'm not sure I follow this: Give an HTML url to a WML page,
> and the server fetches the HTML, converts to a WML file, and then
> renders a WML page, showing the link to this new WML page on the server?
>
> >
> > I noticed you had done a similar implementation using ASP, only as far as I
> > am aware .asp pages aren't recognized by WAP emulators, so when I tested
> > your default.asp code (from the TopXML html>wml site) it refused to support
> > Content of Type text/asp. Is this because I am not using it over a WAP
> > gateway or what?
>
> You can have ASP emit any HTTP header you like. By default it should be
> sending "text/html", but there is an ASP directive to alter this:
> Response.ContentType = "text/wml"; // or whatever you like
>
>
> >
> > I would be really grateful and interested if you could spare a few minutes
> > to tell me how your suggested implementation would work because, as far as
> > I am aware, when doing a go href in WML the URL is simply passed back to
> > the browser to handle, which it obviously cannot (seems not to) do.
>
> Well, when a user clicks a link, such as http://www.foo.com/SomePage.wml, the
> browser makes an HTTP GET request to www.foo.com and requests the file
> SomePage.wml. It is up to the server (www.foo.com) to know how to process
> and return the requested file. For example, if you request foo.asp from an
> IIS machine, the web server knows that any file ending in .asp needs to be
> pre-processed before the HTML is sent back to the browser.
>
> There is a trick you can do with IIS (and any other decent web server), and
> that is to define special file type associations. For example, you can tell
> IIS that any file ending in .wml should be processed by asp.dll (just as it
> does for files ending in .asp). Then, you can write your ASP code, save
> it in files like foo.wml, and when a request is made, IIS treats it like
> any other .asp file.
>
> >
> > Both your implementation and mine, as far as I can see, rely on the
> > processing being done outside of the browser, and I can't see (unless
> > through the use of a gateway <which I know v. little about!>) how this
> > can be possible.
>
> It's possible because requesting a file from a web server is similar to
> calling a function. Requesting "foo.asp" is sort of like calling foo(),
> and what happens inside foo() is really just the server-side script
> in foo.asp. The browser (whether IE5 or some WAP phone) expects HTML or
> WML; how that gets created is up to the web server. The web server knows
> what to do by looking at the file requested, and processing it according
> to its file-type association.
>
> The browser knows nothing of how the page is created, and only
> cares that it gets a known content type it can render.
>
> A WAP device can (as far as I know) request any URL. It can call
> somepage.asp, and somepage.asp can go do whatever it needs to.
> Ultimately, somepage.asp needs to do two things: emit the correct
> content-type header appropriate for the browser, and emit the
> correct markup. For example, a single ASP could detect the browser type,
> and emit either WML or HTML based on the UserAgent header.
>
> My code attempted to have the server fetch an HTML, convert the HTML to
> WML, and return the WML directly to the browser. It had several flaws,
> most notably the absence of any way to break large HTML pages into
> multiple smaller WML pages.
>
> I would think your process would go something like this:
>
> WAP device submits a form specifying an HTML page to fetch:
> http://someserver.com/getPage.wml?url=www.someplace.com/somepage.html
>
> getPage.wml is really ASP, and IIS is set up to process wml files
> as asp files. Inside getPage.wml, the code:
>
> Grabs the URL of the requested HTML file
> Retrieves that file and writes the contents to disk
> Calls the batch script to process the HTML file into WML.
> Generates WML (containing a link to the new WML file) and sends
> it back:
>
> // Get the HTML page and save it to savedHtmlFileName,
> // then processes it ...
>
> var newLink = processHtmlFile(savedHtmlFileName);
> Response.ContentType = "text/wml";
> Response.Write("<?xml version='1.0'?>");
> Response.Write("<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.1//EN ");
> Response.Write("http://www.wapforum.org/DTD/wml_1.1.xml'>");
> Response.Write("<wml>");
> Response.Write("<card id='card1' title='Fetch>");
> Response.Write("<do type='accept' label='Link'>");
> Response.Write(" <go href='" + newLink + "' />");
> Response.Write("</do>");
> Response.Write("</card>");
>
> (Might not be exactly right WML)
>
>
> >
> > In any case if you would be interested in including or looking at my XSLT
> > I'd be really pleased to make it available to anyone who might find it
> > useful. Some of the implementations I have seen out there are
> > really poor!
>
> I would very much like to see the XSLT. If you are willing to make it
> publicly available, I would recommend submitting it to TopXML.com.
> I would also recommend adding appropriate copyrights to the file
> to protect your interests, or to at least keep others from misappropriating
> it. (At the least, consider putting it under the GNU Public License.)
>
> >
> > Many thanks for your time and I hope to hear from you soon.
>
>
> Hope this helps. I believe there is a VB/WML mailing list
> hosted by the TopXML folks, and they have online forums, too.
>
> >
> > Kind regards,
> > Paul.
> >
> Take care,
> James
>
> >
> > ==========================
> > School of Computer Science & IT
> > University of Nottingham
> > Nottingham, UK
> > ==========================
> >
>
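James's snippet is, as he says, not exactly right WML, and it has four problems:

- The second Response.Write merges the public and system identifiers into one quoted string.
- The title attribute lacks its closing quote.
- The closing </wml> tag is missing.
- The registered MIME type for WML decks is text/vnd.wap.wml rather than text/wml.

The sketch below is a corrected version, written as plain JavaScript so that it runs outside ASP. The function names are hypothetical. It also escapes the link before placing it in the href attribute. Otherwise, a & or ' in the URL would break the markup, and WML would read a $ as the start of a variable reference (a literal $ is written $$). For the browser check James mentions, the sketch swaps in a test of the Accept header in place of the User-Agent, on the assumption (to be verified against real devices) that WAP browsers and gateways advertise text/vnd.wap.wml there.

```javascript
// MIME type registered for WML decks; "text/wml" is not a registered type.
const WML_CONTENT_TYPE = "text/vnd.wap.wml";

// Escape a value for use inside a single- or double-quoted WML attribute.
// "&" is replaced first so the entities added later are not re-escaped.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;")
    // WML treats "$" as a variable marker; "$$" stands for a literal "$".
    // A replacer function is used so "$$" is inserted as-is, not as a pattern.
    .replace(/\$/g, () => "$$");
}

// Hypothetical helper: true if the Accept header lists the WML MIME type.
function prefersWml(acceptHeader) {
  return /text\/vnd\.wap\.wml/i.test(acceptHeader || "");
}

// Build a one-card deck whose accept softkey links to the generated WML file.
function buildLinkDeck(newLink) {
  const href = escapeAttr(newLink);
  return [
    "<?xml version='1.0'?>",
    // Public and system identifiers are two separate quoted strings.
    "<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.1//EN' " +
      "'http://www.wapforum.org/DTD/wml_1.1.xml'>",
    "<wml>",
    "<card id='card1' title='Fetch'>",
    "<do type='accept' label='Link'>",
    `<go href='${href}'/>`,
    "</do>",
    "</card>",
    "</wml>",
  ].join("\n");
}

// Example with a hypothetical link to a converted page.
console.log(WML_CONTENT_TYPE);
console.log(buildLinkDeck("http://someserver.com/out/foo.wml?a=1&b=2"));
```

In getPage.wml, this would amount to setting Response.ContentType to WML_CONTENT_TYPE and then calling Response.Write(buildLinkDeck(newLink)). Here, newLink is whatever the processHtmlFile step in James's outline returns.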

----- Original Message -----

From:        James <james@logicmilestone.com>

To:             Paul Howard <prh98c@cs.nott.ac.uk>

Sent:         Saturday, April 28, 2001 10:38 PM

Subject:    RE: TopXML HTML>WML


> Paul,
>
> I recently received some e-mail from someone who wondered
> if I or anyone else had completed writing the XSLT to transform
> HTML to WML.
>
> I told him that in fact I had just heard from someone who was doing it.
>
> I was asked if I could pass on your e-mail address, or the XSLT.
> Of course, I don't want to give out any details without talking to you
> first, and I don't want to put you on the spot or anything.
>
> You mentioned making the XSLT available; have you decided on how you
> would like to do that?
>
> Thanks,
>
> James
>
>

API
Application Programming Interface: a language and message format used by an application program to communicate with the operating system or some other system or control program, such as a database management system or communications protocol.

Software and Source Code

Get the source code!
  html2wml.zip contains everything needed for static HTML-to-WML translation; I have included Saxon and Tidy.exe.
  documentation.zip is a 30-page PDF (plus a 20-page appendix) covering most of the details of my implementation.

The software is copyright 2001 and released under the terms of the GNU General Public Licence. Please report any errors to project@paulhoward.co.uk.
