Converting HTML to WML
The following section gives a breakdown of the languages,
problems, changes and methods behind the implementation,
which was in two parts. We shall first discuss the detail and
implementation of the HTML to XHTML transformation and then the XHTML
to WML translation.
Specification
Languages:
XSLT v1.0
BATCH commands
Programs:
M3Gate WAP Emulator v.0.5
Instant SAXON v.5.5.1
Tidy v.4.8.00
PFE v.1.0.1
Hardware:
No special requirements
5.1 HTML to XHTML
W3C's Tidy was used as the first step in processing
input. The settings listed below were supplied to the
program as its configuration. A few are worth discussing as they
addressed some important design and input problems.
1. tidy-mark: no
2. wrap: 99
3. output-xml: yes
4. output-xhtml: yes
5. doctype: omit
6. char-encoding: utf8
7. numeric-entities: yes
8. quote-marks: yes
9. quote-nbsp: yes
10. quote-ampersand: yes
11. logical-emphasis: yes
12. enclose-text: yes
13. alt-text: empty
14. write-back: yes
15. quiet: yes
Line 9 wrote any occurrence of &nbsp; in the input as the
Unicode equivalent. This was essential because the
SAXON processor was unable to recognize certain special characters
as input and would crash during operation. Although <p>&nbsp;</p> is
essentially an empty paragraph, it frequently occurs in HTML
generated by packages such as Dreamweaver and MS FrontPage, or by
conversion of MS Word documents. Handling it radically increased the
proportion of documents we could accept, and the problem was solved
in collaboration with other XSL gurus in the XSL discussion
forum. The XSLT now had to include a template for
handling occurrences of 'empty' paragraphs such as
<p>&nbsp;</p>
which cannot be displayed by a WAP emulator, and also to
accommodate their occurrence as entities inside other tags such
as <div>. As an extension of this, quote-ampersand made
sure any ampersand characters in the document were also written
as the readable &amp; equivalent.
Another solution was also developed: declaring the empty paragraph
as an entity at the head of the XML/WML document and writing this to
the document as it was processed. However, this
implementation was not used because Tidy and XSLT could be used to
remove such strings (which an emulator did not display, entity
declaration or not). This was in keeping with the original
design of having XSLT do all the processing rather than
writing declarations and commands directly into the output.
logical-emphasis was a time-saving flag that converted
similar HTML tags to a single one. For example, it converted
all <i> and <em> tags to <em>, which meant fewer
templates needed writing, with all the benefits that come with
this.
HTML's poor conformity meant that text could be written and
displayed regardless of whether it was enclosed in <p>
or <font> tags. However, WML required such text to be
nested in <p> tags, and the enclose-text flag forced this
in the production of XHTML, something not automatically
checked by the boolean output commands.
Tidy was set to replace the original HTML file with the XHTML
version. This was then passed to the Saxon processor along
with the XSLT.
5.2 XHTML to WML
In place of pages, WML supports hyperlinks to other cards
or decks. The linking and displaying of hypertext references, which
is such an integral part of HTML, represented a major challenge for
implementation. Having introduced this notion in the previous
chapter, we will now discuss its implementation using the example
below:
<p>
<strong>
<a href="http://www.fo.it">
Fo's Link
</a>
</strong>
</p>
The first challenge was to identify an <a> tag and then to
strip out the relevant URL and pass this into WML.
However, as the example above shows, this was complicated by the
possibility of nesting. <a> tags can only be displayed in WML
with parent::p, i.e. inside one level of <p> tags. This
makes WML's document tree structure much flatter. HTML can
have these as an unnested root node that is not even in
<html> tags, as a tertiary tree layer as shown above, or
furthermore as a node at a potentially infinite depth.
This meant that the location of <a> nodes in our XHTML tree
had to be determined so that the appropriate style could enable
them to be displayed in WML.
The <a> tag also usually represents one of the deepest
possible levels in the tree and often has only one child,
the <img> tag, which can be instantiated only when <a> is its
parent.
To avoid nesting <p> tags inside one another, the <a> template
checks whether a parent node would already have taken us
down a branch that had applied a <p> tag.
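The nesting check described above can be sketched outside XSLT. The
following Python fragment is only an illustration under stated
assumptions: the function name render and the decision to keep only
<p> and <a> are hypothetical and are not taken from html2wml.xsl. It
walks an XHTML fragment, remembers whether a <p> has already been
opened on the current branch, and supplies a new <p> around an <a>
only when none has.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render(node, inside_p=False):
    """Serialise an XHTML fragment keeping only <p> and <a> (hypothetical helper).

    inside_p records whether an ancestor on this branch already opened a <p>,
    which is the check the <a> template is described as making.
    """
    open_tag, close_tag, child_inside_p = "", "", inside_p
    if node.tag == "p" and not inside_p:
        # First <p> on this branch: keep it.
        open_tag, close_tag, child_inside_p = "<p>", "</p>", True
    elif node.tag == "a":
        link = "<a href=%s>" % quoteattr(node.get("href", ""))
        if inside_p:
            open_tag, close_tag = link, "</a>"
        else:
            # No enclosing <p> yet, so supply exactly one.
            open_tag, close_tag = "<p>" + link, "</a></p>"
        child_inside_p = True
    # Any other tag (<strong>, <font>, a nested <p>, ...) is dropped, but its
    # content is still processed, which flattens the tree.
    parts = [open_tag, escape(node.text or "")]
    for child in node:
        parts.append(render(child, child_inside_p))
        parts.append(escape(child.tail or ""))
    parts.append(close_tag)
    return "".join(parts)


nested = ET.fromstring('<p><strong><a href="http://www.fo.it">Fo\'s Link</a></strong></p>')
bare = ET.fromstring('<a href="http://www.fo.it">Fo\'s Link</a>')
print(render(nested))
print(render(bare))
```

Both calls give the same flat result, one <p> containing one <a>,
whether the link started three levels deep or as an unnested root
node.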
Having checked the nesting structure, a variable is created to
hold the URL value, which is stripped out of the document and
displayed as follows:
<xsl:for-each select=".">
<a href="#card2"> <xsl:value-of select="@href"/>
<xsl:apply-templates/> </a>
<xsl:variable name="chosenURL">
<xsl:value-of select="@href"/> </xsl:variable>
</xsl:for-each>
This is repeated for every occurrence of an <a> node.
When a link is selected in the WML document, the user is sent to
a second card where they have the option of proceeding to that URL.
Ultimately this would then be passed back to the program as the
next HTML file.
In practice this proved to be more difficult because of XSLT's
limitations in handling variables. The URL variable had to be taken
outside of WML so it could be passed around again as an input
string.
After some research and experimenting, it was clear that WAP's
lightweight WML Script (based on ECMA Script) was not capable of
doing this, and that to make the software more usable an
implementation over an Apache or IIS gateway was needed. This would
allow the entire program to be encapsulated in a decent server
language such as ASP. More information on this and a discussion of
such an implementation can be found in chapters to come.
Converting the XHTML to WML also involved minor changes to other
templates. The testing of these, and their detail, is
documented in appendix 5.1.
The result of the implementation was the following files:
Tidy.exe, config.txt, Saxon.exe, html2wml.xsl and exec.bat, a
batch file that encapsulated all the operations necessary for
document processing.
The software has been set up to run from floppy disk and can be
found attached at the back of this report.
The evaluation of this implementation was next. This was done on
a desktop PC using a selection of emulators and software, all
explained in the following chapter.
Testing was a two-stage procedure, first of
development and then of evaluation.
In both cases the testing algorithm was:
Save an HTML file as file.htm;
Run Exec.bat;
if (parsing_Error)
    debug file.htm && config.txt;
    goto Run;
    if (unsolvable) record why;
    end test;
else
    Open result.wml with browser;
    if (invalid_WML)
        debug code;
        modify program;
        goto Run;
        if (unsolvable) record why;
    end if;
end test;
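The same cycle can be written as a small loop. This is a minimal
sketch only: run_test and its three callables are hypothetical
stand-ins for running Exec.bat, viewing result.wml in an emulator and
debugging by hand, and the limit on the number of runs is an added
assumption so that the loop always terminates.

```python
from typing import Callable, Optional


def run_test(convert: Callable[[], Optional[str]],
             is_valid_wml: Callable[[str], bool],
             try_fix: Callable[[str], bool],
             max_runs: int = 10) -> str:
    """Repeat the run/debug cycle until the output is accepted or judged unsolvable.

    convert      stands in for running Exec.bat; returns None on a parsing error.
    is_valid_wml stands in for opening result.wml in a browser emulator.
    try_fix      stands in for manual debugging; returns False if unsolvable.
    """
    for _ in range(max_runs):
        wml = convert()
        if wml is None:
            problem = "parsing error"
        elif not is_valid_wml(wml):
            problem = "invalid WML"
        else:
            return "success"
        if not try_fix(problem):
            return "unsolvable: " + problem   # record why, end test
    return "abandoned after %d runs" % max_runs


# Usage with stand-in callables: the first run fails to parse, the second succeeds.
outputs = iter([None, "<wml><card><p>ok</p></card></wml>"])
result = run_test(lambda: next(outputs),
                  lambda wml: wml.startswith("<wml>"),
                  lambda problem: True)
print(result)
```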
This section begins by looking at the beta testing that
was done prior to user evaluation. The second half then walks
through an example of the evaluation process and presents a
discussion of it, before concluding with comments on its successes
and failings.
6.1 Beta Testing
Preliminary software development was done by constructing *.htm
files of increasing complexity, gradually adding more templates
whilst building these into deeper tree structures and then analysing
how these had been handled. As the input grew more convoluted,
restrictions on what could be interpreted as syntactically correct
output forced modifications to the original design and presented
unexpected challenges for the program.
We shall now discuss some instances of this.
<a> tags with parent <p> were placed in the
output straight after the <p> tag no matter where they were in
the paragraph, and the hyperlinks were instantiated twice in
succession in the output. This was caused by the bad
design of the <p> template shown below:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="p">
    <xsl:for-each select="a">
      <anchor title="Hyperlink">
        <xsl:value-of select="@href"/>
        <go href="#card2('hyperlink')"><setvar name="hyperlink" value="card2"/></go>
      </anchor>
      <xsl:text/>
    </xsl:for-each>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
This pattern can be represented by following the
arrows shown overleaf in an extract from the depth-first left tree
search constructed as the processor runs through the above XSLT.
The problem was solved by extracting
the <a> check node and creating an <a> template that would be
called during a template check. It could therefore appear as a
branch from any template check node.
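The effect of the faulty <p> template, and of moving the work into a
separate <a> template, can be imitated in a few lines of Python. This
is an illustrative model rather than a translation of the stylesheet:
the function names hoisted, in_order and link are hypothetical, and
links are printed as plain text so that their position is easy to see.

```python
import xml.etree.ElementTree as ET


def link(a):
    """Render one hyperlink as plain text for the demonstration."""
    return "[%s -> %s]" % ("".join(a.itertext()), a.get("href"))


def in_order(node):
    """Visit text and children in document order, handling <a> where it occurs."""
    parts = [node.text or ""]
    for child in node:
        parts.append(link(child) if child.tag == "a" else in_order(child))
        parts.append(child.tail or "")
    return "".join(parts)


def hoisted(p):
    """Mimic the faulty <p> template: all <a> children first, then everything again."""
    first_pass = "".join(link(a) for a in p.findall("a"))  # like xsl:for-each select="a"
    return first_pass + in_order(p)                        # like xsl:apply-templates


p = ET.fromstring('<p>Please fill out the <a href="visit.html">comment form</a>.</p>')
print(hoisted(p))
print(in_order(p))
```

In the hoisted version the link is emitted at the start of the
paragraph and then again when the children are processed; the
in-order version emits it once, in place.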
Storing of the <a> tag's href was done inside
the <a> template using a variable (hyperlink), which was written to the
WML file with the idea that this could then be passed round as input to
the program should the user wish to jump to a hyperlink.
Figure 6.1 Abstract syntax tree representation of an extract from the
original node template.
Unknown during initial development was the problem
of handling the input string <p>&nbsp;</p>. It was
later realised this could be parsed by Tidy
into <p>&#160;</p>, which then needed a separate
template from the normal <p>, since the Unicode code was unrecognised
by the WML browser.
The solution was a template that checked for this
specific occurrence of content, "p[.=' ']".
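The idea behind that match pattern can be shown with Python's
standard XML parser. This is a sketch under assumptions, not the
stylesheet's code: the helper names are hypothetical, and the input
uses the numeric form &#160; because a bare &nbsp; is not a predefined
XML entity and would be rejected by the parser. Paragraphs whose only
content is a space or non-breaking space are removed, and any text
that followed them is kept.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # the character that &#160; refers to


def is_empty_paragraph(elem):
    """True for a <p> with no child elements and only spaces or non-breaking spaces."""
    if elem.tag != "p" or len(elem) != 0:
        return False
    return (elem.text or "").strip(NBSP + " \t\r\n") == ""


def drop_empty_paragraphs(root):
    """Remove empty paragraphs in place, keeping any text that followed them."""
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if is_empty_paragraph(child):
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root


doc = ET.fromstring("<div><p>&#160;</p><p>Hello</p><p> </p>after</div>")
cleaned = ET.tostring(drop_empty_paragraphs(doc), encoding="unicode")
print(cleaned)
```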
The theory behind each template was that it would
check whether or not it was nested in a <p> tag, since virtually all
WML must run with a parent <p>.
This was done using a conditional check on entry
to the template. If the current node's ancestry meant it was in a
sub-branch of a template, such as the <a> check node shown above, then
no formatting was applied; other templates were still checked
for and parsed character data was output. Processing tags with multiple
attributes made effective pattern matching difficult at times. The
result would be an error of the type
'required character (found "m") (expected "=")',
caused in the <frame…> tag. The solution was found via the XSLT
mailing list.
Figure 6.2 Example of an illegal WML branching construct.
WML would also not allow certain branching
structures. Problems happened when
<strong> or <em> appeared inside <a> tags. This
is illustrated in figure 6.2, representing an illegal WML branch
construct.
Checks were added to these templates that skipped
generating a style (such as <em>) when inside an <a>.
Further detail of the development and changes made,
including some of the test I/O files, is shown in appendix 6.1.
6.2 Software Evaluation
Evaluation was done using as wide a variety of
HTML documents as possible. These were gleaned from live websites
and saved to disk. The process of selection was a mixture of random
retrievals through www.random.com/all/ and a selection of sites made
by following hyperlinks.
The aim of testing was to evaluate the software
against as many different structures and tag sets as possible, regardless
of how common or impractical these may be. Note that similar-looking
files were avoided as much as possible.
The browsers used for retrieving and saving the
input data were alternated between Netscape 4.7, IE5.0 and Opera
3.60. In theory, however, this would not affect the software's
operation.
There were four different environments for output
evaluation (checking that code was valid, well-formed and syntactically
correct). The first was the manual method using a file editor, where
the developer, namely myself, read through the code to make sure it was
readable and presentable, for example that it was line wrapped.
The second type of environment used three WML
browser emulators to show that output was being displayed correctly. The
three platforms represent a comprehensive selection of different
environments specially selected for this evaluation:
- M3gate - Numeric Algorithm Labs claim it is the most
advanced emulator.
- Nokia - The most popular SDK.
- OpenWave - An open-source WAP developers' project.
More details on each emulator are given in the
Implementation. Over fifty pages were chosen and processed, the idea
being that the more testing done, the more accurate the evaluation
will be. The URLs for the evaluation are recorded in appendix 6.2.
Evaluation was done against files that WAP
developers see as most likely to be read using a PDA. These
include stock quotes, employee résumés and news headlines. This
ensured that if the software was implemented on a server then it could
comfortably handle at least the most common WML requests.
Although fifty documents represent a very small
fraction of available HTML files, their basic structure is represented by
the cross section of pages evaluated, which encompassed as wide a
mix of HTML branching constructs as possible. Coupled with this, the
time involved in running test data and checking every output in each
browser was considerable, and the sample was felt sufficient for
accurate evaluation.
A successful transformation is defined as valid,
well-formed WML that displayed in all three browsers while maintaining a
reasonable level of formatting.
We will now demonstrate the working software,
firstly from a user's point of view, then offer a lower-level technical
insight.

Figure 6.3 A randomly selected HTML document as seen through a desktop
web browser. Note the content of the page, which is what we aim to
repeat through WML.
We will now see the same information displayed
using my software on both an M3gate and Nokia emulator.

Figure 6.4 The document displayed after translation.
Figure 6.5 Scrolling down the rest of the document.
Here is the same first page seen with Nokia's
emulator:

Figure 6.6 Displaying the result on different emulators.
The next two screenshots are the code from this
example, starting with our raw HTML input.
<!-- saved from url=(0036)http://www.theupperdeck.com/petpage/ -->
<html> <head> <title>
Dan's Pet Page </title>
<meta content="text/html; charset=windows-1252"/>
<meta content="High quality photos of exotic birds." name="DESC"/>
</head> <body background="page_files/pbground.gif" vlink="#0080ff">
<table border="0" cellpadding="6" width="600">
<tbody> <tr>
<td width="135">
<font face="ARIAL" size="2">
<font size="+1">
Important </font>
</font> <p>
<font face="ARIAL" size="2">
Your feedback is welcomed. Please fill out the
<br/>
<a href="http://www.dk.com/visit.html">
comment form
</a>
. </font>
</p> <p>
<font face="ARIAL" size="2">
Please
<a href="mailto:dan@theupperdeck.com">
email </a>
me if you would like to add a reciprocal link.
</font> </p>
</td> <td width="465">
<font color="#000080" face="ARIAL" size="2"/>
<blockquote>
<font color="#000080" face="ARIAL" size="2">
What is man without the beasts?
<br/>
If all the beasts were gone,
<br/>
man would die from a great loneliness of spirit.
<br/>
For whatever happens to the beasts,
<br/>
soon happens to man.
<br/>
All things are connected.
<br/>
<br/>
-- Seattle, Chief of the Duwamish, Suquamish and allied Indian tribes
</font> </blockquote>
</td> </tr>
<tr> <td align="left" valign="top" width="135">
<a href="http://www.amazon.com/exec/obidos/redirect-home/theupperdeck">
<font face="ARIAL" size="2">
<img alt="In Association with Amazon.com" border="0" height="70" src="page_files/readmore1.gif" width="100"/>
</font> </a>
</td> <td valign="top" width="465">
<font color="#000080" face="TIMES" size="3"/>
<font color="#000080" face="TIMES" size="3"/>
<p> <font color="#000080" face="TIMES" size="3">
<strong>
Who would you like to visit? Pick a pet from the list below.
</strong>
</font> </p>
</td> </tr>
</tbody> </table> </body>
</html>
Figure 6.7 HTML source code
Note how the bold text (highlighted for clarity) shows the size of the
code and its complexity, such as the attributes and what I call junk
scattered around it. Figure 6.8, following, shows the result of passing
this code through my program.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.2//EN'
'http://www.wapforum.org/DTD/wml_1.2.xml' >
<wml><card id="Results" title="Dan's Pet Page">
<p>
Important
Your feedback is welcomed. Please fill out the
<a href="#card2">comment form</a><!--http://www.theupperdeck.com/petpage/visit.html-->.
Please <a href="#card2">email</a><!--mailto:dan@theupperdeck.com--> me if you
would like to add a reciprocal link.
</p><p>
What is man without the beasts?
</p><p>
If all the beasts were gone,
</p><p>
man would die from a great loneliness of spirit.
</p><p>
For whatever happens to the beasts,
</p><p>
soon happens to man.
</p><p>
All things are connected.
</p><p>
-- Seattle, Chief of the Duwamish, Suquamish and allied Indian tribes
<a href="#card2">
</a><!--http://www.amazon.com/exec/obidos/redirect-home/theupperdeck-->
<b>Who would you like to visit? Pick a pet
from the list below.</b>
Figure 6.8 Resulting WML (highlighted for clarity).
The first thing we notice is how the junk HTML at
the top of the input file has been stripped out, allowing a much smaller
file size, essential for wireless file transfers.
As a point of interest, the visual differences
between the two formats are quite marked. The colour, much of the
formatting and the images have all been removed; however, notice how the
body of the document has been preserved and the hyperlinks referenced in
WML.
Note also how the branching ancestry of an
<a> node has been cut down during the translation, exactly as it
should be when conforming to the WML specification.
Although the hardware limitations of PDAs, such as
their small processing ability, memory and compact interface, force WML
to be used, the software presented makes maximum use of its environment
to display as much permissible data as possible.
Essentially all the text information contained in
the document has been preserved and so we say this test was successful.
More examples are discussed in section 6 of the appendix.
We now turn to look at an example where the
software failed so that we can make a constructive conclusion on the scope
of our product.
The code below shows how problems arose when
unexpected tags were encountered. Although there was a template to deal
with the <td> tag on line 3 of figure 6.9, there was no mechanism for
when it took the attributes shown below.
<tr>
<td width="883">
<p align="center">
<small>
<font face="Arial"> First Consultants UK Ltd </font></small></p>
</td>
</tr>
<…
Figure 6.9 HTML input
The result was confusion when tags like this were
parents to others. Frequently this meant instantiating a <p>
node with parent::p in our abstract syntax tree, i.e.:
...>
</p>
<p>
<p>First Consultants UK Ltd</p>
<...
Figure 6.10 WML result
This did not constitute valid WML.
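A check for this particular failure is simple to sketch. The
fragment below is an assumption-laden illustration, not part of the
delivered software: has_nested_p is a hypothetical helper, and the
<card> wrapper is added only so that the sample parses as one
document. It reports whether any <p> has another <p> among its
ancestors.

```python
import xml.etree.ElementTree as ET


def has_nested_p(node, inside_p=False):
    """Return True if any <p> has another <p> among its ancestors."""
    if node.tag == "p":
        if inside_p:
            return True
        inside_p = True
    return any(has_nested_p(child, inside_p) for child in node)


bad = ET.fromstring("<card><p><p>First Consultants UK Ltd</p></p></card>")
good = ET.fromstring("<card><p>First Consultants UK Ltd</p></card>")
print(has_nested_p(bad), has_nested_p(good))
```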
The majority of failings came when the parsed
character data consisted of JavaScript operators and constructs. There was
also a bug when processing tag attributes that likewise generated a <p>
tag with a child <p>.
Modifying the template design to accommodate more
types of ambiguous input would have involved a significant amount of
coding, so this had to be left as a bug in the software.
Having completed testing, the results enabled me
to draw conclusions on how the software could be improved, which we will
now discuss.
Improvements that I would like to see in
future releases include a better way of handling tables. WML
does support tabular formatting, albeit very constricted given its
hardware interface. It would be interesting to see to what extent this
could be used to show HTML tables.
Expanding the XSLT 'dictionary' would also be an
easy modification, enabling it to cope with more tags than at present.
This would improve the scope of the software. Taking this a step
further, research could be done into converting graphics into wireless
bitmaps. Some good conversion programs are already available; Gingco's
implementation, I know, can be supported over Java Servlets running on
Apache. A good place to start for anyone interested is the
Collaborative Computational Project.
Ideas for more major pieces of work, possibly a
v1.0 release, are discussed in the final chapter.
I now present the results of the
evaluation.
Figure 6.11 Test data evaluation results, with failed
transformations shown as an 'X' mark.
This can be represented graphically:
Figure 6.12 Graphical summary of the test
data evaluation results.
As the results show, there was a successful
transformation of approximately three-quarters of all data. The
reasons for some of the failed transformations were quite unexpected and
interesting, since they showed the limitations and demonstrated the scope
of the software, which helps to draw the conclusions we will now
discuss.
It is evident we have seen a successful solution
to the problem of static HTML to WML translation.
By producing valid, well-formed WML with a
success rate of at least 70%, the software has also fulfilled its
objectives in handling many of the complex constructs found in HTML.
Most importantly these have included routines for handling hyperlinks and
presenting framed pages. The implementation of this was only possible
through exhaustive efforts made in research and testing.
XDNL
In discussion it is worth mentioning another
similar XML-based language that offers a hybrid of XSLT and XPath type
controls. The problem with XSLT is that it has no regard for navigation
or interaction. In this project we have concentrated on finessing a "look"
from HTML to WML, for which it worked well. But if you are delivering
output to a family of devices with fundamentally differing form, such as
a WAP handset and a PC Web Browser, navigation is an issue you might like
to consider.
XDNL, the XML Document Navigation Language,
proposes using XML to encode the basic content and an XDNL document to
describe the navigation for different devices. The principle is to
write a navigation description for each family of target devices, which is
used at runtime on a server to serve only the appropriate section of the
original content document. Thus certain XHTML elements can be
selectively excluded from processing altogether.
I believe, however, that we have come up with a
solution that works without the need for XDNL, something that I see as
more of a tool for Website managers. In time I believe WML's badly
written protocol, which pretty much breaks every relevant protocol
standard, will be superseded by
developments in hardware and networks such as UMTS, but for those
interested please read W3C's XDNL specification. There is also currently
discussion on the migration from WML to XHTML or CHTML, a low bandwidth
HTML hybrid. This is accompanied by the introduction of HDML (Handheld
Device Markup Language), the discussion of which I have put in
appendix 7.1.
7.1 Conclusion
Although our implementation was not successful at
converting all HTML documents, the difficulties overcome and the
modifications made to the original design have produced a solid base
solution to the problem.
In fact most of the remaining bugs in the software
are caused not so much by shortcomings in design as by flaws in previous
code integrated with this project.
Undoubtedly the XHTML dictionary could be expanded
to harbour more templates, and the modular design of this software,
coupled with the well-commented accompanying documentation, means this
would be easy to implement.
I would also suggest the creation of a conditional
check that user input was of the type *.htm. This would prevent
invalid filename input, or could possibly present the user with an
equivalent document type translation package, such as the image to
wireless bitmap conversion mentioned previously.
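Such a check is only a few lines. The sketch below is hypothetical
(the name is_html_filename is mine, and accepting ".html" as well as
".htm" is an added assumption); it simply compares the file
extension, ignoring case.

```python
from pathlib import PurePath

ACCEPTED_SUFFIXES = {".htm", ".html"}  # ".html" is an added assumption


def is_html_filename(name):
    """Return True if the file name ends in an accepted HTML extension."""
    return PurePath(name).suffix.lower() in ACCEPTED_SUFFIXES


for name in ("foo.htm", "FOO.HTM", "result.wml", "logo.gif"):
    print(name, is_html_filename(name))
```

A name that fails the check could then be rejected, or handed to a
different translator chosen by its extension.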
Since starting work on this project I have learnt
a great deal about the subject and have used this to advance others'
knowledge in the area. People working on similar problems frequently
contact me about how the concepts I have used can be improved or further
applied. Some of this correspondence is shown in the appendices, and
discussion with others in the field has culminated in my decision to seek
a public licence for the software and make it and this accompanying
documentation publicly available.
7.2 Further work
A topic for further work would be to make the
software operate dynamically, producing documents on the fly. As
mentioned previously, this would require implementation on a web server,
which would allow users much more functionality.
Some work has been done into researching how this
would be possible, and we will now briefly discuss a possible
implementation that could form the basis for a future piece of work.
By wrapping the software up through a WAP gateway,
we could add interrogating and interactive scripts that could return and
relay results to users, offering much more functionality than at
present. This could include a script to strip the WML
hyperlinks (currently saved as comments) out and into a second WML card
that passed the string back as input to the translation program should the
user request the said URL.
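As an indication of how little such a script needs to do, the
following sketch pulls the saved URLs back out of generated WML. It is
an illustration only: saved_links is a hypothetical helper, and the
regular expression assumes the exact shape seen in figure 6.8, a link
to #card2 followed directly by a comment holding the original URL.

```python
import re

# Matches the pattern seen in the generated WML: a link to #card2
# immediately followed by a comment holding the original URL.
SAVED_LINK = re.compile(r'<a href="#card2">(.*?)</a>\s*<!--(.*?)-->', re.DOTALL)


def saved_links(wml):
    """Return (link text, original URL) pairs found in a WML string."""
    return [(text.strip(), url.strip()) for text, url in SAVED_LINK.findall(wml)]


sample = ('Please <a href="#card2">email</a><!--mailto:dan@theupperdeck.com--> me if you\n'
          'would like to add a reciprocal link.')
print(saved_links(sample))
```

A regular expression is enough here because the output format is
produced by our own stylesheet; arbitrary WML would need a proper
parser.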
Unfortunately, as explained, WML Script does not
have this power, but using Active Server Pages over an IIS or Apache
server it should be possible to perform such operations.
By using a gateway to handle the code
transmission, we can extend the functionality of WML without the browser
even knowing what was happening.
I present the following ASP file that could be
used to get and pass additional URL requests:
<%@ Language=VBScript %>
<% Response.ContentType = "text/vnd.wap.wml" %>
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN"
"http://www.wapforum.org/DTD/wml_1.2.xml">
<wml> <card id="card1" title="HTML 2 WML">
<do type="accept" label="Submit">
<go href="get.asp?url=$(url)"/> </do>
<p> URL: <input name="url" value=""/> <br/> </p>
</card> </wml>
Our get.asp file would then take the URL and find
the relevant HTML document before processing it and returning its
WML equivalent.
I leave this as a little taster for those
interested in finding out more.
It might also be an interesting exercise to
evaluate whether a similar implementation might be made for handling other
types of document such as PostScript. This could be integrated together
with similar software to produce a comprehensive package of Web
translation tools.
7.3 Summary
In this report we have presented a unique solution
to an original problem using established scientific concepts.
Following discussion with professionals in the field the software has been
put under GNU Public Licence and at the time of writing a summary paper is
awaiting publication. For more information or updates please visit
http://www.paulhoward.co.uk or see the following appendices.
1.1
The Wireless Application Protocol is a
specification for wireless data communications using hand-held devices
such as mobile phones and palmtop computers. Use of the WAP specification
allows mobile devices to communicate with the Internet or an intranet,
providing the users of these devices with mobile data communications
capabilities such as web browsing and e-mail.
The WAP Forum, an industry association of wireless
device manufacturers, service providers, and software companies, developed
the WAP specification. Further information about the WAP Forum can be
found at its website at www.wapforum.org.
Essentially the protocol is set to fail. Contrary
to traditional protocols, WAP is not an engineering construct, more a
business one. Open development and freedom of usage are restricted by the
strict and high monetary cost of entry to its supposedly 'open' forum.
WAP has claimed center stage not because it
fulfils the needs of the industry, but because thus far no viable
alternative has been presented. However, alternatives such as LEAP do
exist, and it is my opinion, as already mentioned, that improvements in
hardware and the like will lead to its demise and an acceptance of
better, more conventional languages made possible through technological
advancement.
1.2
http://www.devguru.com/Technologies/wml/quickref/wml_hierarchy.html
There is an excellent WML tree representation at
this URL. I would have liked to include it as an attachment, but its size
and complicated HTML layout mean reinterpreting it here is not possible.
The document is nevertheless worth looking at to demonstrate WML's simple
nesting hierarchy.
4.1
Opening page source code from figure 4.1:
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.2//EN"
"http://www.wapforum.org/DTD/wml_1.2.xml">
<wml>
<card id="card1" title="HTML 2 WML">
<do type="accept" label="Submit">
<go href="page1.wmls#passURL($(URL))"/>
</do> <p>
Enter URL: <input type="text" name="URL"/>
</p> </card> </wml>
4.2
Template pseudo-code:
4.4
XSLT representation of the template design. Note
how it has been structured similarly to the anticipated input, with the
html and head templates towards the top, followed by the headings etc.
6.2
Here are the URLs for all the test data used in
the evaluation:
http://www.tesarta.com/
http://sports.nfl.com/2000/newsnotes?team=20
http://music.recycler.com/
http://www.thewebsters.com/
http://www.juggling.org/
http://www.treasurenet.com/
http://www.poets.org/
http://www.theupperdeck.com/petpage/
http://www.edithwharton.org/
http://www.kevdo.com/lipbalm/
http://www.geocities.com/thetropics/cabana/5807/
http://www.cs.nott.ac.uk/
http://www.paulhoward.co.uk/cv.htm
http://www.cs.bris.ac.uk/~mm7323/MainBot/
http://www.w3.org/People/Raggett/tidy/
http://www.google.com
http://www.nothing.com
http://www.random.com/full/
http://news.bbc.co.uk/hi/english/sci/tech/default.stm
http://www.handsoffmy.org/
http://www.icann.org/
http://www.techweb.com/encyclopedia/
http://www.bloomberg.com/bbn/windex.html
http://s1.amazon.com/exec/varzea/subst/fx/help/how-we-know.html/058-5282950-4123910
http://www.dictionary.com/wordoftheday/?from=nothing
http://www.gimp.org/the_gimp_about.html
http://www.dtic.mil/armylink/faq/index.html
http://www.helm.com/document.htm
http://www.fedworld.gov/jobs/jobsearch.html
http://www.adobe.com/products/smcoll/main.html
http://www.hqda.army.mil/ogc/eandf.htm
http://www.fcuk.com/first_consultants_uk_ltd.htm
http://www.boselecta.com/
http://shopping.infospacehosting.net/exitpage/index_hm.html
http://www.redhat.com/apps/commerce/order_history.html
http://www.rinsing.com/
http://www.oreilly.com/catalog/javaxml/
http://www.activestate.com/ASPN/Downloads/ActivePython/More
http://www.drumnbassarena.com/djpics/djrisky.html
http://www.drumnbassarena.com/news/
http://www.azuli.com/
http://www.upsideweb.com/resources/designanddev.htm
http://developer.openwave.com/index.html
7.1
I refer to 3G Mobile Vol. 3, Number 5, March 7 2001,
a discussion of how Japan's DoCoMo are expecting Compact-HTML (CHTML) to
provide the next generation of XML-compliant meta languages.
Meanwhile Phone.com are the sole controllers of the HDML specification,
which may hold back its development, and which promises few features not
already available to WAP developers. In short, by the time either
language has had the opportunity for growth on the scale seen with HTML,
hardware devices will no longer be restricted to using these low
bandwidth languages.
7.2
Some of the correspondence I have had regarding
publication and licensing of the software and my report:
> -----Original Message----- > From: Paul Howard <prh98c@cs.nott.ac.uk>
ئnbsp; Sent:
Tuesday, April 17, 2001 8:59 PM > To: wap-dev@AnywhereYouGo.com > Subject: Re: [WAP-dev] Converter HTML TO WML > > >Hi, > > > > I am a
computer science student working on a project designed to translate > > HTML to WML. >
> > > My implementation is something like
this: > > > >
HTML>>Tidy.exe>>xHTML>>Saxon.exe + XSLT>>WML > > > > If you
have any specific questions feel free to email me. > > > > Paul. > > > >
================================ > > School
of Computer Science & IT > > University
of Nottingham > > Nottingham, UK > > > > http://www.paulhoward.co.uk > > ================================ > >
_______________________________________________ > > WAP-dev@AnywhereYouGo.com > > To unsubscribe, or change subscription
options: > > http://mail.AnywhereYouGo.com/mailman/listinfo/wap-dev > >
----- Original Message -----
> Hi Paul,
>
> I am interested in conversion from HTML to XHTML. For this
> conversion Tidy.exe (which is a converter, I suppose) may be using some rule
> base to convert or filter the contents in HTML. Could you please elaborate
> with some example how u r defining the contents.
>
> Regards,
> Nilesh
----- Original Message -----
> > Hi there,
>
> Hi.
>
> > I hope you don't mind me writing but I have been working on a project to
> > generate WML pages on request from HTML pages.
>
> No, not at all.
>
> > I have developed a comprehensive XSLT sheet which in conjunction with Tidy
> > and Saxon can take a wide variety of HTML pages and through a little batch
> > file DOS command generates the WML version, handling frames, tables, text
> > and many of HTML's other bizarre tags, i.e.
> >
> > D:\Project\html2wml>exec.bat foo.htm
> >
> > Generates foo.wml which is all very nice.
>
> That sounds quite impressive. When I was doing my ASP/HTML/WML
> code I didn't have the wherewithal to create one myself, and
> could not find one out on the web.
>
> > I am really eager to see if I can get this to run on my desktop simply by
> > inputting the html filename into a WML page on my WAP emulator.
> >
> > I'd have hoped the WML page would have passed the filename to and executed
> > exec.bat then provided a link to the generated WML page, but this is
> > proving tricky given WMLS's restricted library.
>
> I'm not sure I follow this: Give an HTML url to a WML page,
> and the server fetches the HTML, converts to a WML file, and then
> renders a WML page, showing the link to this new WML page on the server?
>
> > I noticed you had done a similar implementation using ASP, only as far as I
> > am aware .asp pages aren't recognized by WAP emulators, so when I tested
> > your default.asp code (from the TopXML html>wml site) it refused to support
> > Content of Type text/asp. Is this because I am not using it over a WAP
> > gateway or what?
>
> You can have ASP emit any HTTP header you like. By default it should be
> sending "text/html", but there is an ASP directive to alter this:
> Response.ContentType = "text/wml"; // or whatever you like
>
> > I would be really grateful and interested if you could spare a few minutes
> > to tell me how your suggested implementation would work because as far as I
> > am aware, when doing a go href in WML the URL is simply passed back to the
> > browser to handle which it obviously cannot (seems not to) do.
>
> Well, when a user clicks a link, such as http://www.foo.com/SomePage.wml the
> browser makes an HTTP GET request to www.foo.com, and requests file
> SomePage.wml.
> It is up to the server (www.foo.com) to know how to process and return the
> requested file. For example, if you request foo.asp from an IIS machine, the
> web server knows that any file ending in .asp needs to be pre-processed
> before the HTML is sent back to the browser.
>
> There is a trick you can do with IIS (and any other decent web server), and
> that is to define special file type associations. For example, you can tell
> IIS that any file ending in .wml should be processed by asp.dll (just as it
> does for files ending in .asp). Then, you can write your ASP code, save
> them in files like foo.wml, and when a request is made, IIS treats it like
> any other .asp file.
>
> > Both your implementation and mine as far as I can see rely on the
> > processing being done outside of the browser and I can't see (unless
> > through the use of a gateway <which I know v.little about!>) how this
> > can be possible.
>
> It's possible because requesting a file from a web server is similar to
> calling a function. Requesting "foo.asp" is sort of like calling foo(),
> and what happens inside foo() is really just the server-side script
> in foo.asp. The browser (whether IE5, or some WAP phone) expects HTML or
> WML; how that gets created is up to the web server. The web server knows
> what to do by looking at the file requested, and processing it according
> to its file-type association.
>
> The browser knows nothing of how the page is created, and only
> cares that it gets a known content type it can render.
>
> A WAP device can (as far as I know) request any URL. It can call
> somepage.asp, and somepage.asp can go do whatever it needs to.
> Ultimately, somepage.asp needs to do two things: emit the correct
> content-type header appropriate for the browser, and emit the
> correct markup. For example, a single ASP could detect the browser type,
> and emit either WML or HTML based on the UserAgent header.
>
> My code attempted to have the server fetch an HTML, convert the HTML to
> WML, and return the WML directly to the browser. It had several flaws,
> most notably the absence of any way to break large HTML pages into
> multiple smaller WML pages.
>
> I would think your process would go something like this:
>
> WAP device submits a form specifying an HTML page to fetch:
> http://someserver.com/getPage.wml?url=ww.someplace.com/somepage.html
>
> getPage.wml is really ASP, and IIS is set up to process .wml files
> as asp files. Inside getPage.wml, the code:
>
> - Grabs the URL of the requested HTML file
> - Retrieves that file and writes the contents to disk
> - Calls the batch script to process the HTML file into WML.
> - Generates WML (containing a link to the new WML file) and sends
>   it back:
>
> // Get the HTML page and save it to savedHtmlFileName,
> // then processes it ...
>
> var newLink = processHtmlFile(savedHtmlFileName);
> Response.ContentType = "text/wml";
> Response.Write("<?xml version='1.0'?>");
> Response.Write("<!DOCTYPE wml PUBLIC '-//WAPFORUM//DTD WML 1.1//EN' ");
> Response.Write("'http://www.wapforum.org/DTD/wml_1.1.xml'>");
> Response.Write("<wml>");
> Response.Write("<card id='card1' title='Fetch'>");
> Response.Write("<do type='accept' label='Link'>");
> Response.Write(" <go href='" + newLink + "' />");
> Response.Write("</do>");
> Response.Write("</card>");
> Response.Write("</wml>");
>
> (Might not be exactly right WML)
>
> > In any case if you would be interested in including or looking at my XSLT
> > I'd be really pleased to make it available to anyone who might find it
> > useful. Some of the implementations I have seen out there are
> > really poor!
>
> I would very much like to see the XSLT. If you are willing to make it
> publicly available, I would recommend submitting it to TopXML.com.
> I would also recommend adding appropriate copyrights to the file
> to protect your interests, or to at least keep others from misappropriating
> it.
> (At the least, consider putting it under the GNU public license.)
>
> > Many thanks for your time and I hope to hear from you soon.
>
> Hope this helps. I believe there is a VB/WML mailing list
> hosted by the TopXML folks, and they have online forums, too.
>
> > Kind regards,
> > Paul.
>
> Take care,
> James
>
> > ==========================
> > School of Computer Science & IT
> > University of Nottingham
> > Nottingham, UK
> > ==========================
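
The reply above describes a server-side step that sends the correct content type and a small WML deck linking to the converted file. The following is a minimal Python sketch of that step only, not the ASP code from the correspondence. The function names pick_content_type and build_link_deck are hypothetical, and the sketch swaps the UserAgent check mentioned in the email for a check of the HTTP Accept header. It also uses text/vnd.wap.wml, the registered MIME type for WML, where the email uses "text/wml" as a placeholder.

```python
from xml.sax.saxutils import quoteattr

# Registered MIME type for WML decks (the email's "text/wml" is a placeholder).
WML_MIME = "text/vnd.wap.wml"

# XML declaration plus the WML 1.1 document type declaration.
WML_PROLOG = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">\n'
)


def pick_content_type(accept_header):
    """Return the WML type if the client lists it in Accept, else text/html.

    Assumption: the client advertises WML support in its Accept header.
    """
    offered = [item.split(";")[0].strip().lower()
               for item in accept_header.split(",")]
    return WML_MIME if WML_MIME in offered else "text/html"


def build_link_deck(new_link, title="Fetch"):
    """Build a one-card WML deck whose accept key goes to new_link.

    quoteattr() escapes characters such as '&' so that the attribute
    values stay well-formed XML.
    """
    return (
        WML_PROLOG
        + "<wml>\n"
        + "  <card id=\"card1\" title=%s>\n" % quoteattr(title)
        + "    <do type=\"accept\" label=\"Link\">\n"
        + "      <go href=%s/>\n" % quoteattr(new_link)
        + "    </do>\n"
        + "  </card>\n"
        + "</wml>\n"
    )
```

A server script would call pick_content_type() on the incoming Accept header, set the response Content-Type to the result, and write build_link_deck("foo.wml") as the body once the batch conversion has produced foo.wml.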
----- Original Message -----
> Paul,
>
> I recently received some e-mail from someone who wondered
> if I or anyone else had completed writing the XSLT to transform
> HTML to WML.
>
> I told him that in fact I had just heard from someone who was doing it.
>
> I was asked if I could pass on your e-mail address, or the XSLT.
> Of course, I don't want to give out any details without talking to you
> first, and I don't want to put you on the spot or anything.
>
> You mentioned making the XSLT available; have you decided on how you
> would like to do that?
>
> Thanks,
>
> James
API (Application Programming Interface): a language and
message format used by an application program to communicate with the
operating system or some other system or control program, such as a
database management system or communications protocol.
Software and Source Code
Get the source code!
The html2wml.zip
contains everything needed for static HTML to WML translation, including
Saxon and Tidy.exe.
The documentation.zip
is a 30-page PDF (plus a 20-page appendix) covering most of the details of my
implementation.
The
software is released under the terms of the GNU General Public License, copyright 2001.
Please report any errors to project@paulhoward.co.uk.