
HTML Parser in JavaScript

Blogger : Ajaxian Blog
Category : XML
Blogged date : 2008 May 05

John must have had some downtime on Sunday afternoon, as he implemented an HTML parser in JavaScript. The library, which you can play with via this demo, lets you attack HTML in a few ways:

A SAX-style API

It handles tags, text, and comments with callbacks. For example, if you wanted to implement a simple HTML-to-XML serialization scheme, you could do so with a few callbacks that append to a result string.
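
A minimal sketch of such callbacks follows. The callback names (start, end, chars, comment), the attrs layout (objects with name and escaped fields), and the makeXmlHandler helper are assumptions made for illustration, and the events are fired by hand here so the snippet runs without the library:

```javascript
// Hypothetical helper: returns a handler object in the shape a SAX-style
// parser is assumed to call (start/end/chars/comment are assumed names).
function makeXmlHandler() {
  var out = "";
  return {
    // Opening tag: write the name, then every attribute as name="value".
    start: function (tag, attrs, unary) {
      out += "<" + tag;
      for (var i = 0; i < attrs.length; i++) {
        out += " " + attrs[i].name + '="' + attrs[i].escaped + '"';
      }
      out += (unary ? "/" : "") + ">";
    },
    // Closing tag.
    end: function (tag) { out += "</" + tag + ">"; },
    // Text content is copied through unchanged.
    chars: function (text) { out += text; },
    // Comments are re-wrapped in comment delimiters.
    comment: function (text) { out += "<!--" + text + "-->"; },
    // Accessor for the string accumulated so far.
    result: function () { return out; }
  };
}

// Fire the events by hand, standing in for what a parser would emit
// for a paragraph containing some text and an italic element.
var handler = makeXmlHandler();
handler.start("p", [{ name: "id", escaped: "test" }], false);
handler.chars("hello ");
handler.start("i", [], false);
handler.chars("world");
handler.end("i");
handler.end("p");
var xml = handler.result();
```

With a real parser, the same handler object would simply be passed in next to the HTML string, and the parser would be responsible for emitting the end events for tags the author never closed.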

XML Serializer

Now, there's no need to worry about implementing the above, since it's included directly in the library. Just feed in HTML and it spits back an XML string.

JAVASCRIPT:

    var results = HTMLtoXML("<p>Data: <input disabled>");
    results == '<p>Data: <input disabled="disabled"/></p>';

DOM Builder

If you're using the HTML parser to inject into an existing DOM document (or within an existing DOM element) then htmlparser.js provides a simple method for handling that:

JAVASCRIPT:

    // The following is appended into the document body
    HTMLtoDOM("<p>Hello <b>World", document);

    // The following is appended into the specified element
    HTMLtoDOM("<p>Hello <b>World", document.getElementById("test"));
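
One way a builder like this can sit on top of SAX-style callbacks is to keep a stack of open elements. The sketch below is an illustration only, not the library's actual implementation: it uses plain objects instead of real DOM nodes so that it runs anywhere, and makeTreeBuilder and the node shape are hypothetical.

```javascript
// Hypothetical tree builder: nodes are plain objects
// { tag, attrs, children }, and text is stored as plain strings.
function makeTreeBuilder(root) {
  var stack = [root]; // open elements, innermost last

  function current() {
    return stack[stack.length - 1];
  }

  return {
    // A new element becomes a child of the innermost open element.
    // Non-unary elements stay open until a matching end event.
    start: function (tag, attrs, unary) {
      var node = { tag: tag, attrs: attrs, children: [] };
      current().children.push(node);
      if (!unary) {
        stack.push(node);
      }
    },
    // Close the innermost open element, never popping the root.
    end: function (tag) {
      if (stack.length > 1) {
        stack.pop();
      }
    },
    // Text is attached to the innermost open element.
    chars: function (text) {
      current().children.push(text);
    },
    // Comments are ignored in this sketch.
    comment: function (text) {}
  };
}

// Fire events by hand, standing in for a parser.
var root = { tag: "#root", attrs: [], children: [] };
var builder = makeTreeBuilder(root);
builder.start("p", [], false);
builder.chars("Hello ");
builder.start("b", [], false);
builder.chars("World");
builder.end("b");
builder.end("p");
```

In a browser, the same pattern would presumably create elements with document.createElement and attach them with appendChild, with the starting point of the stack being either the document body or the element that was passed in.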

DOM Document Creator

This is a more advanced version of the DOM builder: it includes logic for handling the overall structure of a web page and returns a new DOM document.

A few points are enforced by this method:

  • There will always be an html, head, body, and title element.
  • There will only be one html, head, body, and title element (if the user specifies more, they will be moved to the appropriate locations and merged).
  • link and base elements are forced into the head.

You would use the method like so:

JAVASCRIPT:

    var dom = HTMLtoDOM("<p>Data: <input disabled>");
    dom.getElementsByTagName("body").length == 1
    dom.getElementsByTagName("p").length == 1
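
The structural rules listed above can be sketched on a plain tree of objects. This is an illustration under stated assumptions, not the library's code: normalizeDocument and the { tag, children } node shape are hypothetical, and only top-level nodes and the children of html, head, and body are inspected.

```javascript
// Hypothetical node shape: { tag: "p", children: [...] }, with text
// stored as plain strings. Deeper descendants are not inspected.
function normalizeDocument(nodes) {
  var head = { tag: "head", children: [] };
  var body = { tag: "body", children: [] };
  var title = { tag: "title", children: [] };

  // Route one node into the single head, body, or title.
  function place(node, target) {
    if (typeof node === "string") {
      target.children.push(node);
      return;
    }
    switch (node.tag) {
      case "html": // merge extra html elements: content defaults to body
      case "body": // merge extra body elements
        node.children.forEach(function (c) { place(c, body); });
        break;
      case "head": // merge extra head elements
        node.children.forEach(function (c) { place(c, head); });
        break;
      case "title": // merge all titles into one
        title.children = title.children.concat(node.children);
        break;
      case "link":
      case "base": // always forced into the head
        head.children.push(node);
        break;
      default:
        target.children.push(node);
    }
  }

  nodes.forEach(function (n) { place(n, body); });
  head.children.unshift(title); // a title always exists, even if empty
  return { tag: "html", children: [head, body] };
}

var doc = normalizeDocument([
  { tag: "p", children: ["Data: "] },
  { tag: "link", children: [] },
  { tag: "body", children: [{ tag: "p", children: ["more"] }] }
]);
```

Here the stray link ends up in the head next to an empty title, and both paragraphs land in the single body, whether or not they were wrapped in a body element in the input.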

One place you could use this API is on the server side, for example with Aptana Jaxer, although you could also interface directly with Java or just use the Mozilla utilities directly.

