Blogger : Ajaxian Blog
Category : XML
Blogged date : 2008 May 05
John Resig must have had some downtime on Sunday afternoon, as he implemented an HTML parser in JavaScript. The library, which you can play with via this demo, lets you attack HTML in a few ways:
A SAX-style API
Handles tags, text, and comments with callbacks. For example, say you wanted to implement a simple HTML-to-XML serialization scheme; you could do so with the following:
JAVASCRIPT:
var results = "";

HTMLParser("<p id=test>hello <i>world", {
  start: function( tag, attrs, unary ) {
    results += "<" + tag;

    // Append each attribute as name="escaped value"
    for ( var i = 0; i < attrs.length; i++ )
      results += " " + attrs[i].name + '="' + attrs[i].escaped + '"';

    // Unary (self-closing) tags get a trailing slash
    results += (unary ? "/" : "") + ">";
  },
  end: function( tag ) {
    results += "</" + tag + ">";
  },
  chars: function( text ) {
    results += text;
  },
  comment: function( text ) {
    results += "<!--" + text + "-->";
  }
});
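The same callback interface works for jobs other than serialization. As a minimal sketch, the handler below ignores tags and gathers only the text content. `makeTextCollector` is a hypothetical helper name, not part of the library, and the callbacks are invoked by hand (in an assumed order) so the snippet runs without the parser.

```javascript
// Hypothetical helper (not part of htmlparser.js): builds a handler object
// with the same four callbacks used above, but keeps only text content.
function makeTextCollector() {
  var pieces = [];
  return {
    start: function (tag, attrs, unary) { /* tags are ignored */ },
    end: function (tag) { /* tags are ignored */ },
    chars: function (text) { pieces.push(text); },
    comment: function (text) { /* comments are ignored */ },
    // Returns all text seen so far, joined in document order.
    text: function () { return pieces.join(""); }
  };
}

// Simulate the callbacks a parser might fire for a small document.
// The event order here is an assumption for illustration.
var collector = makeTextCollector();
collector.start("p", [{ name: "id", escaped: "test" }], false);
collector.chars("hello ");
collector.start("i", [], false);
collector.chars("world");
collector.end("i");
collector.end("p");

console.log(collector.text()); // hello world
```

With the real library loaded, the same object could presumably be passed as the second argument to `HTMLParser` in place of the hand-written calls.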
XML Serializer
There's no need to implement the above yourself, though, since it's included directly in the library as well. Just feed in HTML and it spits back an XML string.
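For a rough idea of what such a built-in serializer might look like, here is a sketch that wraps the handlers from the SAX example in a function. The names `toXML` and `fakeParse` are hypothetical: `toXML` is not the library's own function, and `fakeParse` merely stands in for the real parser by replaying a fixed, assumed sequence of callbacks so the snippet is self-contained.

```javascript
// Hypothetical wrapper, not the library's own code: serializes whatever
// events the supplied SAX-style `parse` function reports.
function toXML(parse, html) {
  var out = "";
  parse(html, {
    start: function (tag, attrs, unary) {
      out += "<" + tag;
      for (var i = 0; i < attrs.length; i++) {
        out += " " + attrs[i].name + '="' + attrs[i].escaped + '"';
      }
      out += (unary ? "/" : "") + ">";
    },
    end: function (tag) { out += "</" + tag + ">"; },
    chars: function (text) { out += text; },
    comment: function (text) { out += "<!--" + text + "-->"; }
  });
  return out;
}

// Stand-in parser for illustration only: it ignores its input and replays
// the events assumed for '<p id=test>hello <br>world<!-- note -->'.
function fakeParse(html, handler) {
  handler.start("p", [{ name: "id", escaped: "test" }], false);
  handler.chars("hello ");
  handler.start("br", [], true);
  handler.chars("world");
  handler.comment(" note ");
  handler.end("p");
}

var xml = toXML(fakeParse, "<p id=test>hello <br>world<!-- note -->");
console.log(xml); // <p id="test">hello <br/>world<!-- note --></p>
```

Passing the parser in as an argument keeps the serializer independent of any one parser, which also makes it easy to exercise with canned events as done here.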
DOM Builder
If you're using the HTML parser to inject content into an existing DOM document (or into an existing DOM element), htmlparser.js provides a simple method for handling that:
JAVASCRIPT:
// The following is appended into the document body
HTMLtoDOM("<p>Hello <b>World", document)

// The following is appended into the specified element
HTMLtoDOM("<p>Hello <b>World", document.getElementById("test"))
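The idea behind a DOM builder can be sketched with the same callbacks: keep a stack of open elements and attach each new node to the element on top. The version below builds plain objects rather than real DOM nodes so that it runs outside a browser. `makeTreeBuilder` and the node shape are assumptions for illustration, not the library's implementation.

```javascript
// Illustrative sketch only: builds a tree of plain objects from
// SAX-style callbacks. Not the code used by htmlparser.js.
function makeTreeBuilder() {
  var root = { tag: "#root", attrs: {}, children: [] };
  var stack = [root]; // the last entry is the current parent

  function current() { return stack[stack.length - 1]; }

  return {
    root: root,
    start: function (tag, attrs, unary) {
      var node = { tag: tag, attrs: {}, children: [] };
      for (var i = 0; i < attrs.length; i++) {
        node.attrs[attrs[i].name] = attrs[i].escaped;
      }
      current().children.push(node);
      if (!unary) stack.push(node); // unary tags never become parents
    },
    end: function (tag) {
      if (stack.length > 1) stack.pop(); // never pop the root
    },
    chars: function (text) {
      current().children.push({ text: text });
    },
    comment: function (text) {
      current().children.push({ comment: text });
    }
  };
}

// Replay the events assumed for a paragraph containing bold text.
var builder = makeTreeBuilder();
builder.start("p", [], false);
builder.chars("Hello ");
builder.start("b", [], false);
builder.chars("World");
builder.end("b");
builder.end("p");

var p = builder.root.children[0];
console.log(p.tag, p.children.length); // p 2
```

A real builder would create elements on the target document and append them to the supplied document or element instead of a `#root` object.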
DOM Document Creator
This is a more advanced version of the DOM builder: it includes logic for handling the overall structure of a web page and returns a new DOM document.
A few rules are enforced by this method:
- There will always be an html, head, body, and title element.
- There will only be one html, head, body, and title element (if the user specifies more, they will be moved to the appropriate locations and merged).
- link and base elements are forced into the head.
You would use the method like so:
JAVASCRIPT:
var dom = HTMLtoDOM("<p>Data: <input disabled>");
dom.getElementsByTagName("body").length == 1
dom.getElementsByTagName("p").length == 1
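The structural rules listed above can be sketched on plain objects. This is only an illustration under stated assumptions: `normalizeDocument` and the node shape are hypothetical, and since the post does not spell out exactly how duplicates are merged, the merging shown here (dissolving extra wrappers and concatenating title contents) is a guess at one reasonable behavior.

```javascript
// Illustrative sketch of the structural rules, using plain objects in
// place of DOM nodes. The function name and node shape are hypothetical.
function normalizeDocument(nodes) {
  var title = { tag: "title", children: [] };
  var head = { tag: "head", children: [title] };
  var body = { tag: "body", children: [] };
  var html = { tag: "html", children: [head, body] };

  // `inHead` is true while walking the children of a user-supplied head.
  function place(node, inHead) {
    var kids = node.children || [];
    switch (node.tag) {
      case "html":
      case "body":
        // Extra wrappers are dissolved; their children are re-placed.
        kids.forEach(function (child) { place(child, false); });
        break;
      case "head":
        kids.forEach(function (child) { place(child, true); });
        break;
      case "title":
        // Every title's content is merged into the single title element.
        title.children = title.children.concat(kids);
        break;
      case "link":
      case "base":
        head.children.push(node); // always forced into the head
        break;
      default:
        (inHead ? head : body).children.push(node);
    }
  }

  nodes.forEach(function (node) { place(node, false); });
  return html;
}

var doc = normalizeDocument([
  { tag: "link", children: [] },
  { tag: "p", children: [{ text: "Data: " }] }
]);
var docHead = doc.children[0];
var docBody = doc.children[1];
var tagOf = function (n) { return n.tag; };
console.log(docHead.children.map(tagOf).join(", ")); // title, link
console.log(docBody.children.map(tagOf).join(", ")); // p
```

Even though the input contained no html, head, body, or title, the result has exactly one of each, and the stray link ends up in the head.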
One place you could use this API is on the server side, for example with Aptana Jaxer, although you could also interface directly with Java or just use the Mozilla utilities directly.