By: Mark Wilson
First posted: 09/26/2001
XML Tools for Information Appliances

Copyright and Authorship Notice

This chapter was written by Eric Jung and it is taken from "Java XML Programmer’s Reference" by Eric Jung, Andrei Cioroianu, Dave Writz, Mohammad Akif, Steven Brodhead, James Hart published by Wrox Press Limited in July 2001; ISBN 1861005202; copyright © Wrox Press Limited 2001; all rights reserved.

No part of this chapter may be reproduced, stored in a retrieval system or transmitted in any form or by any means -- electronic, electrostatic, mechanical, photocopying, recording or otherwise -- without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.

This chapter focuses on Java, XML, and XSLT technologies for lightweight clients. Lightweight clients are defined as those with more limited resources than traditional clients. The term information appliance is used interchangeably with lightweight client. The obvious devices that fit within this category are personal digital assistants (PDAs), mobile phones, and pagers. However, many other embedded devices and consumer electronics may fit into this category: television set-top boxes, global positioning system (GPS) receivers, thermostats, watches, digital cameras, even Internet appliances such as kitchen stoves, refrigerators, and radios. Non-consumer-oriented devices also can fit in this category, such as industrial automation and control sensors.

However, this definition is not limited to non-PC devices. Any environment that requires the following should qualify:

  • A small memory footprint
  • Limited CPU overhead or availability
  • Restricted network bandwidth

Applets, in a typical browser, also fit this lightweight category. As we shall see, XML generation, parsing and transformation are just as important for these types of clients as they are for thin clients (browsers) and servers.

In this chapter, we will address three key Java XML technologies for lightweight clients:

  • Lightweight parsers and document generators - Reference material for three parsers and document generators. Through examples, we will demonstrate the usage of three lightweight XML parsers, two of which can also generate documents by enabling you to create a DOM-style node tree.
  • XSLT compiler - a Java tool that creates fast and lightweight Java class files for transforming XML given an XSL stylesheet
  • CLDC (Connected Limited Device Configuration) and the Java KVM (Kilobyte Virtual Machine) - a Java specification for limited devices, which includes a reference implementation written by Sun. The Java KVM, a virtual machine redesigned for the constraints of limited devices, is part of that implementation.

We can do the same essential processing, parsing, and transforming tasks with these tools that we have used elsewhere in the book. Examples in the chapter will show how to work around some of the limitations of these tools, and how we can leverage their small size to get them to run on devices that would not support their heavyweight counterparts.

There are a number of acronyms and terms you'll encounter in the following sections, so let's briefly cover some terminology before we continue:

| Term | Definition |
|---|---|
| CDC | Connected Device Configuration - defines a base set of I/O, connectivity, and other classes for "heavy" lightweight clients such as set-top boxes and audio/visual equipment |
| CLDC | Connected Limited Device Configuration - defines a base set of I/O, connectivity, and other classes for lightweight clients such as pagers |
| J2ME | Java 2 Platform, Micro Edition - the Java 2 platform for information appliances (lightweight clients) |
| Java KVM | Java Kilobyte Virtual Machine - a Java virtual machine designed to minimize its memory footprint instead of maximizing its speed. Currently ported and compiled for Linux, Solaris, Windows, and Palm OS |
| PDA | Personal Digital Assistant - a digital organizer, consisting of applications such as an address book, date book, and notepad |

This chapter starts by discussing why we should consider using XML on lightweight clients. We introduce the Java 2 Platform, Micro Edition (J2ME) and its architecture. Then, we cover three lightweight XML parsers and the XSLT Compiler. Finally, we conclude with a Palm OS application that beams address book entries in XML format from one Palm device to another using the Java KVM.

Any discussion of using XML on lightweight clients typically leads to solutions that do not adhere well to standards. W3C XML-related recommendations and standards usually don't have lightweight clients in mind, as they are written independently of any platform or operating system. The implementation of these recommendations often involves resource-intensive processing not possible on lightweight clients. Therefore, many W3C recommendations, for instance namespaces and DOM, are not supported, in order to keep library sizes down.

Rather than dismissing the XML components covered in this chapter as non-standard, I encourage you to view them as you might have viewed the tools of the early World Wide Web: useful, even though they are not yet standardized. This will change with time, as we can see from the recent consolidation of multiple lightweight Java initiatives into the far-reaching J2ME.

Lightweight Client Support for XML

Lightweight client support for XML has largely been ignored in the XML revolution. This may be a reflection of the role of the client-side developer. Traditional client-side developers have all but disappeared from many contemporary web application developments, ever since n-tier architecture has displaced the client-server paradigm in the enterprise.

Therefore, if the developers themselves have slimmed down in numbers, the tools they use are bound to become less common. The lack of these tools is also perhaps a construct of what today seems a predominantly server-dominated industry. Perhaps they are lacking because of the ease with which web applications enable developers to forget about them.

Prevailing attitudes can be summed up this way: "Anyone who understands my DTD or XML Schema, can display this document class as they please." Even server-side developers churning out WML today are probably still treating WML as yet another document class that their server application needs to support.

But the number of document classes being published by enterprises grows every day. Servers that produce one or more of this myriad of XML formats (WML being one of them) complicate things on the lightweight client, where slick heavyweight browsers with ActiveX controls don't exist. Lightweight clients simply don't have the capabilities and resources available to browser environments.

The role of the lightweight client-side developer has now been boosted to that of browser developer or "XML processor" developer. More generally, the client-side developer now has a rejuvenated role as a Java XML developer in the world of lightweights, especially with the success of the Java 2 Platform, Micro Edition (see J2ME, page 568). In the future, if the modular and lightweight XHTML Basic (see Too Many Client Formats, page 566) becomes popular and natively supported by vendors, client-side development may be relegated back to that of scripting with most work done on the server-side. However, we have not yet reached that point.

Most contemporary discussions about XML technologies focus on server-side issues, such as document generation from a relational database, document parsing and persistence to a data store, document transmission, or document transformation for an anticipated client (such as a Compact HTML browser). When client-side issues are addressed, they are often limited to Microsoft Internet Explorer or Netscape Communicator.

Case in point: Microsoft has substantive support for XML with MSXML in Internet Explorer. The latest release of MSXML, 3.0, supports:

  • XSL Transformations (XSLT)
  • XML Path Language (XPath)
  • XML Namespaces 1.0
  • DOM
  • SAX 2.0
  • Organization for the Advancement of Structured Information Standards (OASIS) XML 1.0 test suite
  • Secure server-to-server XML with HTTPS

This is impressive, but these services are implemented as ActiveX components intended for use within Internet Explorer on the client, or from ASP pages on the server. There are plenty of clients that don't support ActiveX components. I doubt that Microsoft's own UltimateTV and Xbox, two "heavy" lightweight clients, can make use of MSXML. UltimateTV is essentially a digital VCR that can record two channels simultaneously (similar to TiVo in the United States), and Xbox is their Sony PlayStation-style games unit. Other consumer-oriented and embedded devices would have similar problems with MSXML. As a side note, Sony has announced that they will integrate Java technologies into the PlayStation by the end of 2001 (see http://www.javasoft.com/features/2001/06/sony.html).

So, what do we do if we need to parse, generate, or transform XML on a lightweight client? Do we even need to do this at all?

The Need for XML On Lightweights

The future will show that, as Java and XML developers, we must pay more attention to XML technologies on lightweight clients. Even server-side-only developers, who today often just transform their XML into a subset of (X)HTML supported by the most common browser, will have to change their approach.

There are at least five reasons why you need or will need to parse, process, generate, and transform XML on lightweight clients. We will go into detail on each one of these:

  • Lightweight client-side development. If you're a lightweight client-side developer who will be receiving content from providers who publish XML, you will need a way to parse and process XML documents of their document class. You may also need to generate XML documents to send back to the provider
  • Too many client formats. If you're a lightweight client-side developer and your client is going to receive content from multiple providers, each of whom publishes XML using a different schema (very likely given today's state of affairs!), you may want to transform those documents into a generic document class before processing them. As a server-side developer, you may want to publish your content in one form, instead of trying to keep up with all the latest standards and recommendations, and push the burden of transformation to the client
  • Peer-to-Peer networking. If your lightweight client application is part of a peer-to-peer network or you are designing a peer-to-peer network, you may want to communicate with the other clients in the network through XML
  • Information appliance interoperability. Embedded devices, smart consumer electronics, PDAs, mobile phones, and Internet appliances can all interoperate with each other and the Internet using technologies such as Bluetooth, Jini, Ricochet, CDPD, GSM and GPRS, WiFi (802.11b), and HomeRF. The need for common data exchange formats grows as the interoperability of these devices grows. Even if the underlying communications mechanisms are black boxes, the application developer is presented with new opportunities and challenges, as he now has a multitude of information appliances connected that previously weren't
  • Powerful lightweights. If you extrapolate Moore's Law, we'll all eventually have turbo-charged mobile phones and PDAs at a cost too cheap to ignore. We could use some of that power for XML-related and XSLT tasks

Lightweight Client-Side Development

With the Java 2 Platform, Micro Edition now a reality on PDAs and embedded devices, and actually shipping on some mobile phones, rich Java applications with network connectivity on lightweight clients are now possible.

The lessons learned in the client/server days before XML are not forgotten simply because we're on a constrained device. If we want to leverage the benefits afforded by XML (which have been addressed by many other books and articles but are outside the scope of this chapter) on modern lightweight clients, we'll need a way to parse and process XML documents delivered to us by servers. From there, we can display the data to the user and/or store it locally.

We might then wait for user input, or gather system information, and package it up into an XML document for transfer to a server. For example, we might query the current price of an item on our auction web site.
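To make the "package it up into an XML document" step concrete, here is a minimal sketch of how a client might build such a request. The priceQuery document class, its item element, and the id attribute are all invented for illustration; a real auction site would publish its own DTD or schema. The sketch deliberately sticks to String and StringBuffer so that it does not depend on any XML library being present on the device.

```java
// Sketch only: the <priceQuery> document class and its attribute names are hypothetical.
final class PriceQuery {

    /** Escapes the five characters that are special in XML text and attribute values. */
    static String escape(String text) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Packages a user-entered item identifier into a request document. */
    static String build(String itemId) {
        StringBuffer xml = new StringBuffer();
        xml.append("<?xml version=\"1.0\"?>");
        xml.append("<priceQuery>");
        xml.append("<item id=\"").append(escape(itemId)).append("\"/>");
        xml.append("</priceQuery>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // The resulting string would be sent to the server over whatever
        // connection the device offers; that part is not shown here.
        System.out.println(build("lot-42"));
    }
}
```

Escaping user input before it goes into the document matters even in a sketch this small, since an unescaped ampersand or quote would make the request ill-formed.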

Too Many Client Formats

Today there are numerous lightweight-client document classes. Here is a partial list:

  • Compact HTML (cHTML) is used in the Japanese NTT-DoCoMo i-mode network (http://www.w3.org/TR/1998/NOTE-compactHTML-19980209)
  • Wireless Markup Language and WMLScript for WAP networks
  • Web clipping applications for Palm.Net and OmniSky networks
  • HTML. Even though most agree it is inappropriate for information appliances, most web sites still publish their content only in this format. There's even the HTML 4.0 Guidelines for Mobile Access (http://www.w3.org/TR/1999/NOTE-html40-mobile-19990315), which describes which parts of HTML should be avoided for information appliances
  • Handheld Device Markup Language (HDML), originally created by Unwired Planet (Phone.com /Openwave - http://www.openwave.com) in 1995 and submitted to the W3C in 1997. It is not XML-compliant, nor does it have scripting capabilities comparable to WML's WMLScript (however, the Openwave WAP Edition browser does display WML/WMLScript as well as HDML, while its Universal Edition browser displays WML/WMLScript, XHTML, and cHTML)
  • Proprietary formats, which may not even be XML, such as those for the Xircom Rex
  • XHTML Basic, a W3C recommendation for a common (yet modular) information appliance document type. The recommendation can be found at http://www.w3.org/TR/2000/REC-xhtml-basic-20001219

This list isn't intended to be complete. It demonstrates the alphabet soup of lightweight client document classes. XHTML Basic is an attempt to rein in this rabble. It defines a common base that includes images, forms, basic tables, and object support. It is intended for web clients that cannot or do not support full XHTML or HTML 4.0, and can be extended through modules (see Modularization of XHTML at http://www.w3.org/TR/2000/CR-xhtml-modularization-20001020).

However, the jury is still out on whether XHTML Basic will be widely adopted: it only became a recommendation in December 2000. Even if it does become widely adopted, the recommendation seems inherently "user-interface-centric". The introduction states, "Because there are many ways to subset HTML " By "user-interface-centric" I mean that data-driven applications, such as some of those found in industrial control, probably care nothing about subsetting HTML. They may not benefit as much from XHTML Basic as, for example, mobile phone applications. However, even if XHTML Basic takes off and solves the multiple document class problem for user-interface-driven applications, there will still be browser developers on lightweight clients who will need to parse, process, and generate XHTML Basic.

In the meantime, we can make a generic J2ME client that understands all of these formats, or as many of our custom formats as we like. By transforming each of these formats into our own document class before processing, we could reduce the size of lightweight client code significantly. You can use a tool like XSLTC (see XSLT Compiler, page 605) to do this. Then, you can parse and process the transformed XML with one of the parsers reviewed in this chapter or the parser that comes with XSLTC. You might also need to use XSL on the client-side if you are displaying to the user a single, integrated service which is actually comprised of multiple smaller services from different servers, each publishing content in a different document class.
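The transform-first idea can be tried out on a desktop JVM before worrying about the device. The following is a minimal sketch that uses the standard JAXP transformation API from J2SE, not XSLTC's compiled classes, so it shows the shape of the approach and not what would actually ship on a lightweight client. The provider vocabulary (auction, lot, bid) and our generic vocabulary (items, item) are both invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ToGenericFormat {

    // Hypothetical stylesheet: maps one provider's <auction>/<lot> vocabulary
    // onto our own generic <items>/<item> document class.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/auction'>"
        + "<items><xsl:apply-templates select='lot'/></items>"
        + "</xsl:template>"
        + "<xsl:template match='lot'>"
        + "<item name='{@title}' price='{bid}'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms a provider document into the generic document class. */
    static String transform(String providerXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(providerXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            transform("<auction><lot title='Clock'><bid>12.50</bid></lot></auction>"));
    }
}
```

With one stylesheet per provider format, the processing code only ever has to understand the items/item vocabulary, which is where the reduction in client code size comes from.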

But ultimately, why should the document provider care about what kind of client he talks to? He should publish his XML with a DTD or schema, and leave the rest to the client. This follows the lessons of encapsulation and distributed object-oriented design that we've learned as a development community over the years, even if it goes against the popular notion of "thin clients".

Use XML Document Servers

Imagine trying to publish content and data in HTML, cHTML, WML and WMLScript, HDML, and in the Web Clipping Application format. No problem, you say: we store all our data natively as XML. All we have to do is write XSL transformations for each format and expose addresses where each content type can be reached.

That could be a lot of work. New document classes are popping up all the time and old ones are dying out, so to reach every new device or network we would have to keep writing new XSL transforms. Here is a case in point: if WML/WMLScript overtakes the older HDML format in the US, it might spell doom for thousands of existing HDML applications. Fortunately, WAP gateways transform HDML into WML - but relying on infrastructure providers for upgrade paths is dangerous.

Instead of publishing content data in one of the formats we talked about above (such as WML), we should consider publishing it to the client in XML with an associated DTD or XML Schema. As discussed previously, the transformation process rightfully belongs to the client.

Peer-to-Peer Networks

Client-to-client networks, like Napster and Jabber (which have centralized directory services) or Gnutella (with no centralized directory service), have yet to explode in the information appliance world. Jini technology and Project JXTA (http://www.jxta.org) are addressing them. Jini is a mechanism for connecting distributed services in a network using a directory service. Project JXTA is a mechanism for connecting distributed services in a peer-to-peer (P2P) network where no directory server exists. Additionally, data (called codat, to indicate anything from code, data, or applications, to text, images, serialized Java objects, or SOAP packets) is sent across JXTA pipes as XML.

JXTA is quite new, so there aren't many applications out there using it yet. However, it does come with a graphical application called InstantP2P. InstantP2P implements:

  • Instant messaging within "peer groups"
  • P2P file sharing

Peer groups are collections of peers that publish, limit, and control access to codat among other peers in the group. In addition, each peer group defines its own membership requirements to secure peer group membership.

The lack of widespread use of P2P networks might be partially due to the single-threaded operating systems that many information appliances employ. If you want to share applications or MP3s on your Palm OS device over a P2P network, for example, you won't be able to look up your wife's phone number at the same time.

Whatever the reason for the lack of their widespread use, P2P networks for lightweight clients have enormous potential. If we're to leverage the openness afforded by XML, we must give clients in P2P networks the ability to natively generate, parse, process, and transform XML. Then, the data (or codat) they exchange can be specified as XML documents.

Information Appliance Interoperability

What if you want your Java 2 Platform, Micro Edition mobile phone application to dial a phone number stored in your PDA?

If both devices are Bluetooth-enabled, and your mobile phone application uses the Java API for Bluetooth (JSR-000082, http://java.sun.com/aboutJava/communityprocess/jsr/jsr_082_bluetooth.html), we're almost able to dial that number. First, the address book application on your PDA must expose directory services in a standard way so that your mobile phone application can look up contact information. Then, the mobile phone application can request a telephone number, or an address book entry, and the address book application on the PDA can send a reply containing the telephone number.

The messaging format between the two applications needs to be understandable by both. It should also be generic enough so that other applications on the same or different devices could understand it.

If ever there was a cry for XML, this is it. The PDA application, as the document provider, needs to generate XML. The mobile phone application, as the document receiver, needs to parse and process XML. Only then can our applications take advantage of all the benefits of XML on a device-to-device level.
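The provider and receiver roles can be sketched in a few lines. This is an illustration under stated assumptions, not the exchange a real Bluetooth address book would use: the contact, name, and phone element names are invented, the transport between the devices is left out entirely, and the receiving side uses the JAXP SAX parser from J2SE, not one of the lightweight parsers a constrained device would need.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class PhoneLookup {

    /**
     * PDA side (document provider): builds the reply document.
     * Assumes the values contain no markup characters; real code would escape them.
     */
    static String buildReply(String name, String phone) {
        return "<contact><name>" + name + "</name><phone>" + phone + "</phone></contact>";
    }

    /** Phone side (document receiver): extracts the text of the <phone> element. */
    static String extractPhone(String xml) {
        final StringBuffer phone = new StringBuffer();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inPhone;

            public void startElement(String uri, String local, String qName, Attributes atts) {
                inPhone = "phone".equals(qName);
            }

            public void endElement(String uri, String local, String qName) {
                inPhone = false;
            }

            public void characters(char[] ch, int start, int length) {
                // May be called more than once per element, so append instead of assigning.
                if (inPhone) {
                    phone.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse reply", e);
        }
        return phone.toString();
    }

    public static void main(String[] args) {
        String reply = buildReply("Ada", "555-0100");
        System.out.println("Dialing " + extractPhone(reply));
    }
}
```

Because the reply is plain XML, any other application that knows the same document class could consume it without knowing anything about the PDA that produced it.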

Powerful Lightweights

At the time of writing, powerful lightweights are already available. For example, the Compaq iPaq 3670 comes with 64MB RAM and can be expanded to 128MB with an optional CompactFlash card. Instead of the extra RAM, we could choose an IBM Microdrive and we'll have 1GB of storage. With the Dual-Slot PC Card Expansion Pack, you can plug in two Type-II PCMCIA cards. Fill one of them with the Novatel Wireless Merlin for Ricochet and you'd have 128 kbps wireless Internet access. In the other slot you could add yet another 16MB of RAM or perhaps a GPS receiver.

Consider all this, and that the unit has a built-in microphone, speaker, and even a light sensor, and I think you'll agree with me that this device can do more than some desktops.

Yet you don't see many people carrying these high-end configurations around, the most obvious reason for that being their high cost. A device like the one above, so outfitted, at present costs thousands of dollars.

If the past is any indicator, devices like this will come down in price and shrink in size. They may even become "wearable" (Sanyo has already announced a line of raincoats for sale in Japan that have a pocket custom-fit for your Palm). If all this happens we'll wonder how we ever survived without one.

So as prices come down and Internet appliances become more powerful and widespread, some of the burden of XML processing and transformation should be pushed onto the client. Complex XSL transforms and larger DOM trees won't pose problems, and the document provider can remain purely in its XML world. With powerful clients, client-side processing can become a reality.

J2ME

The Java 2 Platform, Micro Edition (or J2ME Platform) is one of the three Java 2 editions published by Sun Microsystems. You are probably more familiar with Standard (J2SE) and Enterprise (J2EE), but Micro is rapidly gaining popularity with vendors and developers.

Just as J2SE and J2EE comprise a set of tools and libraries, so too does J2ME. However, J2ME's tools and libraries are targeted at a different set of devices than its larger siblings. Everything from smart cards and pagers to mobile phones, PDAs, and set-top television boxes are potential targets. These information appliances span a wide range of functionality, features, and capabilities.

To address this range of functions, J2ME has a layered architecture that allows vendors and developers to maximize the capabilities of the target device while still retaining some interoperability and platform independence. Consider the differences between a set-top television box and a one-line pager:

| Set-Top Box | Pager |
|---|---|
| Virtually unlimited power supply | Battery powered |
| Lots of memory | Very little memory |
| Relatively speedy performance | Slow performance |
| Big monitor for user interface (TV) | Single line of LCD |
| Always-on high-bandwidth network | Low-bandwidth network |
| Persistent storage | Little persistent storage |

We wouldn't want the libraries on the one-line pager to have to include the class javax.swing.tree.DefaultMutableTreeNode, but maybe we would want it on the set-top box. To maximize the capabilities of both devices while still using the same Java 2 edition, the following layered architecture was designed:

  • Operating System - the unit's OS, for example Palm OS, Symbian/EPOC, or Windows Pocket PC
  • JVM layer: the Java virtual machine and its services (such as the garbage collector) compiled for this OS. This layer is an implementation and compilation of the Java Virtual Machine for a particular operating system. It translates bytecode into native operating system calls. Sun has made available the Kilobyte Virtual Machine (page 572). We will be using the KVM in the sample application later in this chapter, but here is a list of some VMs currently available for information appliances:
  •   IBM J9 (http://www.embedded.oti.com/)
  •   Kada Mobile (http://www.kadasystems.com/)
  •   Esmertec Jbed (http://www.esmertec.com/)
  •   SuperWaba (http://www.superwaba.org/)
  •   Waba (http://www.wabasoft.com/)
  •   Symbian (http://www.symbian.com/) includes both an operating system and VM
  •   Sun KVM (http://java.sun.com/products/cldc/)

Some of these VMs may not fit into the J2ME-specified framework.

  • Configuration layer: base classes that must be available to any profile (and application) compliant with a particular configuration. For example, I/O classes, data types and structures, and network connections are typically specified in this layer
  • Device profile layer: classes built upon ones specified in the configuration layer. GUI classes are specified here. Choosing a profile dictates a configuration, as profiles are typically only available for one configuration. Applications written for a particular profile are guaranteed to work on any device that implements that profile
  • Application layer: this is your application code. Before writing an application, you must decide upon what profile it will depend. Choosing a profile is an important step because it breaks interoperability; your application is not guaranteed to work on another profile. Just like any Java application, J2ME applications define their own classes and have main() entry points. Applications make use of classes published by the profile and configuration layers

It's worth mentioning that there are a number of products that allow applications written in Java to run on information appliance operating systems without virtual machines. They work by compiling Java bytecode into native, operating-system-dependent machine code. Some of these products are compilers intended to run during an off-line build process, before application deployment. Others are complete VMs containing a compiler that compiles during or just before runtime (similar to just-in-time and HotSpot).

Configurations

A configuration is a device class. If we remember our set-top box and pager comparison (see J2ME, page 568), these are clearly two different classes of devices. As such, they represent very different target platforms. The J2ME engineers knew the Write-Once, Run-Anywhere model that made J2SE and J2EE so successful wouldn't work for all classes of information appliances, so they made a decision to break application interoperability when they defined configurations.

A configuration is a class of information appliance.

There are currently two configurations within J2ME: "connected devices" (like set-top boxes) and "connected limited devices" (like PDAs or pagers). Each class of device has different characteristics, so each configuration defines its own set of classes that must be available to the VM.

These two configurations are named:

Connected Device Configuration (CDC) http://java.sun.com/aboutJava/communityprocess/jsr/jsr_036_j2mecd.html

Connected Limited Device Configuration (CLDC) http://java.sun.com/aboutJava/communityprocess/jsr/jsr_030_j2melc.html

Sun provides reference implementations of these two configurations, but other vendors have also implemented them along with their own VMs.

In this chapter, we will concern ourselves primarily with the CLDC.

CLDC

As stated before, each configuration defines and implements a set of classes that must be available to the VM. The CLDC both inherits classes from the J2SE and defines its own classes. Here is a partial list of the classes inherited from the J2SE. Many of them are subsets of their originals, reduced to their essentials:

java.lang.Object, java.lang.Runtime, java.lang.System, java.lang.Throwable, java.lang.Exception, java.lang.RuntimeException (and all its subclasses), java.lang.Boolean, java.lang.Byte, java.lang.Character, java.lang.Integer, java.lang.Short, java.lang.Void, java.lang.String, java.lang.StringBuffer, java.lang.Math, java.util.BitSet, java.util.Dictionary, java.util.Enumeration, java.util.Hashtable, java.util.Vector
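To give a feel for what programming against this subset is like, here is a small sketch of an in-memory address book that touches only classes from the list above. It assumes that the CLDC versions of Hashtable, Vector, and Enumeration keep the basic methods used here (put, get, keys, addElement), which I have not verified method by method against the specification. The AddressBook class itself is hypothetical. The collections are used as raw types on purpose, since these classes predate generics; a modern compiler will accept the code but report an unchecked-operations note.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical example class; only java.lang and java.util types from the list are used.
final class AddressBook {

    // Maps a contact name (String) to a phone number (String).
    private final Hashtable phonesByName = new Hashtable();

    void add(String name, String phone) {
        phonesByName.put(name, phone);
    }

    /** Returns the stored number, or null if the name is unknown. */
    String lookup(String name) {
        return (String) phonesByName.get(name);
    }

    /** Returns every stored name; Hashtable iteration order is unspecified. */
    Vector names() {
        Vector result = new Vector();
        for (Enumeration e = phonesByName.keys(); e.hasMoreElements();) {
            result.addElement(e.nextElement());
        }
        return result;
    }

    public static void main(String[] args) {
        AddressBook book = new AddressBook();
        book.add("Ada", "555-0100");
        book.add("Grace", "555-0199");
        System.out.println(book.names().size() + " entries, Ada is at " + book.lookup("Ada"));
    }
}
```

Casting on every read and iterating with Enumeration is the price of staying inside a class library this small.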

Sun provides a reference implementation of the CLDC, which includes the Java KVM. Other vendors are working on CLDC implementations that hold great promise, such as IBM's VisualAge Micro Edition with the J9 virtual machine.

Profiles

Profiles specify another set of Java classes that must exist along with the classes made available by the configuration to which the profile belongs. They are targeted at specific industry segments. Together, a profile and its configuration specify a full set of Java classes for a particular type of device. Applications written for a particular profile are guaranteed to work on any device that implements that profile.

An important point to note is that user interface components are specified in profiles, not configurations. This is because a configuration encompasses too broad a spectrum of devices. For instance, a PDA and a pager are both CLDC devices, but what use is the CheckBox class on a one-line pager?

User Interface components are specified in profiles, whereas I/O, network connectivity, and data types are defined in configurations.

There are hundreds of proposed profiles. Here are four examples. Note that a profile defines the configuration on which it depends.

  • PDA Profile for CLDC - defines user interface and data storage APIs for small, resource-limited handheld devices (this profile is not currently available). http://jcp.org/jsr/detail/075.jsp 
  • Mobile Information Device Profile for CLDC - defines user interface, data storage, messaging, networking, security, and wireless telephony for mobile devices. http://jcp.org/jsr/detail/037.jsp 
  • Foundation Profile for CDC - defines a base profile for devices that have 1MB ROM, 512KB RAM, rich network connectivity, but no user interface. User interfaces can be layered on top of this profile by defining another profile. http://jcp.org/jsr/detail/046.jsp 
  • Personal Profile for CDC - this profile/configuration combination actually defines the successor to PersonalJava (http://java.sun.com/products/personaljava/). Applications written to versions 1.1.x and 1.2 of the PersonalJava API specification will work with this combination. It appears that PersonalJava will be absorbed into the J2ME framework under this profile/configuration.

Java KVM

Part of Sun's reference implementation of the CLDC, the Java Kilobyte Virtual Machine (KVM) was designed to operate with as little as 160 to 512KB of total memory on 25MHz 16 or 32 bit RISC or CISC processors. It was written in C and is freely available for four operating systems:

  • Win32 (Windows 2000/NT/ME/98)
  • Solaris
  • Linux
  • Palm OS

It's interesting to note that Java Native Interface (JNI) calls cannot be made from Java code unless the native call was linked into the KVM when it was compiled. The KVM source code is provided so you can do this. The reason JNI calls cannot be made without this step is so that the size of the KVM is kept to a minimum. Opening up native access to a device allows for the execution of potentially hostile code unless security mechanisms are put into place, such as exist in the Java 2 Standard Edition. This would increase the footprint and complexity of the KVM. Its implementers apparently weighed the tradeoffs and chose size over secure JNI capabilities. Other VMs, such as the Kada VM (http://www.kadasystems.com/kada_vm.html), do allow JNI calls.

We'll be focusing on the Palm OS build of the KVM for this chapter since it's a good demonstration of the usability of the VM in a constrained device.

Package com.sun.kjava: A KVM User Interface

Remember that in the J2ME architecture, user interface and device-specific network classes are defined at the profile level. However, the KVM for the Palm was released at Sun's JavaOne conference in 1999 - before the CLDC was released and a PDA profile was defined. Actually, to date, there still isn't an implementation of the PDA Profile for the CLDC, just a Java Specification Request.

As the 1999 KVM release for the Palm wouldn't have been very interesting without including user interface classes, the package com.sun.kjava.* was born. It includes:

  • A set of simple user interface classes, such as com.sun.kjava.Button
  • An event callback mechanism so events like penUp, beamReceive, and keyDown can be handled by application code
  • PDB (Palm database) access classes for file input/output
  • Data structure classes, such as com.sun.kjava.List and com.sun.kjava.IntVector

The package com.sun.kjava is unsupported and quite limited. Additionally, with the newer J2ME architecture, it doesn't belong as part of the CLDC distribution (it is currently provided as an "overlay" to the CLDC files - see Setting Up the Environment, CLDC/Java KVM, page 620). It is generally accepted that the package will be renamed (for example, com.palm.kjava), rolled into the PDA profile, or incorporated into a Palm OS-specific profile.

kAWT

We won't be using kAWT (http://www.trantor.de/kawt/) in this chapter, but it's worth mentioning briefly. It is a simplified, lightweight version of the AWT API for the KVM. If you are going to do any serious GUI development with the KVM, this is the best package currently available. It also optionally includes some very useful I/O and networking classes.

One benefit of using the kAWT is that applications developed with it will run under the AWT with J2SE (although the converse isn't true). Currently, there are ports for Palm OS, IBM's J9, Blackberry RIM, and the MID Profile (See Profiles, page 571). The disadvantages to using the kAWT are:

  • Higher storage capacity is required (the Palm OS port is a 178KB PQA, including UI, I/O, and all networking classes)
  • Loading kAWT applications takes longer
  • It's non-standard (although there currently is no standard CLDC GUI implementation for PDAs)

Parsers

In this section, we will concentrate on two XML parsers for lightweight clients:

  • NanoXML
  • MinML

In addition to these, you might also want to check out kXML (http://kxml.enhydra.org/), TinyXML (http://www.gibaradunn.srac.org/tiny/index.shtml), and XPP (http://www.extreme.indiana.edu/soap/xpp/). They won't be used in this chapter, but we'll talk about a few of their features briefly in Push, Pull, and Object Model Parsing, below.

Each of these five parsers has advantages and disadvantages with varying support for W3C recommendations and standards. We will examine some of these issues in this section. For the two parsers we review in detail, there is a table of features included in corresponding sections.

We'll also discuss the three different types of parsers: push, pull, and object model, and the advantages and disadvantages of each in regard to lightweight clients.

After we've reviewed XML parsers for lightweight clients, we'll use some of the technologies discussed in the J2ME section (page 568), along with an XML parser, to create a peer-to-peer sample address book application.

Push, Pull and Object Model Parsing

There are currently three types of XML parser:

  • Push parsers
  • Object model parsers
  • Pull parsers

Although push and object model parsers are the most popular and well known, they are not always the best type of parser for lightweight clients. We'll discuss this further in the next section. This chart outlines lightweight XML parsers and the models they implement. Note that some parsers give the option of parsing documents using different models. For comparison, a heavyweight parser, Xerces-J, has been included:

| Parser | Type | Description | URL |
| --- | --- | --- | --- |
| NanoXML | Push and Object Model | Versions 1.x of this lightweight DOM-style parser offer optional SAX 1.0 support. | http://nanoxml.sourceforge.net/ |
| MinML | Push | An incredibly small parser offering SAX 1.0 support. | http://www.wilson.co.uk/xml/minml.htm |
| TinyXML | Push and Object Model | Very small parser that offers both DOM- and SAX-style interfaces. No support for generating documents, just reading them. | http://www.gibaradunn.srac.org/tiny/index.shtml (CLDC/KVM port for the Palm OS available at http://www.microjava.com/news/techtalk/tinyxml/) |
| XMLtp | Push | Offers a DOM-style tree interface. For non-lightweight clients, it has the optional feature of an element-style class that implements javax.swing.tree.MutableTreeNode and javax.swing.tree.TreeNode. This enables elements to be visualized directly by a javax.swing.JTree. | http://mitglied.tripod.de/xmltp/intro.html |
| XParse-J | Object Model | Tiny parser that "aspires to be the smallest Java XML parser on the planet", XParse-J offers custom DOM-style parsing interface. Also a JavaScript version. | http://www.webreference.com/xml/tools/xparse-j.html |
| kXML | Pull | Works "out-of-the-box" with J2ME. Includes an XML writer and WAP Binary XML support (WBXML), a binary encoding optimized for the mobile phone Wireless Application Protocol standard. | http://www.kxml.org/ |
| XPP | Pull | XPP is small (21KB JAR) and fast and has both Java and C++ implementations. Supports namespaces and mixed content. Uses very little memory during parsing. | http://www.extreme.indiana.edu/soap/xpp/ |
| KVMJab XMLParser | Push | Works "out of the box" with J2ME and the Java KVM. Only 5629 bytes, quite limited. | http://www.alsutton.com/xmlparser/index.html |
| Xerces-J | Push, Object Model, and no pull but lazy parsing comes close. | A classic heavyweight XML parser intended for servers and desktops. | http://xml.apache.org/xerces-j/index.html |

Now let's discuss the three XML parser models in more depth.

Push Parsers

Push parsers are the class of XML parsers that publish a set of interfaces, implemented by applications, through which the parser relays document information.

SAX is the most well known XML push parser. After your application tells the SAX parser to begin parsing, the parser calls back (or pushes) into the application code to notify the application of parse events. This model forces application code to maintain state within the callback class(es), and to evaluate that state at each event. That means many class variables in the callback class(es), as well as (possibly) getters and setters for those variables. This isn't very developer-friendly as it creates a lot of extra work.

Additionally, SAX and most push parsers parse an entire document at once. As soon as your code tells SAX to begin parsing a document, the document is parsed in its entirety. For very large documents, this means lots of state information must be maintained, causing a potentially large memory footprint (though not nearly as large as a DOM parser would require). It also wastes processing and battery power whenever only part of the document is needed.
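To make the state-keeping burden concrete, here is a minimal sketch of a push-style handler. It is only an illustration: it uses the SAX 2 DefaultHandler and the JAXP SAXParserFactory that ship with J2SE rather than one of the lightweight parsers discussed here, and the class name ItemIdHandler and the element name ItemId are our own assumptions. Notice that the handler needs a flag and a buffer just to remember where it is in the document between callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from any parser in this chapter):
// collects the text of every <ItemId> element in a document.
class ItemIdHandler extends DefaultHandler {
  // State that has to survive from one callback to the next
  private boolean insideItemId = false;
  private final StringBuilder text = new StringBuilder();
  private final List<String> ids = new ArrayList<String>();

  @Override
  public void startElement(String uri, String localName,
                           String qName, Attributes atts) {
    if ("ItemId".equals(qName)) {
      insideItemId = true;
      text.setLength(0); // start collecting text for this element
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // The parser may deliver text in several pieces, so append
    if (insideItemId) {
      text.append(ch, start, length);
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("ItemId".equals(qName)) {
      ids.add(text.toString().trim());
      insideItemId = false;
    }
  }

  // Parses the whole document in one go and returns the collected ids
  static List<String> extractIds(String xml) {
    try {
      ItemIdHandler handler = new ItemIdHandler();
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
      return handler.ids;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<Request name=\"ItemDetail\"><Parameters/>"
        + "<ItemId type=\"Integer\">553</ItemId>"
        + "<ItemId type=\"Integer\">554</ItemId></Request>";
    System.out.println(extractIds(xml)); // prints [553, 554]
  }
}
```

Even for this trivial task the application, not the parser, tracks the current position in the document, and the parser still reads the document to the end before extractIds() returns.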

We'll examine push parser issues in more detail in the NanoXML section (page 577).

Object Model Parsers

Object model parsers are the class of XML parsers that build in-memory representations of XML documents using tree-like data structures. The most popular are parsers conforming to the DOM Level 1 and Level 2 specifications, but others exist (for example, NanoXML).

Object model parsers, unlike push parsers, don't usually require the developer to maintain document state during parsing, but they have their own drawbacks on lightweight clients. Most lightweight object model XML parsers keep an entire parsed document in memory all the time, until the parser and its resources are garbage collected. Parsing a large document with this kind of parser, even if only one node from the whole document is required, always means occupying large chunks of memory. This approach isn't desirable on lightweight clients since their memory is constrained.

Also, as with push parsers, all the object model parsers for lightweights that I know about parse an entire document at once. As soon as your code tells the parser to begin parsing, the entire document is parsed so an in-memory object model can be built. As with push parsers, this wastes a lot of processor and battery power if the entire document is not required.
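The trade-off can be seen in a short sketch. Again this is only an illustration under stated assumptions: it uses the J2SE JAXP DocumentBuilder (a heavyweight DOM parser, not one of the lightweight parsers in this chapter), and the class and method names are hypothetical. The application code needs no state of its own, but the whole tree is parsed and held in memory even though only the first <ItemId> is wanted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: looks up a single node in a DOM tree.
class FirstItemIdLookup {

  // Returns the text of the first <ItemId> element, or null if there is none
  static String firstItemId(String xml) {
    try {
      // The entire document is parsed and the full tree is built here
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));

      // Only now can we ask for the one node we care about
      NodeList items = doc.getElementsByTagName("ItemId");
      if (items.getLength() == 0) {
        return null;
      }
      Node text = items.item(0).getFirstChild();
      return text == null ? "" : text.getNodeValue();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<Request name=\"ItemDetail\"><Parameters/>"
        + "<ItemId type=\"Integer\">553</ItemId>"
        + "<ItemId type=\"Integer\">554</ItemId></Request>";
    System.out.println(firstItemId(xml)); // prints 553
  }
}
```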

Lazy Parsing

Some heavyweight object model parsers offer lazy parsing, for example Xerces-J. Parsing lazily means that the object model is built and stored in memory only as the calling application requests a node. However, usually the entire ancestor-or-self axis (with respect to the requested node) is stored in memory after the request. Certainly, the entire ancestor-or-self axis must be parsed when a node is requested. This isn't optimal for constrained devices, but it's better than the "parse-and-store-it-all-at-once" approach taken by present-day lightweight object model XML parsers.

At the time of writing, no object model XML parsers are available for lightweights that use lazy parsing. Hopefully, this will change in the near future.

Pull Parsers

A newer player in the world of XML parsing, pull parsers aren't nearly as prevalent as SAX and DOM parsers. kXML and XPP appear to be the only feasible contenders today.

Pull parsers are particularly useful for lightweight clients because they parse only the minimal chunk of a document necessary when an application requests the next piece of data. The application can process this data at its leisure and then ask for the next piece, spurring the parser to parse just another small chunk of the document. This is similar to the workings of a java.io.Reader. The benefits of this approach are:

  • Processing and battery power are used when and only when the application needs the next piece of data; the application maintains control over its parsing needs
  • Memory footprint is reduced; the parser only needs to maintain minimal state information and a pointer to the current element (although the document itself must remain in memory for as long as parsing might continue, so it's to the application's advantage to acquire what it needs quickly). An entire object model does not remain memory-resident.

Unfortunately, there is no standard interface yet for XML pull parsing, like SAX for XML push parsing or DOM for XML object model parsing. Therefore, although pull parsers sound great for lightweight clients, they may be too immature for use today in production applications. Applications would be stuck with the limitations of the parser selected by the development team without a clear upgrade path. It may be difficult or impossible (without rewriting the application) to change parsers in the future.

Let's briefly look at some code that demonstrates how pull parsers work. This code is based on sample kXML code from http://www.microjava.com/news/techtalk/kxml/. It outputs element names and document text. Note the use of recursion, something atypical in applications using push and object model XML parsers.

  public void traverse(Parser parser) throws Exception {
    boolean leave = false;
    while (!leave) {
      //request next document event
      Event event = parser.read();
      switch (event.getType()) {
        case START_TAG:
          System.out.println("start: " + event.getName());
          traverse(parser); //recursive call handles this element's content
          break;
        case END_TAG:
          System.out.println("end: " + event.getName());
          leave = true; //return to the caller handling the parent element
          break;
        case END_DOCUMENT:
          leave = true;
          break;
        case TEXT:
          System.out.println("text: " + event.getText());
          break;
      }
    }
  }

NanoXML

A NanoXML document is a tree of nanoxml.XMLElement objects. These correspond to the org.w3c.dom.Node interface in the DOM specification.

NanoXML does not implement the DOM interfaces. You build and retrieve document contents through a proprietary API, but an optional SAX 1.0 component exists for document retrieval. This API is covered in this chapter.

Originally written in April 2000, NanoXML has gone through a few iterations. The current release is 1.6.8. The next major release of NanoXML will be 2.0, and it is scheduled to be available in July 2001. The current beta release is promising, although it seems to have lost compatibility with 1.x releases. We will discuss both releases since they differ significantly in library size and features.

The web site for NanoXML is http://nanoxml.sourceforge.net/.

The source code is available under an open source license. The site is maintained by Marc De Scheemaecker, who is the author of the package. I have found him to be very responsive to support questions.

If your target platform is the Java KVM, the latest NanoXML won't do the job because it depends on classes that are not included in the standard Java KVM. You'll have to get version 1.6.4 from the NanoXML web site and the kXMLElement class from http://www.ericgiguere.com/microjava/cldc_xml.html.

Without a doubt, the greatest feature of NanoXML is also its smallest - its JAR file size. Its JAR file size is second only to MinML, but the size depends upon which version you use and whether or not you choose the optional SAX component. You can get away with XML parsing in as little as 6047 bytes!

Unfortunately, NanoXML suffers from some performance issues and memory usage problems, which we will discuss.

Current Release - Version 1.6.8

What's Supported, What's Not Supported, and What's Optional

| Feature | Supported | Notes |
| --- | --- | --- |
| Document validation | No | - |
| Well-formed XML only | Yes | nanoxml.XMLParseException thrown if malformed |
| Mixed content | No | Creates bugs in the internal document tree! |
| Entity expansion | Yes | Entities are specified in the XMLElement constructor as a hashtable of key-value pairs |
| SAX | Yes, SAX 1.0 | - |
| DOM | No | - |
| Comments | Ignored | - |
| Processing Instructions | No | PI in the preamble <?xml version="1.0" encoding="UTF-8"?> is ignored; subsequent PIs throw nanoxml.XMLParseException |
| Namespaces | Indirectly | Prefixes aren't distinguished from local parts - <prefix:name> becomes an atomic element or attribute |
| JAR size | - | 6047 bytes; 8618 with SAX support |

Version 1.6.8 is a non-validating parser. Any reference to a DTD or XML Schema is ignored, although there is support for entity expansion.

Mixed content isn't supported, for example:

<Request>ItemDetail

  <ItemId>553</ItemId>

</Request>

will result in an incorrect internal document representation. XML namespaces aren't supported directly, although they won't cause any parsing difficulties. This SOAP envelope, for example, is parsed without problems:

<SOAP:Envelope xmlns:SOAP='http://schemas.xmlsoap.org/soap/envelope/'

  xmlns:xsi='http://www.w3.org/1999/XMLSchema-instance'

  xmlns:xsd='http://www.w3.org/1999/XMLSchema'

  xmlns:SOAP-ENC='http://schemas.xmlsoap.org/soap/encoding/'

  SOAP:encodingStyle='http://schemas.xmlsoap.org/soap/encoding/'>

</SOAP:Envelope>

The element <Envelope> is stored literally as <SOAP:Envelope> with no comprehension of the SOAP namespace prefix. It also contains five attributes: xmlns:SOAP, xmlns:xsi, xmlns:xsd, xmlns:SOAP-ENC, and SOAP:encodingStyle. Since document validation isn't supported, namespace URLs are not followed.
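Because the tag name is stored as one atomic string, an application that cares about the prefix has to separate it from the local part itself. The following is a minimal sketch of how that might be done; TagNameSplitter and its methods are hypothetical helpers of our own, not part of NanoXML, and they only split the string. They do not resolve the prefix to its namespace URI.

```java
// Hypothetical helper, not part of NanoXML: splits a stored tag name
// such as "SOAP:Envelope" into its prefix and local part.
class TagNameSplitter {

  // Returns the text before the first colon, or "" if there is no prefix
  static String prefixOf(String tagName) {
    int colon = tagName.indexOf(':');
    return colon < 0 ? "" : tagName.substring(0, colon);
  }

  // Returns the text after the first colon, or the whole name if unprefixed
  static String localPartOf(String tagName) {
    int colon = tagName.indexOf(':');
    return colon < 0 ? tagName : tagName.substring(colon + 1);
  }

  public static void main(String[] args) {
    // The literal name the parser would hold for the envelope above
    String tagName = "SOAP:Envelope";
    System.out.println(prefixOf(tagName));    // prints SOAP
    System.out.println(localPartOf(tagName)); // prints Envelope
  }
}
```

The same helpers could be applied to attribute names such as xmlns:xsi or SOAP:encodingStyle, since those are stored with their prefixes as well.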

Comments are skipped by the parser and not stored internally. The first processing instruction in an XML document:

<?xml version="1.0" encoding="UTF-8"?>

is skipped and also not stored internally. Any subsequent processing instructions will throw a nanoxml.XMLParseException.

A SAX-compatible API can optionally be used with parsing (see Package nanoxml, page 579). If the SAX API is not used, retrieval of elements and attributes is through a completely proprietary API (see public class XMLElement, page 580).

Documents can also be built from scratch and written to any Writer object, or they can be modified using the addChild() and removeChild() methods.

The JAR file size of this release, excluding the optional SAX component, is 6047 bytes. Adding SAX functionality brings the library up to 8618 bytes. But this small size doesn't come without a price. As with most parsers reviewed in this chapter, NanoXML is not XML 1.0 compliant.

You have two choices for parsing: a DOM-style or SAX 1.0 interface. Both choices are multiple-pass parsers, iterating over the same document more than once in order to build an internal representation (this is true even of the SAX interface because it is built on top of the DOM-style interface). This negatively affects performance. Finally, even if the SAX parser is used, an entire document tree is built and kept in memory until the parser object is garbage collected. Not only does this lead to a large memory footprint when parsing large documents, but depending upon the garbage collection mechanism used by your VM, it may severely fragment the heap and prevent subsequent object creation. We discuss this issue in the Java KVM section (page 572).

Parsing large documents with this version of NanoXML may be inappropriate for lightweight clients. However, for relatively small documents, it could be just the thing.

Package nanoxml

This is the only package in the NanoXML library. This version has only two classes: XMLElement and XMLParseException. XMLElement represents an XML document and its content. XMLParseException is the exception that is thrown when a parse error occurs; for example, when a document that is not well-formed is encountered. Let's look at each of these classes in detail.

Class XMLElement

nanoxml

public class XMLElement

XMLElement is a representation of an XML document. In addition to being able to parse XML documents, this serializable class contains all the methods needed to get/set elements, subelements, attributes, and text in a document. It derives from java.lang.Object.

Constructors

public XMLElement()

public XMLElement(Properties conversionTable)

public XMLElement(boolean skipLeadingWhitespace)

public XMLElement(Properties conversionTable,
                  boolean skipLeadingWhitespace)

public XMLElement(Properties conversionTable,
                  boolean skipLeadingWhitespace,
                  boolean ignoreCase)

Arguments

The XMLElement constructors take only a few different arguments. Each constructor makes use of a conversion table. A conversion table is simply a Properties object that maps entities to their conversion values. When an entity is found in a parsed document, it is used as a key into the Properties object to find the replacement value.

The default constructor creates a new XMLElement object with a Properties map that converts the following predefined entities:

| Entity | Maps To |
| --- | --- |
| &amp; | & |
| &lt; | < |
| &gt; | > |
| &apos; | ' |
| &quot; | " |

Here is a summary of the constructor arguments:

| Argument | Type | Effect |
| --- | --- | --- |
| conversionTable | java.util.Properties | Entities are keys into the Properties map that provide a replacement value when the key is found during parsing |
| skipLeadingWhitespace | boolean | Directs the parser to ignore leading whitespace in #PCDATA |
| ignoreCase | boolean | Directs the parser to ignore element and attribute case. Useful for HTML parsing. |

If a conversion table is specified, the base entities listed above (&amp; &lt; &gt; &apos; &quot;) are used in addition to that table. When the other two arguments are omitted, skipLeadingWhitespace defaults to false (whitespace won't be skipped), but note that ignoreCase defaults to true.
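To illustrate the idea of a conversion table, here is a small sketch of how a Properties lookup could drive entity replacement. This is not NanoXML's implementation: the class ConversionTableSketch, its methods, and the choice to key the table by the bare entity name (for example amp rather than &amp;) are all assumptions made for the example, and it is written against plain J2SE.

```java
import java.util.Properties;

// Hypothetical sketch of a Properties-based entity conversion table.
// Not NanoXML code; keying by bare entity name is an assumption.
class ConversionTableSketch {

  // Builds a table holding the five predefined entities
  static Properties baseTable() {
    Properties table = new Properties();
    table.setProperty("amp", "&");
    table.setProperty("lt", "<");
    table.setProperty("gt", ">");
    table.setProperty("apos", "'");
    table.setProperty("quot", "\"");
    return table;
  }

  // Replaces every &name; whose name is a key in the table;
  // unknown references are copied through unchanged
  static String expand(String text, Properties table) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < text.length()) {
      char c = text.charAt(i);
      if (c == '&') {
        int semi = text.indexOf(';', i + 1);
        if (semi > 0) {
          String replacement = table.getProperty(text.substring(i + 1, semi));
          if (replacement != null) {
            out.append(replacement);
            i = semi + 1; // skip past the whole entity reference
            continue;
          }
        }
      }
      out.append(c);
      i++;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Properties table = baseTable();
    table.setProperty("copy", "(c)"); // an application-defined entity
    System.out.println(expand("Tom &amp; Jerry &copy; 2001", table));
    // prints Tom & Jerry (c) 2001
  }
}
```

The application-defined entry sits alongside the predefined ones, which mirrors the way a supplied conversion table is used in addition to the base entities.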

Parse Methods

The parse() methods direct an XMLElement object to begin parsing an XML document.

public void parseString(String string)
       throws XMLParseException

public int parseString(String string, int offset)
       throws XMLParseException

public int parseString(String string, int offset, int end)
       throws XMLParseException

public int parseString(String string, int offset,
                       int end, int startingLineNr)
       throws XMLParseException

public void parseFromReader(java.io.Reader reader)
       throws IOException, XMLParseException

public void parseFromReader(java.io.Reader reader, int startingLineNr)
       throws IOException, XMLParseException

public int parseCharArray(char[] chrAry, int offset, int end)
       throws XMLParseException

public int parseCharArray(char[] chrAry, int offset, int end,
                          int startingLineNr)
       throws XMLParseException

The parseString() and parseFromReader() methods actually just resolve to one of the parseCharArray() method calls.

Arguments

| Argument | Type | Effect |
| --- | --- | --- |
| string | String | XML source content is contained within a string |
| reader | java.io.Reader | XML source content is contained within a java.io.Reader |
| chrAry | char[] | XML source content is contained within a character array |
| offset | int | Marks where parsing should begin, counting from the first character |
| end | int | Marks where parsing should end |
| startingLineNr | int | Marks from which line the parser should begin parsing |

Usage and Examples

To direct NanoXML to parse an XML file, we would write code like this:

BufferedReader br = null;
try {
  br = new BufferedReader(new FileReader("request.xml"));
}
catch (java.io.FileNotFoundException e) {
  e.printStackTrace();
  System.exit(1); //non-zero exit status signals the failure
}

XMLElement elem = new XMLElement();
try {
  elem.parseFromReader(br);
}
catch (java.io.IOException e) {
  e.printStackTrace();
}

Children Methods

These methods enable access to child elements. They are typically used after parsing a document. Remember that all elements in the document are represented as XMLElement objects.

There is no way of accessing "sibling" elements in NanoXML as there is in DOM; each element is a child of the element directly above it.

public int countChildren()

public java.util.Enumeration enumerateChildren()

These methods take no arguments. Interestingly enough, countChildren() only returns the number of children one level deep in the tree. The other method returns the children n-levels deep.

Child Methods

These methods perform operations on elements in the document. It is helpful to think of the structure in memory as a Document Object Model (DOM), even though it is not. Remember that all elements in the document are represented as XMLElement objects; this is similar to the Node interface in the package org.w3c.dom. Again, there are no siblings in NanoXML - only children.

public void addChild(nanoxml.XMLElement child)

public void removeChild(nanoxml.XMLElement child)

public void removeChild(String key)

public void setTagName(String tagName)

public String getTagName()

public void setContent(String content)

public String getContents()

public String toString()

public void write(java.io.Writer writer)

public void write(java.io.Writer writer, int indent)

Some of the quirks of this version of NanoXML are apparent here: we have a method called setContent(), but its accessor is called getContents(). This is resolved in NanoXML 2.0 beta. setTagName() and getTagName() set and get the element name. setContent() and getContents() set and get the #PCDATA of the node. The write() methods output the node and its subnodes to a java.io.Writer.

Arguments

| Argument | Type | Effect |
| --- | --- | --- |
| child | nanoxml.XMLElement | The element to be added to or removed from the document |
| key | String | The name of the attribute to be removed from the XMLElement |
| tagName | String | The textual name given to the element |
| content | String | The #PCDATA content of the element |
| writer | java.io.Writer | The writer to which the element (or document) should be written |
| indent | int | The number of spaces to indent for each new nested element |

Usage and Examples

Here is an example of how an XML document can be generated and written to a java.io.Writer.

Example: Generating and Writing an XML Document

The XML we generate is a request from our auction site, for the detail information on item numbers 553 and 554.

Sample Request.xml

<?xml version="1.0" encoding="UTF-8"?>
<Request name="ItemDetail">
  <Parameters/>
  <ItemId type="Integer">553</ItemId>
  <ItemId type="Integer">554</ItemId>
</Request>

NanoXML gives us no way to add processing instructions to a document, so we will output the prolog <?xml version="1.0" encoding="UTF-8"?> ourselves. The addProperty() methods haven't been introduced yet, but they will be in the next section.

Source Request.java

import nanoxml.XMLElement;
import java.io.*;

public class Request {

  public static void main(String args[]) throws IOException {
    //create the document and set the root
    XMLElement root = new XMLElement();
    root.setTagName("Request");
    root.addProperty("name", "ItemDetail");

    //create and set the first child
    XMLElement child1 = new XMLElement();
    child1.setTagName("Parameters");
    root.addChild(child1);

    //create and set next child
    XMLElement child2 = new XMLElement();
    child2.setTagName("ItemId");
    child2.setContent("553");
    child2.addProperty("type", "Integer");
    root.addChild(child2);

    //create and set the last child
    XMLElement child3 = new XMLElement();
    child3.setTagName("ItemId");
    child3.setContent("554");
    child3.addProperty("type", "Integer");
    root.addChild(child3);

    //create a writer and output the prolog
    BufferedWriter bw = new BufferedWriter(new FileWriter("Request.xml"));
    bw.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");

    //output the document and close the writer
    root.write(bw);
    bw.close();
  }
}

Compiling and Running

To compile Request.java, do the following from the directory in which Request.java is saved:

> javac -classpath \NanoXML\nanoxml.jar;. Request.java

To run Request.class, do the following:

> java -classpath \NanoXML\nanoxml.jar;. Request

The output is a file in the current directory called Request.xml. It should look like Request.xml in the beginning of this example.

Attribute Methods

These methods allow you to get and set attributes on an XMLElement object. Remember that NanoXML stores attributes for each element as a java.util.Properties object.

public void addProperty(String key, Object value)

public void addProperty(String key, int value)

public void addProperty(String key, double value)

public Enumeration enumeratePropertyNames()

public String getProperty(String key)

public String getProperty(String key, String default)

public int getProperty(String key, int default)

public double getProperty(String key, double default)

public boolean getProperty(String key, String trueVal,
                           String falseVal, boolean default)

public Object getProperty(String key, java.util.Hashtable valueSet,
                          String defaultValue)

Once again, we see some of the quirks of this release of NanoXML: the setter for getProperty() is called addProperty() instead of setProperty(). This is fixed in NanoXML 2.0 beta.

The first three methods allow you to add or set different types of properties to an element. enumeratePropertyNames() allows you to enumerate through the set of attributes for an element, while getProperty() returns specific values for known named attributes. The getProperty(String key) method returns null if the property doesn't exist for the element.  

Arguments

| Argument | Type | Effect |
| --- | --- | --- |
| key | String | The name of the attribute to lookup |
| value | int, double, String | The value of the attribute |
| default | int, double, String | The value that is returned if the attribute doesn't exist |
| trueVal | String | The value of the attribute which should be interpreted as representing true; for example, yes, true, or 1. This argument gives you the flexibility of specifying what value to use for Boolean true in the getProperty() method that returns a boolean. See Usage and Examples, below. |
| falseVal | String | The value of the attribute that should be interpreted as representing false; for instance, no, false, or 0. This argument gives you the flexibility of specifying what value to use for Boolean false in the getProperty() method that returns a boolean. See Usage and Examples, below. |
| valueSet | java.util.Hashtable | Stores the attributes of the element as key value pairs. |

Usage and Examples

Let's first take a look at how to use this somewhat non-intuitive method, which was removed from NanoXML 2.0:

public boolean getProperty(String key, String trueVal,

                           String falseVal, boolean default)

This helper method makes it easier to determine values for Boolean attributes. A Boolean attribute is an attribute with only a Boolean value. Take, for instance, the <exec> element in Jakarta's Ant project:

<exec executable="runme.exe" dir="." failonerror="true"/>

The failonerror attribute is a Boolean attribute whose values are true and false. ColdFusion's <cfoutput> element also has a Boolean attribute:

<cfoutput query="MyQuery" group="id"

  groupcasesensitive="no"/>

However, its values are yes and no. getProperty() allows us to test the value of Boolean attributes generically. For example, to retrieve the value of groupcasesensitive from <cfoutput>, treating it as yes when it is not specified, we would write:

boolean b = elem.getProperty("groupcasesensitive", "yes",
                             "no", true);

If groupcasesensitive isn't specified, the last parameter, true in this case, supplies its default value. Another example would be if we were parsing Ant's <exec> element. We could write:

boolean b = elem.getProperty("failonerror", "true",

                             "false", false);

Note, however, that we could also write:

boolean b = (elem.getProperty("failonerror",
    "false")).equalsIgnoreCase("true");

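The lookup that the four-argument getProperty() performs can be sketched in a few lines of plain Java. This is an illustration only, not NanoXML's source: BooleanAttributeSketch and getBoolean() are hypothetical names, the attributes are held in a java.util.Properties object as described earlier, and falling back to the default for a value that matches neither trueVal nor falseVal is an assumption of this sketch (the library may treat that case differently).

```java
import java.util.Properties;

// Hypothetical sketch of a generic Boolean attribute lookup.
// Not NanoXML code; the handling of unrecognised values is an assumption.
class BooleanAttributeSketch {

  static boolean getBoolean(Properties attrs, String key,
                            String trueVal, String falseVal,
                            boolean defaultValue) {
    String value = attrs.getProperty(key);
    if (value == null) {
      return defaultValue; // attribute not specified
    }
    if (value.equals(trueVal)) {
      return true;
    }
    if (value.equals(falseVal)) {
      return false;
    }
    return defaultValue; // assumption: unrecognised values use the default
  }

  public static void main(String[] args) {
    // Attributes of <cfoutput query="MyQuery" group="id" groupcasesensitive="no"/>
    Properties attrs = new Properties();
    attrs.setProperty("query", "MyQuery");
    attrs.setProperty("group", "id");
    attrs.setProperty("groupcasesensitive", "no");

    System.out.println(
        getBoolean(attrs, "groupcasesensitive", "yes", "no", true)); // prints false
    System.out.println(
        getBoolean(attrs, "failonerror", "true", "false", false));   // prints false
  }
}
```

Passing the true and false spellings as arguments is what lets one method serve both the yes/no convention and the true/false convention.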
Now let's look again at our previous XML document, request.xml. Here is an example that reads it and outputs the #PCDATA and type attribute value for each <ItemId> element. Recall that the XML document looks like this:

<Request name="ItemDetail">
  <Parameters/>
  <ItemId type="Integer">553</ItemId>
  <ItemId type="Integer">554</ItemId>
</Request>

First, we must read and parse the file:

BufferedReader br =

  new BufferedReader(new FileReader("request.xml"));

XMLElement elem = new XMLElement();

elem.parseFromReader(br); //elem is the root node

Then, we enumerate through each child node of the root. If any child element is named <ItemId>, we output its type attribute and its #PCDATA content. If the type attribute doesn't exist for some reason, the default value unknown is used.

Enumeration e = elem.enumerateChildren();

while (e.hasMoreElements()) {

  XMLElement child = (XMLElement)e.nextElement();

  //is the child named ItemId?

  if (child.getTagName().equals("ItemId"))  {

    System.out.print("Type = " + child.getProperty("type", "unknown"));

    System.out.println(" and item id = " +

      child.getContents());

  }

}

Running our application against request.xml:

> java -classpath /usr/local/java/NanoXML/nanoxml.jar Request

Type = Integer and item id = 553

Type = Integer and item id = 554

Class XMLParseException

nanoxml

public class XMLParseException

       extends RuntimeException

This class represents a NanoXML parsing exception. It extends java.lang.RuntimeException, so callers are not forced to catch it. It is usually thrown when a document that isn't well-formed is parsed, or when a processing instruction is encountered outside the preamble.

Processing instructions outside a document's preamble can certainly be part of a well-formed XML document, but NanoXML rejects them and throws this exception.

Package nanoxml.sax

Adding the optional SAX 1.0 parser to NanoXML increases the library's size by another 2,571 bytes (for a total of 8,618 bytes). This is quite small, but it also increases your dependencies. For example, the package makes use of java.net.URL, java.io.InputStream, and java.util.Locale among others. Depending upon your particular virtual machine and device profile, some or all of these classes may not be available. You might be able to get creative and rewrite some of the package if you want to reduce its dependencies as was done for the Java KVM.

In addition to possibly not having all required classes, SAX is a push parser. After telling the parser to begin, the parser calls back (or pushes) into your application code to notify you of parse events. This model forces your code to maintain state within the callback class(es), and to evaluate that state at each event.

One of the nice things we've seen in the previous code examples is that there was no need for state information. This is much more programmer-friendly than the code we are about to see.

In SAX's defense, however, it will allow you to plug another parser underneath the hood without any code changes on your part. It's a standardized API. All you need to do is use different class files or a different JAR. If you're seeing performance or memory usage problems with NanoXML, this will allow you to plug another parser into your application without much work. However, you might be better off ignoring the SAX standard and using the pull model of parsing, such as that used by kXML and XPP.

Unfortunately, a standard pull API for XML parsing has yet to be decided upon, so if you choose a pull parser, your upgrade path is unclear.
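
For comparison, here is a rough sketch of what pull-style code looks like. It does not use kXML or XPP. It assumes a later Java SE runtime that ships javax.xml.stream (StAX), which is not one of the parsers covered in this chapter, and the class name PullRequestReader is made up for illustration. The point is only that the application asks for the next event, so the type attribute and the element text can be read together without a member variable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; uses the JDK's StAX API, not kXML or XPP.
class PullRequestReader {

    // Returns one line of text per <ItemId> element found in the document.
    static List<String> readItems(String xml) throws XMLStreamException {
        List<String> lines = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            // The application drives the parser by asking for each event.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("ItemId")) {
                    String type = reader.getAttributeValue(null, "type");
                    if (type == null) {
                        type = "unknown";
                    }
                    // Reads the text up to the matching end tag.
                    String id = reader.getElementText();
                    lines.add("Type = " + type + " and item id = " + id);
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Request name=\"ItemDetail\"><Parameters>"
            + "<ItemId type=\"Integer\">553</ItemId>"
            + "<ItemId type=\"Integer\">554</ItemId>"
            + "</Parameters></Request>";
        for (String line : readItems(xml)) {
            System.out.println(line);
        }
    }
}
```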

Class SAXParser

nanoxml.sax

public class SAXParser

       implements org.xml.sax.Parser

This class implements the org.xml.sax.Parser interface published by David Megginson. It's built on top of the class XMLElement so it has all the features (or lack thereof) outlined in the Features table on page 578. Here is a list of other features applicable to this particular parser:

Feature | Support for org.xml.sax.Parser | Notes
Locales | English language only | SAXException thrown if another type of locale is set with setLocale()
Whitespace | ignorableWhitespace() is never called | Leading whitespace in #PCDATA skipped
DTD validation | None | The objects implementing interface org.xml.sax.DTDHandler and interface org.xml.sax.EntityResolver in your application are never called back
Mixed content | None | XML such as <Request>widgets<Item>553</Item></Request> isn't permitted
Document locator | Support for line numbers and system identifiers | org.xml.sax.Locator.getLineNumber() and org.xml.sax.Locator.getSystemId() are supported
Processing instructions | processingInstruction() is never called |

Additionally, this parser only supports locales using the English language. It will throw a SAXException if another type of locale is set using the setLocale() method. Attribute data types are always reported as CDATA.

Since SAXParser makes use of the nanoxml.XMLElement class internally, it has to choose one of the XMLElement() constructors to use. These constructors dictate certain parsing behaviors (see the section public class XMLElement, page 580). The default parsing behavior is to treat element and attribute names case-insensitively, to skip leading whitespace in #PCDATA, and to expand only the entities &amp;, &lt;, &gt;, &apos;, and &quot;. However, this behavior can be overridden by deriving your own class from SAXParser and overriding its protected createTopElement() method to call a different XMLElement() constructor.

Error handlers and document locators are supported, as well as parsing from a URI.

Usage and Examples

Let's look at what it would take to implement one of our previous examples using the SAX interface. This will give you a good idea about what I mean by having to maintain state in your application for push parsers like SAX.

This example reads the XML document from the section Attribute Methods (page 597), request.xml, and outputs the #PCDATA and type attribute value for each ItemId element. Recall the XML document looks like this:

<Request name="ItemDetail">

  <Parameters>

    <ItemId type="Integer">553</ItemId>

    <ItemId type="Integer">554</ItemId>

  </Parameters>

</Request>

Remember, we want the same output that the previous code (which used NanoXML) produced. To refresh your memory, the output was:

Type = Integer and item id = 553

Type = Integer and item id = 554

Here is the code that uses SAX. You'll have to put David Megginson's sax.jar for SAX 1.0 (http://www.megginson.com/SAX/SAX1/index.html) in your CLASSPATH, as well as nanoxml-sax.jar.

import nanoxml.sax.SAXParser;

import org.xml.sax.*;

public class RequestHandler extends HandlerBase {

  private String _type;

  public RequestHandler () throws Exception {

    SAXParser parser = new SAXParser();

    parser.setDocumentHandler(this);

    parser.setErrorHandler(this);

    parser.parse("request.xml");

  }

    

  public void startElement(String name,

       AttributeList attrs) throws SAXException {

    if (name.equals("ItemId")) {

      if (attrs.getValue("TYPE") == null)

        _type = "???";

      else

        _type = attrs.getValue("TYPE");

    }

  }

  public void characters(char ch[], int start,

      int length) throws SAXException {

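    System.out.print("Type = " + _type + " and item id = ");

    //print only the characters reported by this event, not the whole buffer

    System.out.println(new String(ch, start, length));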

  }

  public static void main(String args[]) throws Exception {

    RequestHandler t = new RequestHandler ();

  }

}

Notice the private member variable _type, which saves the value of the type attribute for the element currently being parsed. SAX leaves us no alternative: the attribute is delivered in startElement(), but the text it belongs with arrives later, in characters(). This is a small example, too. For more complex parsing, the amount of state needing to be saved increases.

This code is also quite a bit larger than the code that used nanoxml.XMLElement. SAX just isn't as programmer-friendly.
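
The state problem grows a little further in practice, because SAX is free to deliver an element's text in more than one characters() call. The following sketch shows one common way to handle that: accumulate text while inside <ItemId> and emit the line in endElement(). It is an illustration under stated assumptions, not NanoXML code: it uses the SAX 2 parser bundled with the JDK (javax.xml.parsers and DefaultHandler) rather than NanoXML's SAX 1.0 adapter, and the class name StatefulItemHandler is invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; uses the JDK's SAX 2 parser, not nanoxml.sax.SAXParser.
class StatefulItemHandler extends DefaultHandler {

    final List<String> lines = new ArrayList<>();

    // State carried between callbacks.
    private String type;
    private StringBuilder text; // non-null only while inside <ItemId>

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attrs) {
        if (qName.equals("ItemId")) {
            String t = attrs.getValue("type");
            type = (t == null) ? "unknown" : t;
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("ItemId")) {
            lines.add("Type = " + type + " and item id = " + text);
            text = null;
        }
    }

    static List<String> parse(String xml) throws Exception {
        StatefulItemHandler handler = new StatefulItemHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Request name=\"ItemDetail\"><Parameters>"
            + "<ItemId type=\"Integer\">553</ItemId>"
            + "<ItemId type=\"Integer\">554</ItemId>"
            + "</Parameters></Request>";
        for (String line : parse(xml)) {
            System.out.println(line);
        }
    }
}
```

Two pieces of state (type and text) are now needed for a two-line report, which illustrates how quickly the bookkeeping grows with a push parser.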

NanoXML Version 2.0 Beta

The second major release of NanoXML isn't due to be released until July 2001. A beta release is available now, however. It lacks SAX 1.0 support and there is still no direct support for XML namespaces, but the author assures me that a 2.1 release will support both SAX 2.0 and namespaces. NanoXML 2.0 is quite different from 1.x, so the 1.x port for the Java KVM won't work with 2.0 yet. Hopefully, a KVM port will be made available.

The beta release increases the JAR size from 6,047 bytes (version 1.6.8) to over 20,000 bytes, a significant increase. So what do we gain with that extra size?

For a start, the classes are in a different package this time, net.n3.nanoxml, instead of nanoxml. We lose backwards compatibility with version 1.x due to this and also due to interface and class changes within the packages. If you use 2.x, your code will not be usable with 1.x, although there is planned support for a "lite" version of 2.0 that is almost compatible with version 1.6. There are some advantages to using 2.0, however.

Probably the most significant enhancement is that the parser is now a single-pass parser. Version 1.x releases were multiple-pass and their performance suffered because of it. Version 2.0 is significantly faster as a result.

Version 2.0 Beta occupies less memory while parsing than version 1.6.7, but the memory requirements still scale linearly with the size of the document. All elements are saved internally as a tree of XMLElement objects, with each XMLElement object containing a java.util.Properties object to store element attributes. These are kept in memory until garbage collected. As we shall see in another section, this can lead to memory fragmentation depending upon the garbage collector in the virtual machine you are using.

Mixed content is now supported, for example:

<Request>ItemDetail

  <ItemId>553</ItemId>

</Request>

but class XMLWriter handles it in some peculiar ways (see the Child Methods section, page 595). Although the parser is still non-validating, the DTD isn't completely ignored as it was in version 1.x. Except for the <!ATTLIST> declaration, other DTD declarations appear to work. Predefined, general, and parameter entities are all supported.

Predefined entities are still supported. Additionally, any character can be referred to by its numeric reference (for example, &#64; for @). Predefined entities need not be declared in a DTD.

General entities are macros for an XML document. They associate parsed text with a symbol and must be declared in the DTD. For example:

<!ENTITY copyright "© FishHeads, Inc. 2001">

Referencing this general entity in an XML document that uses the DTD in which copyright is declared can be done like so:

<rights>&copyright;</rights>

A parser that recognizes general entities should expand the parsed text to:

<rights>© FishHeads, Inc. 2001</rights>
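
The expansion can be observed with any parser that supports general entities. The sketch below is not NanoXML code: it assumes the DOM parser bundled with the JDK (javax.xml.parsers), declares the entity in an internal DTD subset so that the example is self-contained, and uses an invented class name, EntityExpansionDemo.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's DOM parser, not NanoXML.
class EntityExpansionDemo {

    // Parses a small document and returns the expanded text of <rights>.
    static String rightsText() throws Exception {
        String xml =
            "<!DOCTYPE rights [\n"
          + "  <!ENTITY copyright \"\u00A9 FishHeads, Inc. 2001\">\n"
          + "]>\n"
          + "<rights>&copyright;</rights>";
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        // The entity reference has been replaced by its declared text.
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rightsText());
    }
}
```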

Just like general entities, parameter entities act as macros and are declared in the DTD. However, unlike general entities, their use is limited to the DTD - they cannot be referenced in XML. Since NanoXML isn't a validating parser, parameter entities aren't very useful. Perhaps this is provided as an intermediary step towards making NanoXML a validating parser. In any case, parameter entities are declared with the ENTITY keyword, a percent sign, a name, and the replacement value.

For example:

<!ENTITY % requestParameters "name CDATA #REQUIRED">

Whenever the parser encounters the reference %requestParameters; in the DTD, it will substitute the quoted string. Here's a usage example:

<!ATTLIST Request %requestParameters; date CDATA #IMPLIED >

A parser that recognizes parameter entities should expand the above to:

<!ATTLIST Request name CDATA #REQUIRED date CDATA #IMPLIED>

Note that all parameter entities must be declared before they are referred to in a DTD. Interestingly enough, using parameter entities results in an XMLParseException, although they can be declared without any problems. Perhaps this will be fixed before a production release of NanoXML 2.0, but it's worth remembering in future.

Package net.n3.nanoxml

This package consists of four interfaces and nine classes. The interfaces, IXMLBuilder, IXMLParser, IXMLReader, and IXMLValidator, are all intended to allow you to plug your own code into NanoXML. You could write your own reader, for example, and by implementing IXMLReader, it would then plug into the NanoXML framework. You might choose to do this if your data comes from an unconventional source, a Palm OS database for example.

We won't cover the interfaces in too much detail, as there are concrete classes that implement them. We'll cover those classes, StdXMLBuilder, StdXMLParser, StdXMLReader, and NonValidator instead.

Class XMLElement

net.n3.nanoxml

public class XMLElement

       implements java.io.Serializable

This class, even though it existed in version 1.x, has changed significantly. Some methods have been removed, and some new ones have been added.

Constructors

You no longer have to construct an XMLElement object unless you are building documents. When parsing documents, the object implementing IXMLBuilder (usually StdXMLBuilder) will provide the root element through its getResult() method (covered below). So you really only need to concern yourself with the following methods if you need to build documents with NanoXML.

public XMLElement()

public XMLElement(String name)

The default constructor is provided for #PCDATA text. To support mixed content in the XMLElement class, #PCDATA is treated as an XMLElement object with no element name. We'll go into this in more detail in the Child Methods section (page 595), but this point is very important.

Use the default constructor XMLElement() for #PCDATA. Use the other constructors for element nodes.

The name argument represents the name of the new element.

Children Methods

These methods enable access to child elements. They are typically used after parsing a document. Note that all elements in a document, including #PCDATA text, are represented as XMLElement objects. There is no concept of siblings in NanoXML as there is in the Document Object Model (DOM). Each element is a child of the element directly above it.

public int getChildrenCount()

public boolean isLeaf()

public boolean hasChildren()

public Enumeration enumerateChildren()

public Vector getChildren()

public XMLElement getChildAtIndex(int index)

public XMLElement getFirstChildNamed(String name)

public Vector getChildrenNamed(String name)

Again we see the quirkiness of NanoXML: getChildrenCount() was called countChildren() in version 1.6.7. There is no apparent reason for the name change except perhaps to further the incompatibility between the two releases! Also, isLeaf() and hasChildren() are redundant: each simply returns the opposite of the other.

The arguments name and index are the name or index of the desired child(ren). enumerateChildren() and getChildren() existed in version 1.6.7 and return an Enumeration or Vector of child XMLElements.

getChildAtIndex() will throw an ArrayIndexOutOfBoundsException if its index argument isn't valid. By contrast, getFirstChildNamed() does not throw; it returns null if no child with element name name exists.

Usage and Examples

Here's an example that gets all elements named ItemId in an XML document fragment and outputs each element's #PCDATA content. We'll cover the getContent() method in the next section.

Enumeration items = root.getChildrenNamed("ItemId").elements();

while (items.hasMoreElements()) {

  XMLElement elem = (XMLElement)items.nextElement();

  System.out.println("content is " + elem.getContent());

}

Now let's go over the methods for adding, removing, and accessing individual child elements.

Child Methods

public void addChild(XMLElement child)

public void removeChild(XMLElement child)

public void removeChildAtIndex(int index)

public void setContent(String content)

public String getContent()

Arguments

Argument | Type | Effect
child | net.n3.nanoxml.XMLElement | The element to add to or remove from the document
index | int | The index into the document where the first element is 0
name | java.lang.String | The name of the element

addChild() adds an XMLElement to the document as a child of another element, while removeChild() and removeChildAtIndex() remove an element from a document. The latter provides a very simple XPath-style way of removing children.

setContent() and getContent() allow you to set and get the #PCDATA content of an element. It is important to know how these methods behave with regard to setName() and getName(), the methods used to set and get the name of an XMLElement. Using setContent() and getContent() incorrectly will break the XMLWriter class, which is used for outputting documents (see Class XMLWriter, page 601). If you create an XMLElement object with the constructor:

public XMLElement(String name)

this creates an element with name name. The correct way to add #PCDATA to this element is to create another XMLElement object using the default constructor:

public XMLElement()

then calling setContent() on the returned object, and adding that object to the first one using addChild(). If you instead call setContent() on the object returned by the named constructor, XMLWriter won't display subelements of that object.

To summarize, here is a code snippet that works just fine:

root = new XMLElement("Request");

XMLElement rootPCDATA = new XMLElement();

rootPCDATA.setContent("An Auction Request");

root.addChild(rootPCDATA);

XMLElement child1 = new XMLElement("Parameters");

root.addChild(child1);

XMLWriter writer = new XMLWriter(System.out); //output the document

writer.write(root);

The output of this snippet is:

<Request>

  An Auction Request

  <Parameters/>

</Request>

and here is a code snippet that does not work fine (even though it looks like it should):

root = new XMLElement("Request");

root.setContent("An Auction Request");

child1 = new XMLElement("Parameters");

root.addChild(child1);

writer = new XMLWriter(System.out); //output the document

writer.write(root);

The output of this snippet is:

<Request>An Auction Request</Request>

You can see that the <Parameters> element is missing.

Attribute Methods

These methods allow you to get, set, and remove attributes on an XMLElement object. Remember that NanoXML stores attributes as a hashtable in memory (actually it's a java.util.Properties object, but that's derived from Hashtable). This interface is much more intuitive than the version 1.6.7 interface.

public String getAttribute(String name)

public String getAttribute(String name, String default)

public void setAttribute(String name, String value)

public void removeAttribute(String name)

public Enumeration enumerateAttributeNames()

public boolean hasAttribute(String name)

public Properties getAttributes()

The removeAttribute() method is new for this release. If you're familiar with version 1.6.7, you'll notice that all of the extraneous getAttribute() methods that take different data types (int, double, String) are now gone. All attributes are now treated as Strings, a much simpler approach. All the method names also now use the xxxAttribute() convention instead of the xxxProperty() convention used in version 1.6.7. Again, this is more intuitive as the standard XML terminology for these items is attribute, not property.

The first two methods allow you to retrieve attributes of an element. The first method returns null if the attribute doesn't exist, while the second method returns default. enumerateAttributeNames() allows you to iterate through the set of attributes for an element, while getAttributes() returns the internal Properties structure to you directly.

Arguments

Argument | Type | Effect
name | String | The name of the attribute to look up
value | String | The value of the attribute
default | String | The value that is returned if the attribute doesn't exist
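
Because the attributes live in a java.util.Properties object, the lookup behavior described above mirrors that of Properties itself. The following sketch demonstrates this on a bare Properties object. It does not call NanoXML, the class name AttributeLookupDemo is hypothetical, and the mapping between each Properties call and its XMLElement counterpart noted in the comments is an assumption drawn from the method descriptions above.

```java
import java.util.Enumeration;
import java.util.Properties;

// Hypothetical demo; exercises java.util.Properties, not XMLElement.
class AttributeLookupDemo {

    // Looks up an attribute, falling back to "unknown" when it is absent.
    static String typeOf(Properties attrs) {
        return attrs.getProperty("type", "unknown");
    }

    public static void main(String[] args) {
        Properties attrs = new Properties();
        attrs.setProperty("type", "Integer");       // cf. setAttribute()

        // One-argument lookup yields null for a missing key.
        System.out.println(attrs.getProperty("type"));   // Integer
        System.out.println(attrs.getProperty("units"));  // null

        // Two-argument lookup yields the supplied default instead.
        System.out.println(attrs.getProperty("units", "unknown")); // unknown

        // Iterating over the names, cf. enumerateAttributeNames().
        Enumeration<?> names = attrs.propertyNames();
        while (names.hasMoreElements()) {
            System.out.println("attribute: " + names.nextElement());
        }

        attrs.remove("type");                       // cf. removeAttribute()
        System.out.println(attrs.containsKey("type")); // false
        System.out.println(typeOf(attrs));             // unknown
    }
}
```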

Class XMLParserFactory

net.n3.nanoxml

public class XMLParserFactory

extends java.lang.Object

This class provides convenient static methods for instantiating a parser, reader, builder, and validator all at once. These four objects interact with each other to parse a document. The parser object is the "glue" which contains the reader, builder, and validator. It is represented by the IXMLParser interface (see section Class StdXMLParser, page 600).

This class, XMLParserFactory, is not essential and could actually be removed from the library. It would save almost one kilobyte. The Usage and Examples section below shows how to do this. However, typical NanoXML 2.0 code that parses XML will start by calling one of these methods:

public static IXMLParser createDefaultXMLParser()

       throws ClassNotFoundException, InstantiationException,

              IllegalAccessException