No part of this chapter may be reproduced, stored in a retrieval system or transmitted in any form or by any means -- electronic, electrostatic, mechanical, photocopying, recording or otherwise -- without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.
This chapter focuses on Java, XML, and XSLT technologies for
lightweight clients. Lightweight clients are defined as those with
more limited resources than traditional clients. The term
information appliance is used interchangeably with lightweight
client. The obvious devices that fit within this category are
personal digital assistants (PDAs), mobile phones, and pagers.
However, many other embedded devices and consumer electronics may
fit into this category: television set-top boxes, global
positioning system (GPS) receivers, thermostats, watches, digital
cameras, even Internet appliances such as kitchen stoves,
refrigerators, and radios. Non-consumer-oriented devices also can
fit in this category, such as industrial automation and control
sensors.
However, this definition is not limited to non-PC devices. Any
environment that requires the following should qualify:
A small memory footprint
Limited CPU overhead or availability
Restricted network bandwidth
Applets, in a typical browser, also fit this lightweight
category. As we shall see, XML generation, parsing and
transformation are just as important for these types of clients as
they are for thin clients (browsers) and servers.
In this chapter, we will address three key Java XML technologies
for lightweight clients:
Lightweight parsers and document generators - Reference material for three parsers and document generators. Through examples, we will demonstrate the usage of three lightweight XML parsers, two of which can also generate documents by enabling you to create a DOM-style node tree.
XSLT compiler - a Java tool that creates fast and lightweight Java class files for transforming XML given an XSL stylesheet
CLDC (Connected Limited Device Configuration) and the Java KVM (Kilobyte Virtual Machine) - a Java specification for limited devices, which includes a reference implementation written by Sun. The Java KVM, a virtual machine redesigned for the constraints of limited devices, is part of that implementation.
We can do the same essential processing, parsing, and
transforming tasks with these tools that we have used elsewhere in
the book. Examples in the chapter will show how to work around some
of the limitations of these tools, and how we can leverage their
small size to get them to run on devices that would not support
their heavyweight counterparts.
There are a number of acronyms and terms you'll encounter in the
following sections, so let's briefly cover some terminology before
we continue:
Term | Definition
CDC | Connected Device Configuration - defines a base set of I/O, connectivity, and other classes for "heavy" lightweight clients such as set-top boxes and audio/visual equipment
CLDC | Connected Limited Device Configuration - defines a base set of I/O, connectivity, and other classes for lightweight clients such as pagers
J2ME | Java 2 Platform, Micro Edition - the Java 2 platform for information appliances (lightweight clients)
Java KVM | Java Kilobyte Virtual Machine - a Java virtual machine designed to minimize its memory footprint instead of maximizing its speed. Currently ported and compiled for Linux, Solaris, Windows, and Palm OS
PDA | Personal Digital Assistant - a digital organizer, consisting of applications such as an address book, date book, and notepad
This chapter starts by discussing why we should consider using
XML on lightweight clients. We introduce the Java 2 Platform, Micro
Edition (J2ME) and its architecture. Then, we cover three
lightweight XML parsers and the XSLT Compiler. Finally, we conclude
with a Palm OS application that beams address book entries in XML
format from one Palm device to another using the Java KVM.
Any discussion of using XML on lightweight clients typically
leads to solutions that do not adhere well to standards. W3C
XML-related recommendations and standards usually don't have
lightweight clients in mind, as they are written independently of
any platform or operating system. The implementation of these
recommendations often involves resource-intensive processing not
possible on lightweight clients. Therefore, many W3C
recommendations, for instance namespaces and DOM, are not
supported, in order to keep library sizes down.
Rather than dismissing the XML components covered in this chapter
as non-standard, I encourage you to view them as you might have
viewed the tools of the early World Wide Web: useful, even though
not yet standardized. This will change with time, as we can see from
the recent consolidation of multiple lightweight Java initiatives
into the far-reaching J2ME.
Lightweight client support for XML has largely been ignored in
the XML revolution. This may be a reflection of the role of the
client-side developer. Traditional client-side developers have all
but disappeared from many contemporary web application
developments, ever since n-tier architecture has displaced the
client-server paradigm in the enterprise.
If the developers themselves have dwindled in number, the tools
they use are bound to become less common. The lack of these tools
is perhaps also a product of what today seems a predominantly
server-dominated industry. Perhaps they are lacking because web
applications make it so easy for developers to forget about
them.
Prevailing attitudes can be summed up this way: "Anyone who
understands my DTD or XML Schema can display this document class
as they please." Even server-side developers churning out WML today
are probably still treating WML as yet another document class that
their server application needs to support.
But the number of document classes being published by
enterprises grows every day. Servers that produce one or more of
this myriad of XML formats (WML being one of them) suddenly
complicate things on the lightweight client, where slick heavyweight
browsers with ActiveX controls don't exist. Lightweight clients
simply don't have the capabilities and resources available to
browser environments.
The role of the lightweight client-side developer has now been
boosted to that of browser developer or "XML processor" developer.
More generally, the client-side developer now has a rejuvenated
role as a Java XML developer in the world of lightweights,
especially with the success of the Java 2 Platform, Micro Edition
(see J2ME). In the future, if the modular and
lightweight XHTML Basic (see Too Many Client Formats) becomes
popular and natively supported by vendors, client-side
development may be relegated to scripting, with most
work done on the server-side. However, we have not yet reached that
point.
Most contemporary discussions about XML technologies focus
on server-side issues, such as document generation from a
relational database, document parsing and persistence to a data
store, document transmission, or document transformation for an
anticipated client (such as a Compact HTML browser). When
client-side issues are addressed, they are often limited to
Microsoft Internet Explorer or Netscape Communicator.
Case in point: Microsoft has substantive support for XML with
MSXML in Internet Explorer. The latest release of MSXML, 3.0,
supports:
XSL Transformations (XSLT)
XML Path Language (XPath)
XML Namespaces 1.0
DOM
SAX 2.0
Organization for the Advancement of Structured Information Standards (OASIS) XML 1.0 test suite
Secure server-to-server XML with HTTPS
This is impressive, but these services are implemented as
ActiveX components intended for use within Internet Explorer on the
client, or as ASP pages on the server. There are plenty of clients
that don't support ActiveX components. I doubt that Microsoft's own
UltimateTV and Xbox, two "heavy" lightweight clients, can make use
of MSXML. UltimateTV is essentially a digital VCR that can record
two channels simultaneously (similar to TiVo in the United States),
and Xbox is Microsoft's Sony PlayStation-style games unit. Other
consumer-oriented and embedded devices would have similar problems
with MSXML. As a side note, Sony has announced that it will
integrate Java technologies into the PlayStation by the end of 2001
(see http://www.javasoft.com/features/2001/06/sony.html).
So, what do we do if we need to parse, generate, or transform
XML on a lightweight client? Do we even need to do this at all?
The future will show that, as Java and XML developers, we must
pay more attention to XML technologies on lightweight clients. Even
server-side-only developers, who today often just transform their
XML into a subset of (X)HTML supported by the most common browser,
will have to change their approach.
There are at least five reasons why you need or will need to
parse, process, generate, and transform XML on lightweight clients.
We will go into detail on each one of these:
Lightweight client-side development. If you're a lightweight client-side developer who will be receiving content from providers who publish XML, you will need a way to parse and process XML documents of their document class. You may also need to generate XML documents to send back to the provider.
Too many client formats. If you're a lightweight client-side developer and your client is going to receive content from multiple providers, each of whom publishes XML using a different schema (very likely given today's state of affairs!), you may want to transform those documents into a generic document class before processing them. As a server-side developer, you may want to publish your content in one form, instead of trying to keep up with all the latest standards and recommendations, and push the burden of transformation to the client.
Peer-to-peer networking. If your lightweight client application is part of a peer-to-peer network or you are designing a peer-to-peer network, you may want to communicate with the other clients in the network through XML.
Information appliance interoperability. Embedded devices, smart consumer electronics, PDAs, mobile phones, and Internet appliances can all interoperate with each other and the Internet using technologies such as Bluetooth, Jini, Ricochet, CDPD, GSM and GPRS, WiFi (802.11b), and HomeRF. The need for common data exchange formats grows as the interoperability of these devices grows. Even if the underlying communications mechanisms are black boxes, the application developer is presented with new opportunities and challenges, as he now has a multitude of connected information appliances that previously weren't.
Powerful lightweights. If you extrapolate Moore's Law, we'll all eventually have turbo-charged mobile phones and PDAs at a cost too cheap to ignore. We could use some of that power for XML-related and XSLT tasks.
Lightweight Client-Side Development
With the Java 2 Platform, Micro Edition now a reality on PDAs and
embedded devices, and actually shipping on some mobile phones, rich
Java applications with network connectivity are now possible on
lightweight clients.
The lessons learned in the client/server days before XML
are not forgotten simply because we're on a constrained device. If
we want to leverage the benefits afforded by XML (which have been
addressed by many other books and articles, but are outside the scope
of this chapter) on modern lightweight clients, we'll need a way to
parse and process XML documents delivered to us by servers. From
there, we can display the data to the user and/or store it
locally.
We might then wait for user input, or gather system information,
and package it up into an XML document for transfer to a server.
For example, we might query the current price of an item on our
auction web site.
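As a rough sketch of that round trip, the following class builds a price query and pulls the price out of a reply using nothing but String and StringBuffer. The element names (priceQuery, item, price) and the canned reply are assumptions made up for illustration, not a real auction API, and the textOf() scan is deliberately naive: it stands in for one of the lightweight parsers reviewed in this chapter rather than replacing one.

```java
class PriceQueryClient {

    // Escapes the five characters that are special in XML text and attributes.
    static String escape(String s) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the (hypothetical) request document sent to the auction server.
    static String buildPriceQuery(String itemId) {
        return "<priceQuery><item id=\"" + escape(itemId) + "\"/></priceQuery>";
    }

    // Returns the text between <tag> and </tag>, or null if the tag is absent.
    // This is a naive scan, not a parser: it ignores attributes, nested
    // elements of the same name, CDATA sections, and entity references.
    static String textOf(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(buildPriceQuery("A&B-42"));
        // A canned reply stands in for the server's response.
        String reply = "<priceReply><price>19.95</price></priceReply>";
        System.out.println(textOf(reply, "price"));
    }
}
```

Hand-building the request keeps the footprint tiny, but the escaping step is easy to forget; a document generator that builds a node tree takes care of it at the cost of extra classes on the device.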
Too Many Client Formats
Wireless Markup Language and WMLScript for WAP networks
Web clipping applications for Palm.Net and OmniSky networks
HTML. Even though most agree it is inappropriate for information appliances, most web sites still publish their content only in this format. There's even the HTML 4.0 Guidelines for Mobile Access (http://www.w3.org/TR/1999/NOTE-html40-mobile-19990315), which describes what parts of HTML should be avoided for information appliances
Handheld Device Markup Language (HDML), originally created by Unwired Planet (Phone.com/Openwave - http://www.openwave.com) in 1995 and submitted to the W3C in 1997. It is not XML-compliant, nor does it have scripting capabilities comparable to WML's WMLScript (however, the Openwave WAP Edition browser does display WML/WMLScript as well as HDML, while its Universal Edition browser displays WML/WMLScript, XHTML, and cHTML)
Proprietary formats, which may not even be XML, such as those for the Xircom Rex
This list isn't intended to be complete. It demonstrates the
alphabet soup of lightweight client document classes. XHTML Basic
is an attempt to rein in this rabble. It defines a common base that
includes images, forms, basic tables, and object support. It is
intended for web clients that cannot or do not support full XHTML
or HTML 4.0, and can be extended through modules (see
Modularization of XHTML at http://www.w3.org/TR/2000/CR-xhtml-modularization-20001020).
However, the verdict is still out on whether or not XHTML Basic will
be widely adopted: it was only officially made a recommendation in
December of 2000. Even if it does become widely adopted, the
recommendation seems inherently "user-interface-centric". The
introduction states, "Because there are many ways to subset
HTML..." By "user-interface-centric" I mean that data-driven
applications, such as some of those found in industrial control,
probably care nothing about subsetting HTML. They may not benefit
as much from using XHTML Basic as, for example, mobile phone
applications. However, even if XHTML Basic takes off and solves the
multiple document class problem for user-interface-driven
applications, there will still be browser developers on lightweight
clients who will need to parse, process, and generate XHTML
Basic.
In the meantime, we can make a generic J2ME client that
understands all of these formats, or as many of our custom formats
as we like. By transforming each of these formats into our own
document class before processing, we could reduce the size of
lightweight client code significantly. You can use a tool like
XSLTC (see XSLT Compiler) to do this. Then, you
can parse and process the transformed XML with one of the parsers
reviewed in this chapter or the parser that comes with XSLTC. You
might also need to use XSL on the client-side if you are presenting
the user with a single, integrated service that is actually
composed of multiple smaller services from different servers, each
publishing content in a different document class.
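To make the idea of normalizing a provider's format concrete, here is a minimal sketch. It does not use XSLTC's own API; instead it assumes the standard javax.xml.transform classes from J2SE as a stand-in, so it shows the shape of the transformation rather than code that would run on a CLDC device. The provider format (a person element with fullname and tel attributes) and the common document class (contact, name, phone) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NormalizeFormat {

    // Maps the hypothetical provider format
    //   <person fullname="..." tel="..."/>
    // onto our own hypothetical document class
    //   <contact><name>...</name><phone>...</phone></contact>
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/person'>"
      + "<contact>"
      + "<name><xsl:value-of select='@fullname'/></name>"
      + "<phone><xsl:value-of select='@tel'/></phone>"
      + "</contact>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms one provider document into the common document class.
    static String normalize(String providerXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(providerXml)),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            normalize("<person fullname='Ann Lee' tel='555-0100'/>"));
    }
}
```

With one stylesheet per provider format, the rest of the client only ever has to understand the contact document class.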
But ultimately, why should the document provider care about what
kind of client he talks to? He should publish his XML with a DTD or
schema, and leave the rest to the client. This follows the lessons
of encapsulation and distributed object-oriented design that we've
learned as a development community over the years, even if it goes
against the popular notion of "thin clients".
Use XML Document Servers
Imagine trying to publish content and data in HTML, cHTML, WML
and WMLScript, HDML, and in the Web Clipping Application format. No
problem, you say: we store all our data natively as XML. All we
have to do is write XSL transformations for each format and expose
addresses where each content type can be reached.
That could be a lot of work to reach all the new devices or
networks. New document classes are popping up all the time and old
ones are dying out, so we'll have to keep on writing
new XSL transforms. Here is a case in point: if WML/WMLScript
overtakes the older HDML format in the US, it might spell doom for
thousands of existing HDML applications. Fortunately, WAP gateways
transform HDML into WML - but relying on infrastructure providers
for upgrade paths is dangerous.
Instead of publishing content data in one of the formats we
talked about above (such as WML), we should consider publishing it
to the client in XML with an associated DTD or XML Schema. As
discussed previously, the transformation process rightfully belongs
to the client.
Peer-to-Peer Networks
Client-to-client networks, like Napster and Jabber (which have
centralized directory services) or Gnutella (with no centralized
directory service), have yet to explode in the information
appliance world. Jini technology and Project JXTA
(http://www.jxta.org) are addressing them. Jini is a mechanism for
connecting distributed services in a network using a directory
service. Project JXTA is a mechanism for connecting distributed
services in a peer-to-peer (P2P) network where no directory server
exists. Additionally, data (called codat, to indicate
anything from code, data, or applications, to text, images,
serialized Java objects, or SOAP packets) is sent across JXTA pipes
as XML.
JXTA is quite new, so there aren't many applications out there
using it yet. However, it does come with a graphical application
called InstantP2P. InstantP2P implements:
Instant messaging within
"peer groups"
P2P file sharing
Peer groups are collections of peers that publish, limit, and
control access to codat among other peers in the group. In
addition, each peer group defines its own membership requirements
to secure peer group membership.
The lack of widespread use of P2P networks might be partially
due to the single-threaded operating systems that many information
appliances employ. If you want to share applications or MP3s on
your Palm OS device over a P2P network, for example, you won't be
able to look up your wife's phone number at the same time.
Whatever the reason for the lack of their widespread use, P2P
networks for lightweight clients have enormous potential. If we're
to leverage the openness afforded by XML, we must give clients in
P2P networks the ability to natively generate, parse, process, and
transform XML. Then, the data (or codat) they exchange can be
specified as XML documents.
Information Appliance Interoperability
What if you want your Java 2 Platform, Micro Edition mobile
phone application to dial a phone number stored in your PDA?
If both devices are Bluetooth-enabled, and your mobile phone
application uses the Java API for Bluetooth (JSR-000082, http://java.sun.com/aboutJava/communityprocess/jsr/jsr_082_bluetooth.html),
we're almost able to dial that number. First, the address book
application on your PDA must expose directory services in a
standard way so that your mobile phone application can look up
contact information. Then, the mobile phone application can request
a telephone number, or an address book entry, and the address book
application on the PDA can send a reply containing the telephone
number.
The messaging format between the two applications needs to be
understandable by both. It should also be generic enough so that
other applications on the same or different devices could
understand it.
If ever there was a cry for XML, this is it. The PDA
application, as the document provider, needs to generate XML. The
mobile phone application, as the document receiver, needs to parse
and process XML. Only then can our applications take advantage of
all the benefits of XML on a device-to-device level.
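The two roles can be sketched as follows. This is only an illustration under stated assumptions: the contact, name, and phone element names are invented, the Bluetooth transport is left out entirely, and the streaming javax.xml.stream API from later J2SE releases stands in for whichever lightweight parser and generator the devices would really carry.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class ContactExchange {

    // PDA side (document provider): serializes one address book entry.
    static String buildReply(String name, String phone) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("contact");
            w.writeStartElement("name");
            w.writeCharacters(name);   // the writer escapes & and < for us
            w.writeEndElement();
            w.writeStartElement("phone");
            w.writeCharacters(phone);
            w.writeEndElement();
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Phone side (document receiver): returns the text of the first
    // <phone> element, or null if the reply does not contain one.
    static String readPhone(String replyXml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(replyXml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "phone".equals(r.getLocalName())) {
                    return r.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String reply = buildReply("Ann & Bob", "555-0100");
        System.out.println(reply);
        System.out.println(readPhone(reply));
    }
}
```

Because the reply is plain XML, any other application that knows the document class could consume it without knowing anything about the PDA's address book implementation.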
Powerful Lightweights
As of the time of writing, powerful lightweights are already
available. For example, the Compaq iPaq 3670 comes with 64MB RAM
and can be expanded to 128MB with an optional CompactFlash card.
Instead of the extra RAM, we could choose an IBM Microdrive for
1GB of storage. With the Dual-Slot PC Card Expansion
Pack, we can plug in two Type-II PCMCIA cards. Fill one of them
with the Novatel Wireless Merlin for Ricochet and we'd have 128
kbps wireless Internet access. In the other slot we could add yet
another 16MB of RAM or perhaps a GPS receiver.
Consider all this, and that the unit has a built-in microphone,
speaker, and even a light sensor, and I think you'll agree with me
that this device can do more than some desktops.
Yet you don't see many people carrying these high-end
configurations around, the most obvious reason for that being their
high cost. A device like the one above, so outfitted, at present
costs thousands of dollars.
If the past is any indicator, devices like this will come down
in price and shrink in size. They may even become "wearable" (Sanyo
has already announced a line of raincoats for sale in Japan that
have a pocket custom-fit for your Palm). If all this happens we'll
wonder how we ever survived without one.
So as prices come down and Internet appliances become more
powerful and widespread, some of the burden of XML processing and
transformation should be pushed onto the client. Complex XSL
transforms and larger DOM trees won't pose problems, and the
document provider can remain purely in its XML world. With powerful
clients, client-side processing can become a reality.
The Java 2 Platform, Micro Edition (or J2ME Platform) is one of
the three Java 2 editions published by Sun Microsystems. You are
probably more familiar with Standard (J2SE) and Enterprise (J2EE),
but Micro is rapidly gaining popularity with vendors and
developers.
Just as J2SE and J2EE comprise a set of tools and libraries, so
too does J2ME. However, J2ME's tools and libraries are targeted at
a different set of devices than its larger siblings. Everything
from smart cards and pagers to mobile phones, PDAs, and set-top
television boxes is a potential target. These information
appliances span a wide range of functionality, features, and
capabilities.
To address this range of functions, J2ME has a layered
architecture that allows vendors and developers to maximize the
capabilities of the target device while still retaining some
interoperability and platform independence. Consider the
differences between a set-top television box and a one-line
pager:
Set-Top Box | Pager
Virtually unlimited power supply | Battery powered
Lots of memory | Very little memory
Relatively speedy performance | Slow performance
Big monitor for user interface (TV) | Single line of LCD
Always-on high-bandwidth network | Low-bandwidth network
Persistent storage | Little persistent storage
We wouldn't want the libraries on the one-line pager to have to
include the class javax.swing.tree.DefaultMutableTreeNode, but
maybe we would want it on the set-top box. To maximize the
capabilities of both devices while still using the same Java 2
edition, the following layered architecture was designed:
Operating System - the unit's OS, for example Palm OS, Symbian/EPOC, or Windows Pocket PC
JVM layer: the Java virtual machine and its services (such as the garbage collector) compiled for this OS. This layer is an implementation and compilation of the Java Virtual Machine for a particular operating system. It translates bytecode into native operating system calls. Sun has made available the Kilobyte Virtual Machine (KVM). We will be using the KVM in the sample application later in this chapter, but here is a list of some VMs currently available for information appliances:
IBM J9 (http://www.embedded.oti.com/)
Kada Mobile (http://www.kadasystems.com/)
Esmertec Jbed (http://www.esmertec.com/)
SuperWaba (http://www.superwaba.org/)
Waba (http://www.wabasoft.com/)
Symbian (http://www.symbian.com/) includes both an operating system and VM
Sun KVM (http://java.sun.com/products/cldc/)
Some of these VMs may not fit into the J2ME-specified framework.
Configuration layer: base classes that must be available to any profile (and application) compliant with a particular configuration. For example, I/O classes, data types and structures, and network connections are typically specified in this layer
Device profile layer: classes built upon ones specified in the configuration layer. GUI classes are specified here. Choosing a profile dictates a configuration, as profiles are typically only available for one configuration. Applications written for a particular profile are guaranteed to work on any device that implements that profile
Application layer: this is your application code. Before writing an application, you must decide upon what profile it will depend. Choosing a profile is an important step because it breaks interoperability; your application is not guaranteed to work on another profile. Just like any Java application, J2ME applications define their own classes and have main() entry points. Applications make use of classes published by the profile and configuration layers
It's worth mentioning that there are a number of products that
allow applications written in Java to run on information appliance
operating systems without virtual machines. They work by compiling
Java bytecode into native operating system-dependent machine code.
Some of these products are compilers that are intended to compile
during an off-line build process, before application deployment.
Others are complete VMs containing a compiler that compiles during
or just before runtime (similar to just-in-time and HotSpot).
Configurations
A configuration is a device class. If we remember our
set-top box and pager comparison (see J2ME), these
are clearly two different classes of devices. As such, they
represent very different target platforms. The J2ME engineers knew
the Write-Once, Run-Anywhere model that made J2SE and J2EE so
successful wouldn't work for all classes of information appliances,
so they made a decision to break application interoperability when
they defined configurations.
A configuration is a class of information appliance.
There are currently two configurations within J2ME: "connected
devices" (like set-top boxes) and "connected limited devices" (like
PDAs or pagers). Each targets devices with different characteristics,
so each has its own definition of the classes that must be available
to the VM.
As stated before, each configuration defines and implements a
set of classes that must be available to the VM. The CLDC both
inherits classes from the J2SE and defines its own classes. Here is
a partial list of those classes inherited from the J2SE, many of
which are subsets of their original, reducing them only to their
essentials:
Sun provides a reference implementation of the CLDC, which
includes the Java KVM. Other vendors are working on CLDC
implementations that hold great promise, such as IBM's VisualAge
Micro Edition with the J9 virtual machine.
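To give a feel for programming against such a subset, here is a small sketch that restricts itself to a handful of older J2SE classes: String, StringBuffer, Vector, Hashtable, and Enumeration. As far as I know these are among the classes the CLDC inherits, but treat that as an assumption and check the CLDC specification for the authoritative list. The class and method names are made up for illustration.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

class SubsetOnly {

    // Builds a one-line summary of names and phone numbers.
    // Raw (non-generic) collections and Enumeration are used deliberately:
    // the newer collections framework and generics are assumed to be
    // unavailable on a limited configuration.
    static String summarize(Vector names, Hashtable phones) {
        StringBuffer sb = new StringBuffer();
        for (Enumeration e = names.elements(); e.hasMoreElements();) {
            String name = (String) e.nextElement();
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(name).append('=').append((String) phones.get(name));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Vector names = new Vector();
        Hashtable phones = new Hashtable();
        names.addElement("Ann");
        phones.put("Ann", "555-0100");
        names.addElement("Bob");
        phones.put("Bob", "555-0199");
        System.out.println(summarize(names, phones));
    }
}
```

A modern compiler will warn about the unchecked raw types; that is expected here and harmless.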
Profiles specify another set of Java classes that must exist
along with the classes made available by the configuration to which
the profile belongs. They are targeted at specific industry
segments. Together, a profile and its configuration specify a full
set of Java classes for a particular type of device. Applications
written for a particular profile are guaranteed to work on any
device that implements that profile.
An important point to note is that user interface components are
specified in profiles, not configurations. This is because the
configuration encompasses too broad a spectrum. For instance, a PDA
and a pager are both CLDC devices, but of what need is the CheckBox
class on a one-line pager?
User Interface components are specified in profiles, whereas
I/O, network connectivity, and data types are defined in
configurations.
There are hundreds of proposed profiles. Here are four examples.
Note that a profile defines the configuration on which it
depends.
Mobile Information Device Profile for CLDC - defines user interface, data storage, messaging, networking, security, and wireless telephony for mobile devices. http://jcp.org/jsr/detail/037.jsp
Foundation Profile for CDC - defines a base profile for devices that have 1MB ROM, 512KB RAM, rich network connectivity, but no user interface. User interfaces can be layered on top of this profile by defining another profile. http://jcp.org/jsr/detail/046.jsp
Personal Profile for CDC - this profile/configuration combination actually defines the successor to PersonalJava (http://java.sun.com/products/personaljava/). Applications written to versions 1.1.x and 1.2 of the PersonalJava API specification will work with this combination. It appears that PersonalJava will be absorbed into the J2ME framework under this profile/configuration.
Part of Sun's reference implementation of the CLDC, the Java
Kilobyte Virtual Machine (KVM) was designed to operate with as
little as 160 to 512KB of total memory on 25MHz 16- or 32-bit RISC
or CISC processors. It was written in C and is freely available for
four operating systems:
Win32 (Windows 2000/NT/ME/98)
Solaris
Linux
Palm OS
It's interesting to note that Java Native Interface (JNI) calls
cannot be made from Java code unless the native call was linked
into the KVM when it was compiled. The KVM source code is provided
so you can do this. The reason JNI calls cannot be made without
this step is so that the size of the KVM is kept to a minimum.
Opening up native access to a device allows for the execution of
potentially hostile code unless security mechanisms are put into
place, such as exist in the Java 2 Standard Edition. This would
increase the footprint and complexity of the KVM. Its implementers
apparently weighed the tradeoffs and chose size over secure JNI
capabilities. Other VMs, such as the Kada VM (http://www.kadasystems.com/kada_vm.html), do allow JNI calls.
We'll be focusing on the Palm OS build of the KVM for this
chapter since it's a good demonstration of the usability of the VM
in a constrained device.
Remember that in the J2ME architecture, user interface and
device-specific network classes are defined at the profile level.
However, the KVM for the Palm was released at Sun's JavaOne
conference in 1999 - before the CLDC was released and a PDA profile
was defined. Actually, to date, there still isn't an implementation
of the PDA Profile for the CLDC, just a Java Specification
Request.
As the 1999 KVM release for the Palm wouldn't have been very
interesting without including user interface classes, the package
com.sun.kjava.* was born. It includes:
A set of simple user interface classes, such as com.sun.kjava.Button
An event callback mechanism so events like penUp, beamReceive, and keyDown can be handled by application code
PDB (Palm database) access classes for file input/output
Data structure classes, such as com.sun.kjava.List and com.sun.kjava.IntVector
The package com.sun.kjava is unsupported and quite limited.
Additionally, with the newer J2ME architecture, it doesn't belong
as part of the CLDC distribution (it is currently provided as an
"overlay" to the CLDC files - see Setting Up the Environment,
CLDC/Java KVM). It is generally accepted that the
package will be renamed (for example, com.palm.kjava), rolled into
the PDA profile, or incorporated into a Palm OS-specific
profile.
We won't be using kAWT (http://www.trantor.de/kawt/) in this
chapter, but it's worth mentioning briefly. It is a simplified,
lightweight version of the AWT API for the KVM. If you are going to
do any serious GUI development with the KVM, this is the best
package currently available. It also optionally includes some very
useful I/O and networking classes.
One benefit of using the kAWT is that applications developed
with it will run under the AWT with J2SE (although the converse
isn't true). Currently, there are ports for Palm OS, IBM's J9,
Blackberry RIM, and the MID Profile (see Profiles, page
571). The disadvantages to using the kAWT are:
Higher storage capacity is
required (the Palm OS port is a 178KB PQA, including UI, I/O, and
all networking classes)
Loading kAWT applications
takes longer
It's non-standard
(although there currently is no standard CLDC GUI
implementation for PDAs)
Each of these five parsers has advantages and disadvantages with
varying support for W3C recommendations and standards. We will
examine some of these issues in this section. For the two parsers
we review in detail, there is a table of features included in
corresponding sections.
We'll also discuss the three different types of parsers: push,
pull, and object model, and the advantages and disadvantages of
each in regard to lightweight clients.
After we've reviewed XML parsers for lightweight clients, we'll
use some of the technologies discussed in the J2ME section
(page 568), along with an XML parser, to create a peer-to-peer
sample address book application.
Although push and object model parsers are the most popular and
well known, they are not always the best type of parser for
lightweight clients. We'll discuss this further in the next
section. This chart outlines lightweight XML parsers and the models
they implement. Note that some parsers give the option of parsing
documents using different models. For comparison, a heavyweight
parser, Xerces-J, has been included:
Parser
Type
Description
URL
NanoXML
Push and Object Model
Versions 1.x of this lightweight DOM-style parser offer optional
SAX 1.0 support.
Offers a DOM-style tree interface. For non‑lightweight clients,
it has the optional feature of an element-style class that
implements javax.swing.tree.MutableTreeNode and javax.swing.tree.TreeNode. This enables elements to be visualized
directly by a javax.swing.JTree.
Tiny parser that "aspires to be the smallest Java XML parser on
the planet", XParse-J offers custom DOM-style parsing interface.
Also a JavaScript version.
Works "out-of-the-box" with J2ME. Includes an XML writer and WAP
Binary XML support (WBXML), a binary encoding optimized for the
mobile phone Wireless Application Protocol standard.
XPP is small (21KB JAR) and fast and has both Java and C++
implementations. Supports namespaces and mixed content. Uses very
little memory during parsing.
Now let's discuss the three XML parser models in more depth.
Push Parsers
Push parsers are the class of XML parsers that publish a set of
interfaces, implemented by applications, through which the parser
relays document information.
SAX is the best-known XML push parser. After your
application tells the SAX parser to begin parsing, the parser calls
back (or pushes) into the application code to notify the
application of parse events. This model forces application code to
maintain state within the callback class(es), and to evaluate that
state at each event. That means many class variables in the
callback class(es), as well as (possibly) getters and setters for
those variables. This isn't very developer-friendly as it creates a
lot of extra work.
Additionally, SAX and most push parsers parse an entire document
at once. As soon as your code tells SAX to begin parsing a
document, the document is parsed in its entirety. For very large
documents, this means lots of state information must be maintained
- causing a potentially large memory footprint, not to mention all
the wasted processing and battery power that goes into parsing an
entire document if not all of it is needed (though not nearly as
much as a DOM parser would require).
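To make the state-keeping burden concrete, here is a minimal sketch
of a push-style handler. It uses the SAX classes that ship with
current JDKs (javax.xml.parsers and org.xml.sax), not NanoXML's SAX
1.0 component, and the class name ItemIdHandler and its result format
are made up for illustration. Notice that three fields exist only to
carry state from one callback to the next.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects "type=text" for every <ItemId> element.
class ItemIdHandler extends DefaultHandler {
    // State the application must carry between parser callbacks.
    private boolean inItemId = false;
    private String currentType;
    private final StringBuilder text = new StringBuilder();
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("ItemId".equals(qName)) {
            inItemId = true;
            currentType = attrs.getValue("type");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event.
        if (inItemId) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("ItemId".equals(qName)) {
            results.add(currentType + "=" + text.toString().trim());
            inItemId = false;
        }
    }

    // Parses the whole document in one go; the parser drives the handler.
    static List<String> parse(String xml) {
        ItemIdHandler handler = new ItemIdHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.results;
    }
}
```

Even though only the ItemId elements are of interest, the parser still
walks the complete document before parse() returns.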
We'll examine push parser issues in more detail in the
NanoXML section (page 577).
Object Model Parsers
Object model parsers are the class of XML parsers that build
in-memory representations of XML documents using tree-like data
structures. The most popular are parsers conforming to DOM Level 1
and Level 2 specifications, but others exist (for example,
NanoXML).
Object model parsers, unlike push parsers, don't usually require
the developer to maintain document state during parsing, but they
have their own drawbacks on lightweight clients. Most lightweight
object model XML parsers keep an entire parsed document in memory
all the time, until the parser and its resources are garbage
collected. Parsing a large document with this kind of parser, even
if only one node from the whole document is required, always means
occupying large chunks of memory. This approach isn't desirable on
lightweight clients since their memory is constrained.
Also, as with push parsers, all the object model parsers for
lightweights that I know about parse an entire document at once. As
soon as your code tells the parser to begin parsing, the entire
document is parsed so an in-memory object model can be built. As
with push parsers, this wastes a lot of processor and battery power
if the entire document is not required.
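The memory cost described above can be seen with any tree-building
parser. The following sketch uses the JDK's standard DOM parser as a
stand-in (it is not one of the lightweight parsers in this chapter),
and the class and method names are invented for the example. It reads
a single value, yet every node of the document has been built first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper showing that an object model parser builds everything.
class TreeCost {
    // Builds the complete in-memory tree for the document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Counts the given node plus all of its descendants (attributes excluded).
    static int countNodes(Node node) {
        int count = 1;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countNodes(c);
        }
        return count;
    }

    // Returns the text of the first element with the given name, or null.
    static String firstText(Document doc, String tagName) {
        Node first = doc.getElementsByTagName(tagName).item(0);
        return first == null ? null : first.getTextContent();
    }
}
```

Calling firstText() touches one element, but countNodes() on the same
Document shows how many objects were allocated to answer that one
question.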
Lazy Parsing
Some heavyweight object model parsers offer lazy parsing, for
example Xerces-J. Parsing lazily means that the object model is
built and stored in memory only as the calling application requests
a node. However, usually the entire ancestor-or-self axis (with
respect to the requested node) is stored in memory after the
request. Certainly, the entire ancestor-or-self axis must be parsed
when a node is requested. This isn't optimal for constrained
devices, but it's better than the "parse-and-store-it-all-at-once"
approach taken by present-day lightweight object model XML
parsers.
At the time of writing, no object model XML parsers are
available for lightweights that use lazy parsing. Hopefully, this
will change in the near future.
Pull Parsers
A newer player in the world of XML parsing, pull parsers aren't
nearly as prevalent as SAX and DOM parsers. kXML and XPP appear to
be the only feasible contenders today.
Pull parsers are particularly useful for lightweight clients
because they parse only the minimal chunk of a document necessary
when an application requests the next piece of data. The
application can process this data at its leisure and then ask for
the next piece, spurring the parser to parse just another small
chunk of the document. This is similar to the workings of a
java.io.Reader. The benefits of this approach are:
Processing and battery
power are used when and only when the application needs the next
piece of data; the application maintains control over its parsing
needs
Memory footprint is
reduced; the parser only needs to maintain minimal state
information and a pointer to the current element (although the
document itself must remain in memory for as long as parsing might
continue, so it's to the application's advantage to acquire what it
needs quickly). An entire object model does not remain
memory-resident.
Unfortunately, there is no standard interface yet for XML pull
parsing, like SAX for XML push parsing or DOM for XML object model
parsing. Therefore, although pull parsers sound great for
lightweight clients, they may be too immature for use today in
production applications. Applications would be stuck with the
limitations of the parser selected by the development team without
a clear upgrade path. It may be difficult or impossible (without
rewriting the application) to change parsers in the future.
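As a rough illustration of the pull model, here is a sketch that uses
StAX (javax.xml.stream), the pull API that later became part of the
JDK. This is an assumption made purely so the example runs without
extra libraries; it is not the kXML or XPP API, and the class name
PullTraversal is invented. The application asks for each event with
next(), and a recursive method handles one element per call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical pull-style traversal: records element names and text.
class PullTraversal {
    // Precondition: the reader is positioned on a START_ELEMENT.
    // Consumes events up to and including the matching END_ELEMENT.
    static void traverse(XMLStreamReader reader, List<String> out) throws XMLStreamException {
        out.add("<" + reader.getLocalName() + ">");
        while (reader.hasNext()) {
            int event = reader.next();          // the application pulls the next event
            if (event == XMLStreamConstants.START_ELEMENT) {
                traverse(reader, out);          // recurse into the child element
            } else if (event == XMLStreamConstants.CHARACTERS) {
                String text = reader.getText().trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return;                         // this element is finished
            }
        }
    }

    // Pulls events until the root element is found, then traverses it.
    static List<String> run(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    traverse(reader, out);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out;
    }
}
```

Because the loop is in application code, it could just as easily stop
calling next() once it has what it needs, which is where the savings
in processing and battery power come from.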
Let's briefly look at some code that demonstrates how pull
parsers work. This code is based on sample kXML code from http://www.microjava.com/news/techtalk/kxml/. It outputs element
names and document text. Note the use of recursion, something
atypical in applications using push and object model XML
parsers.
public void traverse(Parser parser) throws Exception
{
A NanoXML document is a tree of nanoxml.XMLElement objects.
These correspond to the org.w3c.dom.Node interface in the DOM
specification.
NanoXML does not implement the DOM interfaces. You build and
retrieve document contents through a proprietary API, but an
optional SAX 1.0 component exists for document retrieval. This API
is covered in this chapter.
Originally written in April 2000, NanoXML has gone through a few
iterations. The current release is 1.6.8. The next major release of
NanoXML will be 2.0, and it is scheduled to be available in July
2001. The current beta release is promising, although it seems to
have lost compatibility with 1.x releases. We will discuss both
releases since they differ significantly in library size and
features.
The source code is available under an open source license. The
site is maintained by Marc De Scheemaecker, who is the author of
the package. I have found him to be very responsive to support
questions.
If your target platform is the Java KVM, the latest NanoXML
won't do the job because it has dependencies on classes that are
not included in the standard Java KVM. You'll have to get version
1.6.4 from the NanoXML web site and the kXMLElement class from
http://www.ericgiguere.com/microjava/cldc_xml.html.
Without a doubt, the greatest feature of NanoXML is also its
smallest - its JAR file size. Its JAR file size is second only to
MinML, but the size depends upon which version you use and whether
or not you choose the optional SAX component. You can get away with
XML parsing in as little as 6047 bytes!
Unfortunately, NanoXML suffers from some performance issues and
memory usage problems, which we will discuss.
Entities are specified in the XMLElement constructor as a
hashtable of key-value pairs
SAX | Yes, SAX 1.0 | -
DOM | No | -
Comments | Ignored | -
Processing Instructions | No | PI in the preamble <?xml version="1.0" encoding="UTF-8"?> is ignored; subsequent PIs throw nanoxml.XMLParseException
Namespaces | Indirectly | Prefixes aren't distinguished from local parts - <prefix:name> becomes an atomic element or attribute
JAR size | - | 6047 bytes; 8618 with SAX support
Version 1.6.8 is a non-validating parser. Any reference to a DTD
or XML Schema is ignored, although there is support for entity
expansion.
Mixed content isn't supported, for example:
<Request>ItemDetail
<ItemId>553</ItemId>
</Request>
will result in an incorrect internal document representation.
XML namespaces aren't supported directly, although they won't cause
any parsing difficulties. This SOAP envelope, for example, is
parsed without problems:
The element <Envelope> is stored literally as
<SOAP:Envelope> with no comprehension of the SOAP namespace
prefix. It also contains five attributes: xmlns:SOAP, xmlns:xsi,
xmlns:xsd, xmlns:SOAP-ENC, and SOAP:encodingStyle. Since document
validation isn't supported, namespace URLs are not followed.
Comments are skipped by the parser and not stored internally.
The first processing instruction in an XML document:
<?xml version="1.0" encoding="UTF-8"?>
is skipped and also not stored internally. Any subsequent
processing instructions will throw a nanoxml.XMLParseException.
A SAX-compatible API can optionally be used with parsing (see
Package nanoxml, page 579). If the SAX API is not used,
retrieval of elements and attributes is through a completely
proprietary API (see public class XMLElement, page 580).
Documents can also be built from scratch and written to any
Writer object, or they can be modified using the addChild() and
removeChild() methods.
The JAR file size of this release, excluding the optional SAX
component, is 6047 bytes. Adding SAX functionality brings the
library up to 8618 bytes. But this small size doesn't come without
a price. As with most parsers reviewed in this chapter, NanoXML is
not XML 1.0 compliant.
You have two choices for parsing: a DOM-style or SAX 1.0
interface. Both choices are multiple-pass parsers, iterating over
the same document more than once in order to build an internal
representation (this is true even of the SAX interface because it
is built on top of the DOM-style interface). This negatively
affects performance. Finally, even if the SAX parser is used, an
entire document tree is built and kept in memory until the parser
object is garbage collected. Not only does this lead to a large
memory footprint when parsing large documents, but depending upon
the garbage collection mechanism used by your VM, it may severely
fragment the heap and prevent subsequent object creation. We
discuss this issue in the Java KVM section (page 572).
Parsing large documents with this version of NanoXML may be
inappropriate for lightweight clients. However, for relatively
small documents, it could be just the thing.
This is the only package in the NanoXML library. This version
has only two classes: XMLElement and XMLParseException. XMLElement
represents an XML document and its content. XMLParseException is
the exception that is thrown when a parse error occurs; for
example, when a document that is not well-formed is encountered.
Let's look at each of these classes in detail.
XMLElement is a representation of an XML document. In addition
to being able to parse XML documents, this serializable class
contains all the methods needed to get/set elements, subelements,
attributes, and text in a document. It derives from
java.lang.Object.
The XMLElement constructor can only take a few different
arguments. Each constructor makes use of a conversion table.
A conversion table is simply a Properties object, which is used to
map entities to their conversion values. When an entity is found in
a parsed document, it is used as a key into the Properties object
to find the replacement value.
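The lookup just described can be sketched in a few lines. This is not
NanoXML's code: the class EntityTable and its methods are
hypothetical, and the assumption that keys are stored without the
leading ampersand and trailing semicolon (for example, lt rather than
&lt;) is mine.

```java
import java.util.Properties;

// Hypothetical illustration of a conversion table used for entity expansion.
class EntityTable {
    // Returns a table holding the five predefined XML entities.
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("amp", "&");
        p.setProperty("lt", "<");
        p.setProperty("gt", ">");
        p.setProperty("apos", "'");
        p.setProperty("quot", "\"");
        return p;
    }

    // Replaces each &name; whose name is a key in the table.
    // Unknown entities and stray ampersands are copied through unchanged.
    static String expand(String text, Properties table) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i) : -1;
            String value = (semi > i)
                    ? table.getProperty(text.substring(i + 1, semi))
                    : null;
            if (value != null) {
                out.append(value);
                i = semi + 1;      // skip past the whole entity reference
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Adding an application-specific entity is then just one more
setProperty() call on the table before parsing begins.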
The default constructor creates a new XMLElement object with a
Properties map that converts the following predefined entities:
Entity | Maps To:
&amp; | &
&lt; | <
&gt; | >
&apos; | '
&quot; | "
Here is a summary of the constructor arguments:
Arguments | Type | Effect
conversionTable | java.util.Properties | Entities are keys into the Properties map that provide a replacement value when the key is found during parsing
skipLeadingWhitespace | boolean | Directs the parser to ignore leading whitespace in #PCDATA
ignoreCase | boolean | Directs the parser to ignore element and attribute case. Useful for HTML parsing.
If a conversion table is specified, the base entities listed
above (&amp; &lt; &gt; &apos; &quot;) are used in addition to that
table. The default values for the other two arguments are false for
skipLeadingWhitespace (whitespace won't be skipped) and, notably,
true for ignoreCase.
Parse Methods
The parse() methods direct an XMLElement object to begin parsing
an XML document.
public void parseString(String string) throws XMLParseException
public int parseString(String string, int offset) throws XMLParseException
public int parseString(String string, int offset, int end) throws XMLParseException
public int parseString(String string, int offset, int end, int startingLineNr) throws XMLParseException
public void parseFromReader(java.io.Reader reader) throws IOException, XMLParseException
public void parseFromReader(java.io.Reader reader, int startingLineNr) throws IOException, XMLParseException
public int parseCharArray(char[] chrAry, int offset, int end) throws XMLParseException
public int parseCharArray(char[] chrAry, int offset, int end, int startingLineNr) throws XMLParseException
The parseString() and parseFromReader() methods actually just
resolve to one of the parseCharArray() method calls.
Arguments
Argument | Type | Effect
string | String | XML source content is contained within a string
reader | java.io.Reader | XML source content is contained within a java.io.Reader
chrAry | char[] | XML source content is contained within a character array
offset | int | Marks where parsing should begin, counting from the first character
end | int | Marks where parsing should end
startingLineNr | int | Marks from which line the parser should begin parsing
Usage and Examples
To direct NanoXML to parse an XML file, we would write code like
this:
BufferedReader br = null;
try {
    br = new BufferedReader(new FileReader("request.xml"));
}
catch (java.io.FileNotFoundException e) {
    e.printStackTrace();
    System.exit(1); // non-zero status signals the failure
}
XMLElement elem = new XMLElement();
try {
    elem.parseFromReader(br);
}
catch (java.io.IOException e) {
    e.printStackTrace();
}
Children Methods
These methods enable access to child elements. They are
typically used after parsing a document. Remember that all elements
in the document are represented as XMLElement objects.
There is no way of accessing "sibling" elements in NanoXML as
there is in DOM; each element is a child of the element directly
above it.
public int countChildren()
public java.util.Enumeration enumerateChildren()
These methods take no arguments. Interestingly enough,
countChildren() only returns the number of children one level deep
in the tree. The other method returns the children n-levels
deep.
Child Methods
These methods perform operations on elements in the document. It
is helpful to think of the structure in memory as a Document Object
Model (DOM), even though it is not. Remember that all elements in
the document are represented as XMLElement objects; this is similar
to the Node interface in the package org.w3c.dom. Again, there are
no siblings in NanoXML - only children.
public void addChild(nanoxml.XMLElement child)
public void removeChild(nanoxml.XMLElement child)
public void removeChild(String key)
public void setTagName(String tagName)
public String getTagName()
public void setContent(String content)
public String getContents()
public String toString()
public void write(java.io.Writer writer)
public void write(java.io.Writer writer, int indent)
Some of the quirks of this version of NanoXML are apparent here:
we have a method called setContent(), but its accessor is called
getContents(). This is resolved in NanoXML 2.0 beta. setTagName()
and getTagName() set and get the element name. setContent() and
getContents() set and get the #PCDATA of the node. The write()
methods output the node and its subnodes to a java.io.Writer.
Arguments
Argument | Type | Effect
child | nanoxml.XMLElement | The element to be added to or removed from the document
key | String | The name of the attribute to be removed from the XMLElement
tagName | String | The textual name given to the element
content | String | The #PCDATA content of the element
writer | java.io.Writer | The writer to which the element (or document) should be written
indent | int | The number of spaces to indent for each new nested element
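To show how a tree of elements and an indent argument fit together
without depending on the NanoXML JAR, here is a minimal sketch of a
hypothetical MiniElement class. It is not NanoXML's implementation;
the names and the exact output layout are assumptions, and it
deliberately skips escaping and mixed content to stay short.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an element tree; not the nanoxml.XMLElement class.
class MiniElement {
    private final String tagName;
    private String content = "";
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<MiniElement> children = new ArrayList<>();

    MiniElement(String tagName) { this.tagName = tagName; }

    MiniElement attr(String key, String value) { attributes.put(key, value); return this; }
    MiniElement content(String text) { content = text; return this; }
    MiniElement addChild(MiniElement child) { children.add(child); return this; }

    // Writes this element and its subtree, indenting each nesting level.
    void write(Writer w, int indent) throws IOException { write(w, indent, 0); }

    private void write(Writer w, int indent, int depth) throws IOException {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < indent * depth; i++) {
            pad.append(' ');
        }
        w.write(pad + "<" + tagName);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            w.write(" " + a.getKey() + "=\"" + a.getValue() + "\""); // no escaping in this sketch
        }
        if (children.isEmpty() && content.isEmpty()) {
            w.write("/>\n");
        } else if (children.isEmpty()) {
            w.write(">" + content + "</" + tagName + ">\n");
        } else {
            // Content is ignored when children exist: no mixed content here.
            w.write(">\n");
            for (MiniElement c : children) {
                c.write(w, indent, depth + 1);
            }
            w.write(pad + "</" + tagName + ">\n");
        }
    }

    // Convenience wrapper that renders the subtree to a String.
    String toXml(int indent) {
        StringWriter sw = new StringWriter();
        try {
            write(sw, indent);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sw.toString();
    }
}
```

The recursion in write() is the whole trick: each element only knows
how to print itself and then delegates to its children one level
deeper.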
Usage and Examples
Here is an example of how an XML document can be generated and
written to a java.io.Writer.
Example: Generating and Writing an XML Document
The XML we generate is a request from our auction site, for the
detail information on item numbers 553 and 554.
Sample Request.xml
<?xml version="1.0" encoding="UTF-8"?>
<Request name="ItemDetail">
  <Parameters/>
  <ItemId type="Integer">553</ItemId>
  <ItemId type="Integer">554</ItemId>
</Request>
NanoXML gives us no way to add processing instructions to a
document, so we will output the prolog <?xml version="1.0"
encoding="UTF-8"?> ourselves. The addProperty() methods haven't
been introduced yet, but they will be in the next section.
Source Request.java
import nanoxml.XMLElement;
import java.io.*;
public class Request {
public static void main(String args[]) throws IOException
{
//create the document and set the root
XMLElement root = new XMLElement();
root.setTagName("Request");
root.addProperty("name", "ItemDetail");
//create and set the first child
XMLElement child1 = new XMLElement();
child1.setTagName("Parameters");
root.addChild(child1);
//create and set next child
XMLElement child2 = new XMLElement();
child2.setTagName("ItemId");
child2.setContent("553");
child2.addProperty("type", "Integer");
root.addChild(child2);
//create and set the last child
XMLElement child3 = new XMLElement();
child3.setTagName("ItemId");
child3.setContent("554");
child3.addProperty("type", "Integer");
root.addChild(child3);
//create a writer and output the prolog
BufferedWriter bw = new BufferedWriter(new FileWriter("Request.xml"));
bw.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
//output the document and close the writer
root.write(bw);
bw.close();
}
}
Compiling and Running
To compile Request.java, do the following from the directory in
which Request.java is saved:
The output is a file in the current directory called
Request.xml. It should look like Request.xml at the beginning of
this example.
Attribute Methods
These methods allow you to get and set attributes on an
XMLElement object. Remember that NanoXML stores attributes for each
element as a java.util.Properties object.
public void addProperty(String key, Object value)
public void addProperty(String key, int value)
public void addProperty(String key, double value)
public Enumeration enumeratePropertyNames()
public String getProperty(String key)
public String getProperty(String key, String default)
public int getProperty(String key, int default)
public double getProperty(String key, double default)
public boolean getProperty(String key, String trueVal, String falseVal, boolean default)
public Object getProperty(String key, java.util.Hashtable valueSet, String defaultValue)
Once again, we see some of the quirks of this release of
NanoXML: the setter for getProperty() is called addProperty()
instead of setProperty(). This is fixed in NanoXML 2.0 Beta.
The first three methods allow you to add or set different types
of properties to an element. enumeratePropertyNames() allows you to
enumerate through the set of attributes for an element, while
getProperty() returns specific values for known named attributes.
The getProperty(String key) method returns null if the property
doesn't exist for the element.
Arguments
Argument | Type | Effect
key | String | The name of the attribute to look up
value | int, double, String | The value of the attribute
default | int, double, String | The value that is returned if the attribute doesn't exist
trueVal | String | The value of the attribute which should be interpreted as representing true; for example, yes, true, or 1. This argument gives you the flexibility of specifying what value to use for Boolean true in the getProperty() method that returns a boolean. See Usage and Examples, below.
falseVal | String | The value of the attribute that should be interpreted as representing false; for instance, no, false, or 0. This argument gives you the flexibility of specifying what value to use for Boolean false in the getProperty() method that returns a boolean. See Usage and Examples, below.
valueSet | java.util.Hashtable | Stores the attributes of the element as key value pairs.
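The way trueVal, falseVal, and default interact can be sketched
independently of NanoXML. The helper below is hypothetical: it works
on a plain java.util.Properties object, and throwing
IllegalArgumentException for a value that matches neither string is my
assumption, not documented library behavior.

```java
import java.util.Properties;

// Hypothetical stand-in for the four-argument Boolean lookup.
class BooleanAttr {
    static boolean get(Properties attrs, String key,
                       String trueVal, String falseVal, boolean dflt) {
        String value = attrs.getProperty(key);
        if (value == null) {
            return dflt;              // attribute absent: fall back to the default
        }
        if (value.equals(trueVal)) {
            return true;
        }
        if (value.equals(falseVal)) {
            return false;
        }
        // Assumption: an unrecognized value is treated as an error.
        throw new IllegalArgumentException(
                "Unexpected value '" + value + "' for attribute " + key);
    }
}
```

The same helper therefore serves attributes spelled yes/no, true/false,
or 1/0; only the two strings passed in change.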
Usage and Examples
Let's first take a look at how to use this somewhat
non-intuitive method, which was removed from NanoXML 2.0:
public boolean getProperty(String key, String trueVal, String falseVal, boolean default)
This helper method makes it easier to determine values for
Boolean attributes. A Boolean attribute is an attribute with only a
Boolean value. Take, for instance, the <exec> element in
Jakarta's Ant project:
The failonerror attribute is a Boolean attribute whose values
are true and false. ColdFusion's <cfoutput> element also has
a Boolean attribute:
<cfoutput query="MyQuery" group="id"
groupcasesensitive="no"/>
However, its values are yes and no. getProperty() allows us to
test the value of Boolean attributes generically. For example, to
retrieve the value of groupcasesensitive from <cfoutput>, whose
default value is yes if not specified, we would write:
boolean b = elem.getProperty("groupcasesensitive", "yes", "no", true);
If groupcasesensitive isn't specified, the last parameter, true
in this case, supplies its default value. Another example would be if
we were parsing Ant's <exec> element. We could write:
boolean b = elem.getProperty("failonerror", "true", "false", false);
Note, however, that we could also write:
boolean b = (elem.getProperty("failonerror", "false")).equalsIgnoreCase("true");
Now let's move to look at our previous XML document,
request.xml. Here is an example that reads it and outputs the
#PCDATA and type attribute value for each <ItemId> element.
Recall the XML document looks like this:
<Request name="ItemDetail">
  <Parameters>
    <ItemId type="Integer">553</ItemId>
    <ItemId type="Integer">554</ItemId>
  </Parameters>
</Request>
First, we must read and parse the file:
BufferedReader br =
new BufferedReader(new FileReader("request.xml"));
XMLElement elem = new XMLElement();
elem.parseFromReader(br); //elem is the root node
Then, we enumerate through each
child node of the root. If any child element is named
<ItemId>, we output its type attribute and its #PCDATA
content. If the type attribute doesn't exist for some reason, the
default value unknown is used.
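The same walk can be sketched with the DOM parser bundled with the
JDK, which lets the example run without the NanoXML JAR. This is a
stand-in, not NanoXML's API: the class ItemReport is hypothetical, and
getElementsByTagName() replaces the child enumeration. The output
format and the unknown default follow the description above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical DOM-based equivalent of the ItemId report.
class ItemReport {
    // Returns one line per <ItemId> element found anywhere in the document.
    static List<String> describe(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("ItemId");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Fall back to "unknown" when the type attribute is missing.
                String type = item.hasAttribute("type")
                        ? item.getAttribute("type")
                        : "unknown";
                lines.add("Type = " + type + " and item id = "
                        + item.getTextContent().trim());
            }
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return lines;
    }
}
```

No fields are needed to remember where we are in the document; the
element in hand already carries its attribute and its text.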
This class represents a NanoXML parsing exception. It extends
java.lang.RuntimeException. It is usually thrown when a
non-well-formed document is parsed or when a processing instruction
that isn't in the preamble is encountered. Even though processing
instructions outside a document's preamble can certainly be part of
a well-formed XML document, NanoXML rejects them and throws an
exception.
Adding the optional SAX 1.0 parser to NanoXML increases the
library's size by another 2,571 bytes (for a total of 8,618 bytes).
This is quite small, but it also increases your dependencies. For
example, the package makes use of java.net.URL,
java.io.InputStream, and java.util.Locale among others. Depending
upon your particular virtual machine and device profile, some or
all of these classes may not be available. You might be able to get
creative and rewrite some of the package if you want to reduce its
dependencies as was done for the Java KVM.
In addition to possibly not having all required classes, SAX is
a push parser. After telling the parser to begin, the parser calls
back (or pushes) into your application code to notify you of parse
events. This model forces your code to maintain state within the
callback class(es), and to evaluate that state at
each event.
One of the nice things we've seen in the previous code examples
is that there was no need for state information. This is much more
programmer-friendly than the code we are about to see.
In SAX's defense, however, it will allow you to plug another
parser underneath the hood without any code changes on your part.
It's a standardized API. All you need to do is use different class
files or a different JAR. If you're seeing performance or memory
usage problems with NanoXML, this will allow you to plug another
parser into your application without much work. However, you might
be better off ignoring the SAX standard and using the pull model of
parsing, such as that used by kXML and XPP.
Unfortunately, a standard pull API for XML parsing has yet to be
decided upon, so if you choose a pull parser, your upgrade path is
unclear.
Class SAXParser
nanoxml.sax
public class SAXParser
implements
org.xml.sax.Parser
This class implements the org.xml.sax.Parser interface published
by David Megginson. It's built on top of the class XMLElement so it
has all the features (or lack thereof) outlined in the
Features table on page 578. Here is a list of other features
applicable to this particular parser:
Feature | Support for org.xml.sax.Parser | Notes
Locales | English language only | SAXException thrown if another type of locale is set with setLocale()
Whitespace | ignorableWhitespace() is never called | Leading whitespace in #PCDATA skipped
DTD validation | None | The objects implementing interface org.xml.sax.DTDHandler and interface org.xml.sax.EntityResolver in your application are never called back
Mixed content | None | XML such as <Request>widgets<Item>553</Item></Request> isn't permitted
Document locator | Support for line numbers and system identifiers | org.xml.sax.Locator.getLineNumber() and org.xml.sax.Locator.getSystemId() are supported
Processing instructions | processingInstruction() is never called | -
Additionally, this parser only supports locales using the
English language. It will throw a SAXException if another type of
locale is set using the setLocale() method. Attribute data types
are always reported as CDATA.
Since SAXParser makes use of the nanoxml.XMLElement class
internally, it has to choose one of the XMLElement() constructors
to use. These constructors dictate certain parsing behaviors (see
the section public class XMLElement, page 580). The default
parsing behavior is case insensitivity to element and attribute
names, to skip leading whitespace in PCDATA elements, and to expand
only the entities &amp;, &lt;, &gt;, &apos;, and &quot;. However,
this behavior can be overridden by deriving your own class from
SAXParser and overriding its protected createTopElement() method to
call a different XMLElement() constructor.
Error handlers and document locators are supported, as well as
parsing from a URI.
Usage and Examples
Let's look at what it would take to implement one of our
previous examples using the SAX interface. This will give you a
good idea about what I mean by having to maintain state in your
application for push parsers like SAX.
This example reads the XML document from the section
Attribute Methods (page 597), request.xml, and outputs the
#PCDATA and type attribute value for each ItemId element. Recall
the XML document looks like this:
<Request name="ItemDetail">
  <Parameters>
    <ItemId type="Integer">553</ItemId>
    <ItemId type="Integer">554</ItemId>
  </Parameters>
</Request>
Remember, we want the same output that the previous code (which
used NanoXML) produced. To refresh your memory, the output was:
Type = Integer and item id = 553
Type = Integer and item id = 554
Here is the code that uses SAX.
You'll have to put David Megginson's sax.jar for SAX 1.0
(http://www.megginson.com/SAX/SAX1/index.html) in your CLASSPATH,
as well as nanoxml-sax.jar.
import nanoxml.sax.SAXParser;
import org.xml.sax.*;

public class RequestHandler extends HandlerBase {
    // State carried from startElement() to characters()
    private String _type;

    public RequestHandler() throws Exception {
        SAXParser parser = new SAXParser();
        parser.setDocumentHandler(this);
        parser.setErrorHandler(this);
        parser.parse("request.xml");
    }

    public void startElement(String name, AttributeList attrs)
            throws SAXException {
        if (name.equals("ItemId")) {
            if (attrs.getValue("TYPE") == null)
                _type = "???";
            else
                _type = attrs.getValue("TYPE");
        }
    }

    public void characters(char ch[], int start, int length)
            throws SAXException {
        System.out.print("Type = " + _type + " and item id = ");
        // Print only the slice of the buffer that belongs to this event
        System.out.println(new String(ch, start, length));
    }

    public static void main(String args[]) throws Exception {
        RequestHandler t = new RequestHandler();
    }
}
Notice the private member variable _type that saves the value of
the type attribute for the element currently being parsed. There is
no other way to implement this in SAX. This is a small example,
too. For more complex parsing, the amount of state needing to be
saved increases.
This code is also quite a bit larger than the code that used
nanoxml.XMLElement. SAX just isn't as programmer-friendly.
The second major release of NanoXML isn't due to be released
until July 2001. A beta release is available now, however. It lacks
SAX 1.0 support and there is still no direct support for XML
namespaces, but the author assures me that a 2.1 release will
support both SAX 2.0 and namespaces. NanoXML 2.0 is quite
different from 1.x, so the 1.x port for the Java KVM won't work
with 2.0 yet. Hopefully, a KVM port will be made available.
The beta release increases the JAR size from the 6,047 bytes of
version 1.6.8 to over 20,000 bytes, a significant increase. So what
do we gain with that extra size?
For a start, the classes are in a different package this time,
net.n3.nanoxml, instead of nanoxml. We lose backwards compatibility
with version 1.x due to this and also interface and class changes
within the packages. If you use 2.x, your code will not be usable
with 1.x, although there is planned support for a "lite" version of
2.0 that is almost compatible with version 1.6. There are some
advantages to using 2.0, however.
Probably the most significant enhancement is that the parser is
now a single-pass parser. Version 1.x releases were multiple-pass
and their performance suffered because of it. Performance in 2.0
has significantly improved upon this aspect.
Version 2.0 Beta occupies less memory while parsing than version
1.6.7, but the memory requirements still scale linearly with the
size of the document. All elements are saved internally as a tree
of XMLElement objects, with each XMLElement object containing a
java.util.Properties object to store element attributes. These objects
are kept in memory until garbage collected. As we shall see in another
section, this can lead to memory fragmentation depending upon the
garbage collector in the virtual machine you are using.
Mixed content is now supported, for example:
<Request>ItemDetail
<ItemId>553</ItemId>
</Request>
but the XMLWriter class has some peculiarities in how it handles mixed
content (see the Child Methods section). Although the parser is
still non-validating, the DTD isn't completely ignored as it was in
version 1.x. Except for the <!ATTLIST> declaration, DTD
declarations appear to work. Predefined, general, and parameter
entities are all supported.
Predefined entities are still supported. Additionally, any
character can be referred to by its numeric reference (for example,
&#64; for @). Predefined entities need not be declared in a
DTD.
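As a rough illustration of what expanding these references involves, the following sketch replaces the five predefined XML entities and decimal or hexadecimal numeric character references in a string. It is not NanoXML code. The class name EntitySketch and its expand() method are hypothetical, and the sketch ignores entities declared in a DTD.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands predefined entities and numeric character references.
class EntitySketch {
    private static final Pattern REF =
            Pattern.compile("&(#x[0-9A-Fa-f]+|#[0-9]+|lt|gt|amp|apos|quot);");

    static String expand(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        // Single left-to-right pass, so replaced text is never re-scanned.
        while (m.find()) {
            out.append(s, last, m.start());
            out.append(resolve(m.group(1)));
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    private static String resolve(String name) {
        switch (name) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "apos": return "'";
            case "quot": return "\"";
            default:
                // "#64" is decimal, "#x40" is hexadecimal
                int codePoint = name.startsWith("#x")
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1));
                return new String(Character.toChars(codePoint));
        }
    }

    public static void main(String[] args) {
        System.out.println(expand("user&#64;example.com &lt;b&gt;"));
    }
}
```

Doing the replacement in one pass matters: expanding &amp;lt; must yield the text &lt; and not a second expansion to <.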
General entities are macros for an XML document. They associate
parsed text with a symbol and must be declared in the DTD. For
example:
Just like general entities, parameter entities act as macros and
are declared in the DTD. However, unlike general entities, their
use is limited to the DTD - they cannot be referenced in XML. Since
NanoXML isn't a validating parser, parameter entities aren't very
useful. Perhaps this is provided as an intermediary step towards
making NanoXML a validating parser. In any case, parameter entities
are declared with the ENTITY keyword, a percent sign, a name, and
the replacement value:
<!ENTITY % requestParameters "name CDATA #REQUIRED">
Whenever the parser encounters %requestParameters; in the DTD, it
will substitute the quoted string. Here's a usage example:
<!ATTLIST Request %requestParameters; date CDATA #IMPLIED>
A parser that recognizes parameter entities should expand the
above to:
<!ATTLIST Request name CDATA #REQUIRED date CDATA #IMPLIED>
Note that all parameter entities must be declared before they
are referred to in a DTD. Interestingly enough, in the beta release
referencing a parameter entity results in an XMLParseException,
although parameter entities can be declared without any problems.
Perhaps this will be fixed before a production release of NanoXML 2.0,
but it's worth keeping in mind until then.
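The expansion a parser is expected to perform can be sketched as plain text substitution. The code below is an illustration only, not part of NanoXML. The class name ParamEntitySketch and its methods are hypothetical, a reference is assumed to be written as a percent sign, a name, and a semicolon, and the sketch skips details that a real parser must handle, such as the padding spaces the XML specification adds around replacement text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: substitutes %name; references inside a DTD declaration.
class ParamEntitySketch {
    private static final Pattern REF = Pattern.compile("%([A-Za-z_][A-Za-z0-9_.-]*);");

    static String expand(String declaration, Map<String, String> entities) {
        Matcher m = REF.matcher(declaration);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = entities.get(m.group(1));
            if (value == null) {
                // Entities must be declared before they are referenced.
                throw new IllegalArgumentException("Undeclared parameter entity: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String demo() {
        Map<String, String> entities = new HashMap<>();
        entities.put("requestParameters", "name CDATA #REQUIRED");
        return expand("<!ATTLIST Request %requestParameters; date CDATA #IMPLIED>", entities);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```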
The net.n3.nanoxml package consists of four interfaces and nine
classes. The interfaces, IXMLBuilder, IXMLParser, IXMLReader,
and IXMLValidator, are all intended to allow you to plug your own
code into NanoXML. You could write your own reader, for example,
and by implementing IXMLReader, it would then plug into the NanoXML
framework. You might choose to do this if your data comes from an
unconventional source, a Palm OS database for example.
We won't cover the interfaces in too much detail, as there are
concrete classes that implement them. We'll cover those classes,
StdXMLBuilder, StdXMLParser, StdXMLReader, and NonValidator, instead.
The XMLElement class, even though it existed in version 1.x, has
changed significantly. Some methods have been removed, and some new
ones have been added.
Constructors
You no longer have to construct an XMLElement object unless you
are building documents. When parsing documents, the object
implementing IXMLBuilder (usually StdXMLBuilder) will provide the
root element through its getResult() method (covered below). So you
really only need to concern yourself with the following methods if
you need to build documents with NanoXML.
public XMLElement()
public XMLElement(String name)
The default constructor is provided for #PCDATA text. To support
mixed content in the XMLElement class, #PCDATA is treated as an
XMLElement object with no element name. We'll go into this in more
detail in the Child Methods section, but this
point is very important.
Use the default constructor XMLElement() for #PCDATA. Use the
other constructor for element nodes.
The name argument represents the name of the new element.
Children Methods
These methods enable access to child elements. They are
typically used after parsing a document. Note that all elements in
a document, including #PCDATA text, are represented as XMLElement
objects. There is no concept of siblings in NanoXML as there is in
the Document Object Model (DOM). Each element is a child of the
element directly above it.
public int getChildrenCount()
public boolean isLeaf()
public boolean hasChildren()
public Enumeration enumerateChildren()
public Vector getChildren()
public XMLElement getChildAtIndex(int index)
public XMLElement getFirstChildNamed(String name)
public Vector getChildrenNamed(String name)
Again we see the quirkiness of NanoXML: getChildrenCount() was
called countChildren() in version 1.6.7. There is no apparent
reason for the name change except perhaps to further the
incompatibility between the two releases! Also, isLeaf() and
hasChildren() are redundant methods, providing the same
information (one is simply the negation of the other).
The arguments name and index are the name or index of the
desired child(ren). enumerateChildren() and getChildren() existed
in version 1.6.7 and return an Enumeration or Vector of child
XMLElements.
getChildAtIndex() will throw an ArrayIndexOutOfBoundsException
if its index argument isn't valid. In contrast, getFirstChildNamed()
does not throw: it returns null if no child with the given element
name exists.
Usage and Examples
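Here's an example that gets all child elements named ItemId in an XML
document fragment and outputs each element's #PCDATA content. We'll
cover the getContent() method in the next section.
// root is assumed to be the root XMLElement of the parsed fragment
Enumeration items = root.getChildrenNamed("ItemId").elements();
while (items.hasMoreElements()) {
XMLElement elem = (XMLElement) items.nextElement();
System.out.println("content is " + elem.getContent());
}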
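The lookup behaviour described above (a Vector of matches, null for a missing name, an exception for a bad index) can be tried without NanoXML on the classpath. The sketch below uses a hypothetical stand-in class, Node, that only imitates the documented behaviour of the three lookup methods. It is not NanoXML's implementation, and the sample element names are assumptions.

```java
import java.util.Enumeration;
import java.util.Vector;

// Stand-in model of the child lookup methods; not the real XMLElement class.
class ChildLookupSketch {
    static class Node {
        final String name;
        String content;
        final Vector<Node> children = new Vector<>();

        Node(String name) { this.name = name; }

        Node add(Node child) { children.addElement(child); return this; }

        Node getChildAtIndex(int index) {
            // Vector.elementAt throws ArrayIndexOutOfBoundsException for a bad index
            return children.elementAt(index);
        }

        Node getFirstChildNamed(String wanted) {
            for (Node c : children) {
                if (wanted.equals(c.name)) return c;
            }
            return null; // no exception when nothing matches
        }

        Vector<Node> getChildrenNamed(String wanted) {
            Vector<Node> out = new Vector<>();
            for (Node c : children) {
                if (wanted.equals(c.name)) out.addElement(c);
            }
            return out;
        }
    }

    static Node item(String id) {
        Node n = new Node("ItemId");
        n.content = id;
        return n;
    }

    static String demo() {
        Node root = new Node("Request")
                .add(item("553")).add(new Node("Parameters")).add(item("554"));
        StringBuilder sb = new StringBuilder();
        Enumeration<Node> e = root.getChildrenNamed("ItemId").elements();
        while (e.hasMoreElements()) {
            sb.append(e.nextElement().content).append(' ');
        }
        sb.append(root.getFirstChildNamed("Missing") == null ? "null" : "found");
        try {
            root.getChildAtIndex(3); // only indexes 0..2 exist
        } catch (ArrayIndexOutOfBoundsException ex) {
            sb.append(" out-of-bounds");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The practical consequence is that callers need two styles of defensive code: a null check after a lookup by name, and a bounds check (or a catch block) around a lookup by index.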
Now let's go over the methods for adding, removing, and
accessing individual child elements.
Child Methods
public void addChild(XMLElement child)
public void removeChild(XMLElement child)
public void removeChildAtIndex(int index)
public void setContent(String content)
public String getContent()
Arguments
Argument   Type                        Effect
child      net.n3.nanoxml.XMLElement   The element to add to or remove from the document
index      int                         The index into the document, where the first element is 0
name       java.lang.String            The name of the element
addChild() adds an XMLElement to the document as a child of
another element, while removeChild() and removeChildAtIndex()
remove an element from a document. The latter provides a very
simple XPath-style way of removing children.
setContent() and getContent() allow you to set and get the #PCDATA
content within an element. It is important to know how these
methods behave with regard to setName() and getName(), the
methods used to set and get the name of an XMLElement. Using
setContent() and getContent() incorrectly will break the XMLWriter
class, which is used for outputting documents (see the Class
XMLWriter section). If you create an XMLElement object with
the constructor:
public XMLElement(String name)
this creates an element with name name. The correct way to add
#PCDATA to this element is to create another XMLElement object
using the default constructor:
public XMLElement()
then calling setContent() on the returned object, and adding
that object to the first one using addChild(). If you instead call
setContent() on the object returned by the named constructor,
XMLWriter won't display subelements of that object.
To summarize, here is a code snippet that works just fine:
XMLElement root = new XMLElement("Request");
XMLElement rootPCDATA = new XMLElement();
rootPCDATA.setContent("An Auction Request");
root.addChild(rootPCDATA);
XMLElement child1 = new XMLElement("Parameters");
root.addChild(child1);
// output the document
XMLWriter writer = new XMLWriter(System.out);
writer.write(root);
The output of this snippet is:
<Request>
An Auction Request
<Parameters/>
</Request>
and here is a code snippet that does not work (even though it looks
like it should):
root = new XMLElement("Request");
root.setContent("An Auction Request");
child1 = new XMLElement("Parameters");
root.addChild(child1);
writer = new XMLWriter(System.out); //output the document
writer.write(root);
The output of this snippet is:
<Request>An Auction Request</Request>
You can see that the <Parameters> element is missing.
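One way to picture why the second snippet loses its child is a writer that treats any named element carrying its own content as a leaf. The toy model below reproduces the two outputs shown above (without the line breaks) under that assumption. It is a guess at the kind of logic involved, not NanoXML's XMLWriter source, and the Node class and write() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a writer that ignores children once an element has content.
class MixedContentSketch {
    static class Node {
        final String name; // null marks a #PCDATA node
        String content;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }
    }

    static String write(Node n) {
        if (n.name == null) {
            return n.content; // bare text
        }
        if (n.content != null) {
            // Assumed behaviour: content on a named element makes it a leaf.
            return "<" + n.name + ">" + n.content + "</" + n.name + ">";
        }
        if (n.children.isEmpty()) {
            return "<" + n.name + "/>";
        }
        StringBuilder sb = new StringBuilder("<" + n.name + ">");
        for (Node c : n.children) {
            sb.append(write(c));
        }
        return sb.append("</").append(n.name).append(">").toString();
    }

    // Text added as a nameless child: the sibling element survives.
    static String good() {
        Node root = new Node("Request");
        Node text = new Node(null);
        text.content = "An Auction Request";
        root.children.add(text);
        root.children.add(new Node("Parameters"));
        return write(root);
    }

    // Text set directly on the named element: the child is dropped on output.
    static String bad() {
        Node root = new Node("Request");
        root.content = "An Auction Request";
        root.children.add(new Node("Parameters"));
        return write(root);
    }

    public static void main(String[] args) {
        System.out.println(good());
        System.out.println(bad());
    }
}
```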
The following methods allow you to get, set, and remove attributes on
an XMLElement object. Remember that NanoXML stores attributes as a
hashtable in memory (actually it's a java.util.Properties object,
but that's derived from Hashtable). This interface is much more
intuitive than the version 1.6.7 interface.
public String getAttribute(String name)
public String getAttribute(String name, String default)
public void setAttribute(String name, String value)
public void removeAttribute(String name)
public Enumeration enumerateAttributeNames()
public boolean hasAttribute(String name)
public Properties getAttributes()
The removeAttribute() method is new for this release. If you're
familiar with version 1.6.7, you'll notice that all of the
extraneous getAttribute() methods that take different data types
(int, double, String) are now gone. All attributes are now treated
as Strings, a much simpler approach. All the method names also now
use the xxxAttribute() convention instead of the
xxxProperty() convention used in version 1.6.7. Again, this is
more intuitive as the standard XML terminology for these items is
attribute, not property.
The first two methods allow you to retrieve attributes of an
element. The first method returns null if the attribute doesn't
exist, while the second method returns the value of its default
argument. enumerateAttributeNames() allows you to iterate through the
set of attributes for an element, while getAttributes() returns the
internal Properties structure to you directly.
Arguments
Argument   Type     Effect
name       String   The name of the attribute to look up
value      String   The value of the attribute
default    String   The value that is returned if the attribute doesn't exist
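Since the attributes are held in a java.util.Properties object, the lookup semantics described above can be tried directly against that JDK class. This sketch uses only java.util.Properties and does not call NanoXML. The class name AttributeStoreSketch and the sample attribute names and values are hypothetical, and whether XMLElement delegates to Properties in exactly this way is an assumption.

```java
import java.util.Enumeration;
import java.util.Properties;

// Demonstrates the Properties calls that mirror the attribute methods.
class AttributeStoreSketch {
    static String demo() {
        Properties attrs = new Properties();
        attrs.setProperty("TYPE", "auction");    // like setAttribute(name, value)
        attrs.setProperty("date", "2001-07-01");

        StringBuilder sb = new StringBuilder();
        sb.append(attrs.getProperty("TYPE")).append(';');            // existing attribute
        sb.append(attrs.getProperty("missing")).append(';');         // null when absent
        sb.append(attrs.getProperty("missing", "none")).append(';'); // default when absent

        attrs.remove("date");                                        // like removeAttribute(name)
        sb.append(attrs.containsKey("date")).append(';');            // like hasAttribute(name)

        // Like enumerateAttributeNames(): walk the remaining names.
        int count = 0;
        for (Enumeration<?> e = attrs.propertyNames(); e.hasMoreElements(); e.nextElement()) {
            count++;
        }
        sb.append(count);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```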
The XMLParserFactory class provides convenient static methods for
instantiating a parser, reader, builder, and validator all at once.
These four objects interact with each other to parse a document. The
parser object is the "glue" which contains the reader, builder, and
validator. It is represented by the IXMLParser interface (see
the Class StdXMLParser section).
XMLParserFactory is not essential and could actually be removed from
the library, which would save almost one
kilobyte. The Usage and Examples section below shows how to
do this. However, typical NanoXML 2.0 code that parses XML will
start by calling one of these methods: