In this next example, I'm going to write a program that will parse
and display an entire document, indenting each element, processing
instruction, and so on, as well as displaying attributes and their
values. For example, if you pass customer.xml to this program, which
I'll call IndentingParser.java, that program will display the whole
document properly indented.
I start by letting the user specify what document to parse and
then parsing that document as before. To actually parse the document,
I'll call a new method, displayDocument, from the main method:
public static void main(String args[])
{
    displayDocument(args[0]);
    .
    .
    .
}
In the displayDocument method, I'll parse the document and get an
object corresponding to that document:
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;
public class IndentingParser
{
    public static void displayDocument(String uri)
    {
        try {
            DOMParser parser = new DOMParser();
            parser.parse(uri);
            Document document = parser.getDocument();
            .
            .
            .
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
        .
        .
        .
The method that actually walks through the parsed document and
displays it, display, will be recursive, as we saw when working with
JavaScript. I'll pass the document to that method, as well as the
current indentation string (which will grow by four spaces for every
successive level of recursion):
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;
public class IndentingParser
{
    public static void displayDocument(String uri)
    {
        try {
            DOMParser parser = new DOMParser();
            parser.parse(uri);
            Document document = parser.getDocument();
            display(document, "");
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }
    .
    .
    .
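If the Xerces DOMParser class is not on your classpath, the same parse-then-get-the-Document step can be sketched with the JAXP classes that ship with the JDK. This is a substitute for the chapter's parser, not the chapter's code; the class name JaxpParseSketch and the sample XML are made up for illustration, and the XML is read from a string so the example has no file dependency.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JaxpParseSketch {
    // Parses XML text into a DOM Document with the JDK's built-in JAXP parser.
    // (Assumption: JAXP stands in for the Xerces DOMParser used in the text.)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not the book's customer.xml.
        Document document = parse("<customer><name>Sam</name></customer>");
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

Either way, the result is an org.w3c.dom.Document object, so the rest of the display code should work the same.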
In the display method, I'll first check whether the node passed to
us is null; if it is, I return from the method. The next job is to
display the node, and how we do that depends on the type of node
we're working with. To get the type of node, you can use the node's
getNodeType method; I'll set up a long switch statement to handle the
different types:
import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;
public class IndentingParser
{
    public static void displayDocument(String uri)
    {
        .
        .
        .
    }
    public static void display(Node node, String indent)
    {
        if (node == null) {
            return;
        }
        int type = node.getNodeType();
        switch (type) {
            .
            .
            .
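To see the dispatch idea in isolation, here is a small sketch that switches on getNodeType and returns a label for each kind of node handled in this section. The class name NodeTypeSketch, the typeName method, and the labels are invented for this example, and it parses with the JDK's JAXP parser rather than Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeSketch {
    // Maps the numeric node type to a readable label (labels are illustrative).
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:               return "document";
            case Node.ELEMENT_NODE:                return "element";
            case Node.TEXT_NODE:                   return "text";
            case Node.CDATA_SECTION_NODE:          return "cdata";
            case Node.PROCESSING_INSTRUCTION_NODE: return "processing-instruction";
            default:                               return "other";
        }
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<a>hi</a>");
        System.out.println(typeName(document));
        System.out.println(typeName(document.getDocumentElement()));
        System.out.println(typeName(document.getDocumentElement().getFirstChild()));
    }
}
```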
To handle output from this program, I'll create an array of
strings, displayStrings, placing each line of the output into one of
those strings. I'll also store our current location in that array in
an integer named numberDisplayLines:
public class IndentingParser
{
    static String displayStrings[] = new String[1000];
    static int numberDisplayLines = 0;
    .
    .
    .
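A fixed array of 1,000 strings is simple, but it limits the output to 1,000 lines. As a possible alternative (not what the program in this chapter uses), the same line-buffer idea can be sketched with a growable list. The class LineBufferSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LineBufferSketch {
    // Grows as needed, so there is no fixed cap on the number of lines.
    private final List<String> lines = new ArrayList<>();

    // Appends one finished output line, prefixed with the current indent.
    void addLine(String indent, String text) {
        lines.add(indent + text);
    }

    int count() {
        return lines.size();
    }

    String line(int index) {
        return lines.get(index);
    }

    public static void main(String[] args) {
        LineBufferSketch buffer = new LineBufferSketch();
        buffer.addLine("", "<customer>");
        buffer.addLine("    ", "<name>");
        for (int i = 0; i < buffer.count(); i++) {
            System.out.println(buffer.line(i));
        }
    }
}
```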
I'll start handling various types of nodes in this switch
statement now.
Handling Document Nodes
The node that represents the document as a whole has the type
Node.DOCUMENT_NODE, a constant defined in the Node interface (see
Table 11.4). A document begins with the XML declaration, which takes
up one line of output, so I'll start the first line of output with
the current indent string, followed by a default XML declaration.
The next step is to get the document element of the document we're
parsing (the root element), and you do that with the
getDocumentElement method. The root element contains all other
elements, so I pass that element to the display method, which will
display all those elements:
public static void display(Node node, String indent)
{
    if (node == null) {
        return;
    }
    int type = node.getNodeType();
    switch (type) {
        case Node.DOCUMENT_NODE: {
            displayStrings[numberDisplayLines] = indent;
            displayStrings[numberDisplayLines] +=
                "<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>";
            numberDisplayLines++;
            display(((Document)node).getDocumentElement(), "");
            break;
        }
        .
        .
        .
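The code above always writes a default declaration. If your parser supports DOM Level 3, the Document interface also offers getXmlVersion and getXmlEncoding, so a declaration line could instead be built from what the document actually declared. The sketch below assumes a DOM Level 3 parser (the JDK's JAXP parser is used here rather than Xerces); the class name DocumentNodeSketch and the declaration method are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocumentNodeSketch {
    // Builds an XML declaration line from the Document's own properties.
    static String declaration(Document document) {
        StringBuilder line = new StringBuilder("<?xml version=\"")
                .append(document.getXmlVersion())
                .append('"');
        String encoding = document.getXmlEncoding(); // null when none was declared
        if (encoding != null) {
            line.append(" encoding=\"").append(encoding).append('"');
        }
        return line.append("?>").toString();
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<customer/>");
        System.out.println(declaration(document));
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```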
Handling Element Nodes
To handle an element node, we should display the name of the
element, as well as any attributes the element has. I start by
checking whether the current node type is Node.ELEMENT_NODE; if so, I
place the current indent string into a display string, followed by a
< and the element's name, which I can get with the getNodeName
method:
switch (type) {
    .
    .
    .
    case Node.ELEMENT_NODE: {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<";
        displayStrings[numberDisplayLines] += node.getNodeName();
        .
        .
        .
Handling Attributes
Now we've got to handle the attributes of this element, if it has
any. Because the current node is an element node, you can use the
getAttributes method to get a NamedNodeMap object holding all its
attributes, which are stored as Attr objects. I'll convert that map
to an array of Attr objects, attributes, like this. Note that I first
find the number of items in the NamedNodeMap object with the
getLength method and then create the attributes array:
switch (type) {
    .
    .
    .
    case Node.ELEMENT_NODE: {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<";
        displayStrings[numberDisplayLines] += node.getNodeName();
        int length = (node.getAttributes() != null) ?
            node.getAttributes().getLength() : 0;
        Attr attributes[] = new Attr[length];
        for (int loopIndex = 0; loopIndex < length; loopIndex++) {
            attributes[loopIndex] =
                (Attr)node.getAttributes().item(loopIndex);
        }
        .
        .
        .
You can find the methods of the Attr interface in Table 11.6.
Table 11.6 Attr Interface Methods
Method                         Description
java.lang.String getName()     Gets the name of this attribute
Element getOwnerElement()      Gets the Element node to which this attribute is attached
boolean getSpecified()         Is true if this attribute was explicitly given a value in the original document
java.lang.String getValue()    Gets the value of the attribute as a string
Because the Attr interface is built on the Node interface, you can
get an attribute's name and value with either the Node methods
getNodeName and getNodeValue or the Attr methods getName and
getValue. I'll use getNodeName and getNodeValue here. In this case,
I'm going to loop over all the attributes in the attributes array,
adding each one to the current display line in the form
AttrName="AttrValue". (Note that I escape the quotation marks around
the attribute values as \" so that Java doesn't interpret them as the
end of the string.)
switch (type) {
    .
    .
    .
    case Node.ELEMENT_NODE: {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<";
        displayStrings[numberDisplayLines] += node.getNodeName();
        int length = (node.getAttributes() != null) ?
            node.getAttributes().getLength() : 0;
        Attr attributes[] = new Attr[length];
        for (int loopIndex = 0; loopIndex < length; loopIndex++) {
            attributes[loopIndex] =
                (Attr)node.getAttributes().item(loopIndex);
        }
        for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
            Attr attribute = attributes[loopIndex];
            displayStrings[numberDisplayLines] += " ";
            displayStrings[numberDisplayLines] += attribute.getNodeName();
            displayStrings[numberDisplayLines] += "=\"";
            displayStrings[numberDisplayLines] += attribute.getNodeValue();
            displayStrings[numberDisplayLines] += "\"";
        }
        displayStrings[numberDisplayLines] += ">";
        numberDisplayLines++;
        .
        .
        .
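As a compact illustration of the same idea, the following sketch builds an opening tag by walking the element's attribute map directly, without copying it into an array first. It is one possible arrangement, not the chapter's listing: the class AttributeSketch and the startTag method are hypothetical, the JDK's JAXP parser stands in for Xerces, and attribute values are written as-is, with no escaping of special characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeSketch {
    // Builds "<name attr="value" ...>" for one element.
    static String startTag(Element element) {
        StringBuilder tag = new StringBuilder("<").append(element.getNodeName());
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i); // each item is an Attr node
            tag.append(' ')
               .append(attribute.getNodeName())
               .append("=\"")
               .append(attribute.getNodeValue())
               .append('"');
        }
        return tag.append('>').toString();
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample element with a single attribute.
        Document document = parse("<customer id=\"42\"/>");
        System.out.println(startTag(document.getDocumentElement()));
    }
}
```

Note that the DOM does not promise any particular order for the items in a NamedNodeMap, so attributes may not appear in document order.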
This element may have child elements, of course, and we have to
handle them as well. I do that by storing all the child nodes in a
NodeList object with the getChildNodes method. If there are any child
nodes, I add four spaces to the indent string and loop over those
child nodes, calling display to display each of them:
switch (type) {
    .
    .
    .
    case Node.ELEMENT_NODE: {
        displayStrings[numberDisplayLines] = indent;
        displayStrings[numberDisplayLines] += "<";
        displayStrings[numberDisplayLines] += node.getNodeName();
        int length = (node.getAttributes() != null) ?
            node.getAttributes().getLength() : 0;
        Attr attributes[] = new Attr[length];
        for (int loopIndex = 0; loopIndex < length; loopIndex++) {
            attributes[loopIndex] =
                (Attr)node.getAttributes().item(loopIndex);
        }
        for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
            Attr attribute = attributes[loopIndex];
            displayStrings[numberDisplayLines] += " ";
            displayStrings[numberDisplayLines] += attribute.getNodeName();
            displayStrings[numberDisplayLines] += "=\"";
            displayStrings[numberDisplayLines] += attribute.getNodeValue();
            displayStrings[numberDisplayLines] += "\"";
        }
        displayStrings[numberDisplayLines] += ">";
        numberDisplayLines++;
        NodeList childNodes = node.getChildNodes();
        if (childNodes != null) {
            length = childNodes.getLength();
            // Four spaces per level; the closing-tag code removes them again.
            indent += "    ";
            for (int loopIndex = 0; loopIndex < length; loopIndex++ ) {
                display(childNodes.item(loopIndex), indent);
            }
        }
        break;
    }
    .
    .
    .
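The recursion is easier to see when everything but the element names is stripped away. The sketch below walks the child nodes the same way but passes a longer indent string to each recursive call instead of modifying the current one. The class OutlineSketch and the outline method are invented for this example, and the JDK's JAXP parser is assumed in place of Xerces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OutlineSketch {
    // Adds one indented line per element, four more spaces per nesting level.
    static void outline(Node node, String indent, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(indent + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), indent + "    ", out);
        }
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        outline(parse("<a><b><c/></b></a>").getDocumentElement(), "", lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```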
That's it for handling elements; I'll handle CDATA sections
next.
Handling CDATA Section Nodes
Handling CDATA sections is particularly easy. All I have to do
here is enclose the value of the CDATA section's node inside
"<![CDATA[" and "]]>":
case Node.CDATA_SECTION_NODE: {
    displayStrings[numberDisplayLines] = indent;
    displayStrings[numberDisplayLines] += "<![CDATA[";
    displayStrings[numberDisplayLines] += node.getNodeValue();
    displayStrings[numberDisplayLines] += "]]>";
    numberDisplayLines++;
    break;
}
.
.
.
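Here is a small, self-contained check of that behavior: it parses an element whose only child is a CDATA section and wraps the node's value in the CDATA delimiters again. The class CdataSketch and the render method are hypothetical, and the sketch relies on the JDK's JAXP parser with its default settings, where CDATA sections are kept as separate nodes rather than merged into text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataSketch {
    // Re-wraps a CDATA section's value; other nodes return their raw value.
    static String render(Node node) {
        if (node.getNodeType() == Node.CDATA_SECTION_NODE) {
            return "<![CDATA[" + node.getNodeValue() + "]]>";
        }
        return node.getNodeValue();
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<s><![CDATA[a < b]]></s>");
        System.out.println(render(document.getDocumentElement().getFirstChild()));
    }
}
```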
Handling Text Nodes
The W3C DOM specifies that the text in elements must be stored in
text nodes, and those nodes have the type Node.TEXT_NODE. For these
nodes, I'll add the current indent string to the display string, and
then I'll trim off leading and trailing whitespace from the node's
value with the Java String object's trim method:
case Node.TEXT_NODE: {
    displayStrings[numberDisplayLines] = indent;
    String newText = node.getNodeValue().trim();
    .
    .
    .
The XML for Java parser treats all text as text nodes, including
the spaces used for indenting elements in customer.xml. I'll filter
out the text nodes corresponding to indentation spacing; if a text
node contains only displayable text, however, I'll add that text to
the strings in the displayStrings array:
case Node.TEXT_NODE: {
    displayStrings[numberDisplayLines] = indent;
    String newText = node.getNodeValue().trim();
    if (newText.indexOf("\n") < 0 && newText.length() > 0) {
        displayStrings[numberDisplayLines] += newText;
        numberDisplayLines++;
    }
    break;
}
.
.
.
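To show the whitespace filtering on its own, this sketch collects the trimmed text of an element's direct text-node children and skips the ones that are empty after trimming. It is a simplified variant rather than the chapter's exact test: unlike the code above, it keeps text that contains a newline in the middle. The class TextFilterSketch and the visibleText method are hypothetical, and JAXP is assumed in place of Xerces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextFilterSketch {
    // Returns the trimmed, non-empty text of the element's direct text children.
    static List<String> visibleText(Element element) {
        List<String> result = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) { // indentation-only nodes trim to ""
                    result.add(text);
                }
            }
        }
        return result;
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<a>\n  <b>hi</b>\n  tail\n</a>");
        System.out.println(visibleText(document.getDocumentElement()));
    }
}
```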
Handling Processing Instruction Nodes
The W3C DOM also lets you handle processing instructions. Here,
the node type is Node.PROCESSING_INSTRUCTION_NODE. The node name is
the processing instruction's target, and the node value is the rest
of the processing instruction, not including the target. For example,
let's say that this is the processing instruction:
<?xml-stylesheet type="text/css" href="style.css"?>
Then this is the name of the associated processing instruction
node:
xml-stylesheet
And this is its value:
type="text/css" href="style.css"
That means all we have to do is to write <?, the node name, a
space, the node value, and ?>. Here's what the code looks like:
case Node.PROCESSING_INSTRUCTION_NODE: {
    displayStrings[numberDisplayLines] = indent;
    displayStrings[numberDisplayLines] += "<?";
    // The node name is the target (xml-stylesheet in the example).
    displayStrings[numberDisplayLines] += node.getNodeName();
    String text = node.getNodeValue();
    if (text != null && text.length() > 0) {
        displayStrings[numberDisplayLines] += " ";
        displayStrings[numberDisplayLines] += text;
    }
    displayStrings[numberDisplayLines] += "?>";
    numberDisplayLines++;
    break;
}
}
.
.
.
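The DOM's ProcessingInstruction interface exposes the two parts of a processing instruction separately, through getTarget and getData. The sketch below reassembles a processing instruction from those two parts; the class PiSketch and the render method are made-up names, and the JDK's JAXP parser is assumed rather than Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class PiSketch {
    // Rebuilds "<?target data?>" from the node's target and data.
    static String render(ProcessingInstruction pi) {
        String data = pi.getData();
        String tail = (data == null || data.isEmpty()) ? "" : " " + data;
        return "<?" + pi.getTarget() + tail + "?>";
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse(
            "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?><a/>");
        // The processing instruction precedes the root element, so it is
        // the first child of the document node.
        ProcessingInstruction pi = (ProcessingInstruction) document.getFirstChild();
        System.out.println(render(pi));
    }
}
```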
And that finishes the switch statement that handles the various
types of nodes. There's only one more point to cover.
Closing Element Tags
Displaying element nodes takes a little more thought than
displaying other types of nodes. In addition to displaying <, the
name of the element, and >, you also must display a closing tag,
</, the name of the element, and >, at the end of the
element.
For that reason, I'll place some code after the switch statement
to add closing tags to elements after all their children have been
displayed. (Note that I'm also removing four spaces from the end of
the indent string, using the Java String substring method, so that
the closing tag lines up vertically with the opening tag.)
    if (type == Node.ELEMENT_NODE) {
        displayStrings[numberDisplayLines] =
            indent.substring(0, indent.length() - 4);
        displayStrings[numberDisplayLines] += "</";
        displayStrings[numberDisplayLines] += node.getNodeName();
        displayStrings[numberDisplayLines] += ">";
        numberDisplayLines++;
        indent += "    ";
    }
}
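One way to avoid the substring arithmetic altogether is to leave the current indent string untouched and hand a longer one to each recursive call; the closing tag can then reuse the same indent as the opening tag. The sketch below shows that arrangement for elements only. It is an alternative design, not the chapter's code: the class ClosingTagSketch and the render method are hypothetical, and JAXP is assumed in place of Xerces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClosingTagSketch {
    // Emits the opening tag, the child elements, then the closing tag.
    static void render(Element element, String indent, List<String> out) {
        out.add(indent + "<" + element.getNodeName() + ">");
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // The deeper indent exists only inside the recursive call.
                render((Element) child, indent + "    ", out);
            }
        }
        out.add(indent + "</" + element.getNodeName() + ">");
    }

    // Helper: parse XML text with the JDK's JAXP parser (assumption, not Xerces).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        render(parse("<a><b/></a>").getDocumentElement(), "", lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```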
And that's it. After compiling IndentingParser.java, I parse and
display customer.xml like this. In this case, I'll pipe the output
through the more filter to stop it from scrolling off the screen.
(The more filter is available in MS-DOS and certain UNIX ports; it
displays one screenful of information and waits for you to press a
key to display the next screenful.)