Java XML – Example JDOM2 Usage

Building a JDOM2 Document

In this tutorial we build a JDOM2 document from an XML source (in this case, a BBC News “Technology” RSS feed) and navigate through it. We first use the org.jdom2.input.SAXBuilder class to create the JDOM2 Document from the source (later tutorials cover the SAXBuilder class and its options in more detail). Once we have the Document, JDOM2 provides various methods to access its elements, which are covered in the next section of this tutorial.
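
The build step can also be tried without network access. The following is a minimal sketch, assuming JDOM2 is on the classpath; the class name BuildFromStringSketch and the sample XML are made up for illustration and are not part of the BBC feed. It parses an in-memory string with SAXBuilder.build(Reader) and returns the name of the root element.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

class BuildFromStringSketch {

    // Hypothetical sample data standing in for the live feed.
    static final String SAMPLE_XML =
            "<rss version=\"2.0\"><channel><title>Sample feed</title></channel></rss>";

    // Parses the XML text and returns the name of the root element.
    static String rootName(String xml) throws JDOMException, IOException {
        SAXBuilder builder = new SAXBuilder();
        Document document = builder.build(new StringReader(xml));
        return document.getRootElement().getName();
    }

    public static void main(String[] args) throws JDOMException, IOException {
        System.out.println(rootName(SAMPLE_XML)); // prints "rss"
    }
}
```

Reading from a string keeps the sketch repeatable; passing a URL string to build, as the full example does, goes through the same builder.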

Accessing JDOM2 Components

JDOM2 is a Java representation of an XML document. In other words, each XML component is represented as a Java object. JDOM2 has convenient methods to access the various components. Here are some of the key use cases (the numbers in brackets correspond to line numbers in the CreateJdomFromSax example below):

  • Obtaining the root element (25, 28) – the root element is the topmost Element of the document
  • Obtaining the namespaces introduced by a node (32)
  • Obtaining the contents of a particular node (43) – getContent returns the direct child content of the node (elements, text, comments and so on); it does not descend into nested elements
  • Obtaining a child of an Element (50)
  • Obtaining all children of an Element (54)
  • Obtaining the text of the first child element with a given name, e.g. the text of the first ‘link’ element (62)
  • Obtaining the first child element with a given name from a specified namespace, e.g. the first ‘link’ element from the ‘atom’ namespace (70)
  • Obtaining all children with a specific name (75)
  • Iterating through all the descendants of an Element and obtaining the Elements that belong to a specific Namespace (85)

This is not an exhaustive list of use cases, but it gives an idea of what can be achieved. If you have a specific requirement, post it as a comment and we will be glad to answer it.
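
A small offline experiment helps to see how much of the tree getContent and getDescendants each cover. This is only a sketch under stated assumptions: JDOM2 is on the classpath, and the class name ContentVersusDescendantsSketch and the sample XML are invented for illustration. It counts the Content objects returned by each method for the same root element.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Content;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

class ContentVersusDescendantsSketch {

    // Hypothetical sample data, written without whitespace between tags so
    // that no whitespace-only Text nodes appear in the counts.
    static final String SAMPLE_XML =
            "<rss><channel><title>Sample</title>"
            + "<link>http://example.com</link></channel></rss>";

    static Element parseRoot(String xml) throws JDOMException, IOException {
        return new SAXBuilder().build(new StringReader(xml)).getRootElement();
    }

    // Number of direct child Content objects of the root element.
    static int directContentCount(String xml) throws JDOMException, IOException {
        return parseRoot(xml).getContent().size();
    }

    // Number of Content objects anywhere below the root element.
    static int descendantCount(String xml) throws JDOMException, IOException {
        int count = 0;
        for (Content ignored : parseRoot(xml).getDescendants()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws JDOMException, IOException {
        // Only the channel element is a direct child of rss.
        System.out.println(directContentCount(SAMPLE_XML)); // prints 1
        // channel, title, its text, link, its text.
        System.out.println(descendantCount(SAMPLE_XML)); // prints 5
    }
}
```

With a pretty-printed document the counts would be higher, because the whitespace between tags is kept as Text nodes.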

package com.studytrails.xml.jdom;
 
import java.io.IOException;
import java.util.List;
 
import org.jdom2.Content;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.util.IteratorIterable;
 
public class CreateJdomFromSax {
 
    private static String xmlSource = "http://feeds.bbci.co.uk/news/technology/rss.xml?edition=int";
 
    public static void main(String[] args) throws JDOMException, IOException {
        // the SAXBuilder is the easiest way to create the JDOM2 objects.
        SAXBuilder jdomBuilder = new SAXBuilder();
 
        // jdomDocument is the JDOM2 Object
        Document jdomDocument = jdomBuilder.build(xmlSource);
 
        // The root element is the root of the document. we print its name
        System.out.println(jdomDocument.getRootElement().getName()); // prints
                                                                        // "rss"
 
        Element rss = jdomDocument.getRootElement();
 
        // The Element class extends Content class which is NamespaceAware. We
        // see what namespace this element introduces.
        System.out.println(rss.getNamespacesIntroduced());
        /*
         * prints [[Namespace: prefix "atom" is mapped to URI
         * "http://www.w3.org/2005/Atom"], [Namespace: prefix "media" is mapped
         * to URI "http://search.yahoo.com/mrss/"]]
         */
 
        // the getContent method returns the direct child content of the
        // element (it does not descend into nested elements). We print the
        // CType (an enumeration identifying the Content Type) and class of
        // the Content. we print only the first two values, since this is only an example.
        List<Content> rssContents = rss.getContent();
        for (int i = 0; i < 2; i++) {
            Content content = rssContents.get(i);
            System.out.println("CType " + content.getCType());
            System.out.println("Class " + content.getClass());
        }
 
        Element channel = rss.getChild("channel");
 
        // the getChildren method can be used to obtain the children of the
        // element
        List<Element> channelChildren = channel.getChildren();
        for (int i = 0; i < 2; i++) {
            Element channelChild = channelChildren.get(i);
            System.out.println(channelChild.getName());// prints 'title' and
                                                        // 'link'
        }
 
        // getChildText returns the text of the first child element named 'link'
        System.out.println(channel.getChildText("link")); // print the first
                                                            // link
 
        // It is also possible to specify the namespace while obtaining the
        // child element. In the statement below we
        // obtain the child with name 'link' but we want that child to be from
        // the atom namespace. We further use the getAttributeValue method to
        // get the value of the attribute of the node
        System.out.println(channel.getChild("link", rss.getNamespace("atom")).getAttributeValue("href"));
        // prints http://feeds.bbci.co.uk/news/technology/rss.xml
 
        // Instead of getting all the children of a node we may want to get all
        // children with a particular name.
        List<Element> items = channel.getChildren("item");
        for (int i = 0; i < 2; i++) {
            System.out.println(items.get(i).getChildText("title")); // prints
                                                                    // the first
                                                                    // two
                                                                    // titles
        }
 
        // iterate through all the descendants and get the url of the thumbnails
        // (The thumbnails are declared with namespace media)
        IteratorIterable<Content> descendantsOfChannel = channel.getDescendants();
        for (Content descendant : descendantsOfChannel) {
            if (descendant.getCType().equals(Content.CType.Element)) {
                Element element = (Element) descendant;
                if (element.getNamespace().equals(rss.getNamespace("media"))) {
                    // prints all urls of all thumbnails within the
                    // 'media' namespace
                    System.out.println(element.getAttributeValue("url"));
                }
            }
        }
    }
}
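
The final loop above checks the CType and casts each descendant by hand. As a hedged alternative, the sketch below passes an org.jdom2.filter.ElementFilter to getDescendants so that only Elements in the wanted namespace are returned. It assumes JDOM2 is on the classpath; the class name MediaThumbnailSketch and the sample XML (including the example.com URLs) are hypothetical stand-ins for the live feed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.Namespace;
import org.jdom2.filter.ElementFilter;
import org.jdom2.input.SAXBuilder;

class MediaThumbnailSketch {

    // Hypothetical sample data; the real feed has many more elements.
    static final String SAMPLE_XML =
            "<rss xmlns:media=\"http://search.yahoo.com/mrss/\" version=\"2.0\">"
            + "<channel><title>Sample</title>"
            + "<item><title>One</title>"
            + "<media:thumbnail url=\"http://example.com/1.jpg\"/></item>"
            + "<item><title>Two</title>"
            + "<media:thumbnail url=\"http://example.com/2.jpg\"/></item>"
            + "</channel></rss>";

    // Collects the url attribute of every element in the media namespace.
    static List<String> thumbnailUrls(String xml) throws JDOMException, IOException {
        Document document = new SAXBuilder().build(new StringReader(xml));
        Element rss = document.getRootElement();
        Namespace media = rss.getNamespace("media");
        Element channel = rss.getChild("channel");

        List<String> urls = new ArrayList<>();
        // The filter keeps only Elements in the media namespace, so the
        // loop variable is already an Element and no cast is needed.
        for (Element element : channel.getDescendants(new ElementFilter(media))) {
            urls.add(element.getAttributeValue("url"));
        }
        return urls;
    }

    public static void main(String[] args) throws JDOMException, IOException {
        for (String url : thumbnailUrls(SAMPLE_XML)) {
            System.out.println(url);
        }
    }
}
```

The filter moves the type and namespace checks out of the loop body; the manual version in the full example remains useful when several kinds of Content need to be handled in one pass.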

