Java API for XML Processing (JAXP) – SAX

What is SAX

SAX stands for Simple API for XML. It is an event-driven way of accessing the contents of an XML document. The document is read serially, from start to end, and the parser fires an event for each kind of data that it finds. The user provides a handler that handles the various kinds of events fired by the parser. The user should also provide an error handler. SAX uses less memory than DOM because, unlike DOM, it does not load the entire document tree into memory.
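
To make the event-driven model concrete, here is a minimal sketch that parses a small in-memory document and records the order in which the element events arrive. The class name SaxEventTrace, the trace method and the sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP.
public class SaxEventTrace {

	// Parses the given XML text and returns the element events in arrival order.
	static List<String> trace(String xml) throws Exception {
		final List<String> events = new ArrayList<>();
		DefaultHandler handler = new DefaultHandler() {
			@Override
			public void startElement(String uri, String localName, String qName, Attributes attributes) {
				events.add("start:" + qName);
			}

			@Override
			public void endElement(String uri, String localName, String qName) {
				events.add("end:" + qName);
			}
		};
		SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
		// The document is read once, front to back; no tree is built in memory.
		parser.parse(new InputSource(new StringReader(xml)), handler);
		return events;
	}

	public static void main(String[] args) throws Exception {
		System.out.println(trace("<feed><item><title>SAX</title></item></feed>"));
		// prints [start:feed, start:item, start:title, end:title, end:item, end:feed]
	}
}
```

The events mirror the nesting of the document, so any state that the application needs (for example "which element am I in?") has to be tracked by the handler itself.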

Important Classes

  • javax.xml.parsers.SAXParserFactory – This is a factory for configuring and obtaining a SAX parser.
  • javax.xml.parsers.SAXParser – This is an API that wraps an org.xml.sax.XMLReader. The class contains the parse methods that take an input (for example an input stream, a file or a URI) and a handler for the SAX events.
  • org.xml.sax.XMLReader – This interface defines methods that read an XML document and provide events that can be acted upon. The underlying SAX parser implements this interface. The interface allows configuring features of the parser and setting the DTDHandler, EntityResolver, ContentHandler and ErrorHandler.
  • org.xml.sax.DTDHandler – This handler receives notification of DTD-related events.
  • org.xml.sax.EntityResolver – Resolves external entities.
  • org.xml.sax.ErrorHandler – Receives notification of each warning, error and fatalError encountered during parsing.
  • org.xml.sax.ContentHandler – This receives notification of the various components of the XML. Clients almost always provide an implementation of this interface (or extend DefaultHandler). The order of events depends on the order of components in the XML document. The main events are:
    • startDocument – Event fired at the start of document parsing.
    • endDocument – Event fired at the end of document parsing.
    • startElement – Event fired at the start of an element.
    • endElement – Event fired at the end of an element.
    • characters(char[] ch, int start, int length) – Event for character data. Note that the parser may not deliver all the characters of a text node in a single call; it may split them across several calls. The client should read ‘length’ characters beginning at the ‘start’ index.
    • processingInstruction – Event fired when a processing instruction is encountered.
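
The relationship between XMLReader, ContentHandler and ErrorHandler listed above can be sketched as follows. This is a minimal illustration, not the only way to wire the handlers: the class names SaxErrorHandling and RecordingHandler and the sample documents are assumptions. The handler is registered on the XMLReader directly, counts elements, and records whether a fatal (well-formedness) error was reported.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of JAXP.
public class SaxErrorHandling {

	// DefaultHandler implements both ContentHandler and ErrorHandler,
	// so one object can be registered for both roles.
	static class RecordingHandler extends DefaultHandler {
		int elementCount = 0;
		boolean fatalErrorSeen = false;

		@Override
		public void startElement(String uri, String localName, String qName, Attributes attributes) {
			elementCount++;
		}

		@Override
		public void fatalError(SAXParseException e) throws SAXException {
			fatalErrorSeen = true;
			throw e; // stop parsing; the document is not well formed
		}
	}

	static RecordingHandler parse(String xml) throws Exception {
		RecordingHandler handler = new RecordingHandler();
		XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
		reader.setContentHandler(handler);
		reader.setErrorHandler(handler);
		try {
			reader.parse(new InputSource(new StringReader(xml)));
		} catch (SAXException e) {
			// Already recorded by fatalError; nothing more to do in this sketch.
		}
		return handler;
	}

	public static void main(String[] args) throws Exception {
		RecordingHandler ok = parse("<a><b/><c/></a>");
		System.out.println(ok.elementCount + " elements, fatal error: " + ok.fatalErrorSeen);
		RecordingHandler broken = parse("<a><b></a>");
		System.out.println("fatal error: " + broken.fatalErrorSeen);
	}
}
```

Events that were fired before the error was detected have already reached the content handler, so an application that needs all-or-nothing behaviour should discard its partial results when fatalError is called.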

Example

Let us now look at an example. Instead of implementing the four interfaces described above (DTDHandler, EntityResolver, ContentHandler and ErrorHandler), the client would mostly extend org.xml.sax.helpers.DefaultHandler, which implements all four interfaces with do-nothing defaults.

package com.studytrails.xml.jaxp;

import java.io.IOException;
import java.util.Arrays;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpSAXExample2 {

	private static String xmlSource = "http://feeds.bbci.co.uk/news/technology/rss.xml?edition=int";

	public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
		JaxpSAXExample2 example = new JaxpSAXExample2();
		example.startParsing();

	}

	void startParsing() throws ParserConfigurationException, SAXException, IOException {
		// create the factory that will create the parser
		SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
		saxParserFactory.setNamespaceAware(true);
		SAXParser parser = saxParserFactory.newSAXParser();
		// The JDK ships a default Xerces-based SAX parser
		System.out.println(parser.getClass());
		// On a standard JDK this prints
		// class com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl

		XMLReader reader = parser.getXMLReader();
		System.out.println(reader.getClass());

		// MyDefaultHandler extends the DefaultHandler. It contains the business
		// logic that handles the various XML component related events.
		parser.parse(xmlSource, new MyDefaultHandler());

		System.out.println(parser.isNamespaceAware());
		// prints true, because setNamespaceAware(true) was called on the
		// factory above. Without that call the parser is not namespace aware.

		System.out.println(parser.isValidating());
		// prints false. The parser, by default, does not validate.
	}

	class MyDefaultHandler extends DefaultHandler {
		boolean parsingTitle = false;

		@Override
		public void startDocument() throws SAXException {
			System.out.println("Start parsing the document");
		}

		@Override
		public void startPrefixMapping(String prefix, String uri) throws SAXException {
			System.out.println("start::" + prefix);
			System.out.println(uri);
		}

		@Override
		public void endPrefixMapping(String prefix) throws SAXException {
			System.out.println("end::" + prefix);
		}

		@Override
		public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
			if ("title".equals(qName))
				parsingTitle = true;
			else
				parsingTitle = false;
		}

		@Override
		public void characters(char[] ch, int start, int length) throws SAXException {
			if (parsingTitle) {
				// Only the range [start, start + length) of ch belongs to this
				// event. A single title may arrive in more than one call.
				System.out.println(Arrays.copyOfRange(ch, start, start + length));
			}
		}

		@Override
		public void endElement(String uri, String localName, String qName) throws SAXException {
			if ("title".equals(qName))
				parsingTitle = false;
		}

	}
}
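
The example above prints each chunk passed to characters() as it arrives, so a title that the parser delivers in several calls is printed on several lines. A possible variant, sketched below, buffers the chunks and emits the text once the closing tag is seen. The class name TitleCollector, the collect method and the in-memory sample feed are assumptions made so that the sketch runs without network access.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of every <title> element.
public class TitleCollector extends DefaultHandler {
	private final List<String> titles = new ArrayList<>();
	private StringBuilder current; // non-null only while inside a <title>

	@Override
	public void startElement(String uri, String localName, String qName, Attributes attributes) {
		if ("title".equals(qName)) {
			current = new StringBuilder();
		}
	}

	@Override
	public void characters(char[] ch, int start, int length) {
		if (current != null) {
			current.append(ch, start, length); // may be called several times per title
		}
	}

	@Override
	public void endElement(String uri, String localName, String qName) {
		if ("title".equals(qName) && current != null) {
			titles.add(current.toString().trim());
			current = null;
		}
	}

	static List<String> collect(String xml) throws Exception {
		TitleCollector handler = new TitleCollector();
		SAXParserFactory.newInstance().newSAXParser()
				.parse(new InputSource(new StringReader(xml)), handler);
		return handler.titles;
	}

	public static void main(String[] args) throws Exception {
		String feed = "<rss><channel><title>News</title>"
				+ "<item><title>A &amp; B</title></item></channel></rss>";
		System.out.println(collect(feed)); // prints [News, A & B]
	}
}
```

Buffering until endElement keeps the handler correct regardless of how the parser chooses to split the character data.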
