Introduction to Asynchronous StAX Parsing Using Aalto

Overview

StAX parsing is usually synchronous: the parser blocks on the InputStream until the data it needs has been read. This is known as the ‘Pull’ model, in which the StAX parser pulls data only as it needs it. It is this ‘Pull’ model that allows StAX to parse large XML documents efficiently.
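As a point of reference, the ‘Pull’ model can be sketched with the standard javax.xml.stream API that ships with the JDK. This is only an illustrative sketch: the class name PullModelSketch and the helper startElementNames are made up for this example, and an in-memory stream stands in for a real source.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX or Aalto.
public class PullModelSketch {

    // Collects the local names of all start elements, pulling one event at a time.
    static List<String> startElementNames(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // next() reads from the stream itself and may block
                // until enough bytes are available for the next event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] xml = "<root><child>text</child></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(startElementNames(new ByteArrayInputStream(xml)));
    }
}
```

The calling code never touches the bytes here; it only asks for the next event, and the parser decides when to read from the InputStream.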

In addition to synchronous StAX parsing, Aalto supports asynchronous parsing through the StAX APIs. This feature sets Aalto apart from other StAX parsers. In asynchronous mode, the parser does not block on the InputStream while parsing. Instead, the calling code is responsible for reading chunks of data from the InputStream and passing them to Aalto’s asynchronous parser. Aalto’s asynchronous parser therefore works in a ‘Push’ model, in which the calling code pushes data to the parser.

The following is a diagrammatic representation of asynchronous mode:

Note that Aalto does not use a separate API for asynchronous mode. It re-uses the StAX API abstractions.
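The ‘Push’ model described above might look roughly like the following sketch. It assumes the aalto-xml library is on the classpath. The class name AaltoPushSketch, the helper startElementNames, and the chunkSize parameter are made up for this example. A plain InputStream is read in small chunks to keep the sketch short; in a real non-blocking setup the bytes would more likely come from an NIO channel or a network callback.

```java
import com.fasterxml.aalto.AsyncByteArrayFeeder;
import com.fasterxml.aalto.AsyncXMLInputFactory;
import com.fasterxml.aalto.AsyncXMLStreamReader;
import com.fasterxml.aalto.stax.InputFactoryImpl;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

// Hypothetical example class, not part of Aalto.
public class AaltoPushSketch {

    // Feeds the parser chunk by chunk and collects start element names.
    static List<String> startElementNames(InputStream in, int chunkSize)
            throws XMLStreamException, IOException {
        AsyncXMLInputFactory factory = new InputFactoryImpl();
        AsyncXMLStreamReader<AsyncByteArrayFeeder> parser = factory.createAsyncForByteArray();
        AsyncByteArrayFeeder feeder = parser.getInputFeeder();

        List<String> names = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        int event;
        // The loop uses the usual StAX event constants plus Aalto's EVENT_INCOMPLETE.
        while ((event = parser.next()) != XMLStreamConstants.END_DOCUMENT) {
            if (event == AsyncXMLStreamReader.EVENT_INCOMPLETE) {
                // The parser has consumed everything it was given: push more data.
                int n = in.read(buf);
                if (n < 0) {
                    feeder.endOfInput();          // no more data will arrive
                } else if (n > 0) {
                    feeder.feedInput(buf, 0, n);  // the caller pushes the next chunk
                }
            } else if (event == XMLStreamConstants.START_ELEMENT) {
                names.add(parser.getLocalName());
            }
        }
        parser.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><child>text</child></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(startElementNames(new ByteArrayInputStream(xml), 4));
    }
}
```

The parser never reads from the stream itself. When it runs out of data it returns EVENT_INCOMPLETE instead of blocking, and the calling code decides when and from where to feed the next chunk.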

A key benefit of Aalto’s asynchronous parser is that multiple input streams can supply the data for parsing a single XML document. This can significantly improve scalability in certain situations.

On the downside, as of this writing, coalescing mode is not implemented in asynchronous mode.

The following table summarizes the differences between synchronous and asynchronous mode:

Synchronous Mode | Asynchronous Mode
Blocks on the InputStream | Does not block on the InputStream
Pull model | Push model
Works on one InputStream | Can work on multiple InputStreams
Less scalable | More scalable
Comparatively simpler client code | Slightly more complex client code
Less fine-grained control over memory usage | Better control over memory usage

For further details on asynchronous mode, see this blog by the author of Aalto.
