How to Parse "Event XML" in Java?

I'm looking to use Java to parse an ongoing stream of event-driven XML generated by a remote device. Here's a simplified sample of two events:

<?xml version="1.0"?>
<Event> DeviceEventMsg
<?xml version="1.0"?>
<Event> DeviceEventMsg

It seems like SAX is better suited to this than DOM because it is an ongoing stream, though I'm not as familiar with SAX. Don't yell at me for the structure of the XML - I know it already and can't change it.

And yes, the device DOES send the XML declaration before every event. My first problem is that the second XML declaration is croaking the SAX parser.

Can anyone suggest a way to get around that?

The code I'm using so far, which croaks on the second XML declaration, is:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

// HandlerBase and AttributeList belong to the older SAX1 API (deprecated).
public class TestMe extends HandlerBase {
    public void startDocument() throws SAXException {
        System.out.println("got startDocument");
    }

    public void endDocument() throws SAXException {
        System.out.println("got endDocument");
    }

    public void startElement(String name, AttributeList attrs) throws SAXException {
        System.out.println("got startElement");
    }

    public void endElement(String name) throws SAXException {
        System.out.println("got endElement");
    }

    public void characters(char buf[], int offset, int len) throws SAXException {
        System.out.println("found characters");
    }

    public void processingInstruction(String target, String data) throws SAXException {
        System.out.println("got processingInstruction");
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            // using a file as test input for now
            saxParser.parse(new File("devmodule.xml"), new TestMe());
        } catch (Throwable err) {
            err.printStackTrace();
        }
    }
}

Asked by: Lily324 | Posted: 28-01-2022

Answer 1

Try using StAX instead of SAX. StAX allows much more flexibility and is a better fit for streaming XML. There are a few implementations of StAX; I am very happy with the Codehaus one, but there is also one from Sun. It might solve your problems.
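For anyone who hasn't used StAX, here is a minimal sketch of a pull loop over a single event, using the StAX API bundled with the JDK. The class name StaxEventReader, the method readEvent, and the sample event with its Param1 value are made up for illustration. Note that this only shows the per-document loop: I'd expect a StAX parser in its default mode to reject a second XML declaration just as SAX does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls the parse events out of ONE XML document.
class StaxEventReader {

    static List<String> readEvent(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Coalescing delivers adjacent character data as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        List<String> seen = new ArrayList<>();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // The caller drives the parser: each next() call advances one event.
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        seen.add("start " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            seen.add("text " + reader.getText().trim());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        seen.add("end " + reader.getLocalName());
                        break;
                    default:
                        break; // ignore comments, processing instructions, etc.
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample event invented for this sketch.
        String event = "<?xml version=\"1.0\"?>"
                     + "<Event>DeviceEventMsg<Param1>value</Param1></Event>";
        readEvent(event).forEach(System.out::println);
    }
}
```

Because the caller asks for each event, it is easy to stop reading at the end of one Event element and decide what to do before touching the rest of the stream.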

Answered by: Grace966 | Posted: 01-03-2022

Answer 2

One more suggestion, specifically regarding multiple xml declarations. Yes, this is ILLEGAL xml, so proper parsers will barf on it using default modes. But some parsers have alternate "multi-document" modes. For example, Woodstox has this, so you can check out:

Basically, you have to tell the parser (via the input factory) that the input is in the form of "multiple XML documents" (ParsingMode.PARSING_MODE_DOCUMENTS).

If so, it will accept multiple XML declarations, each one indicating the start of a new document.

Answered by: Catherine256 | Posted: 01-03-2022

Answer 3

If you also print the element name in startElement and endElement with System.out.println(), you will get something like this:

got startDocument
got startElement Event
found characters
found characters
got startElement Param1
found characters
got endElement Param1
found characters
got endElement Event
org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed. ...

So I think the second

<?xml version="1.0"?>

without getting an endDocument is causing a parser problem.
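
To reproduce this without the device, here is a small sketch that feeds a single event and then two concatenated events to the JDK's default SAX parser. The class name SecondDeclarationDemo, the helper tryParse, and the sample events are made up for illustration; the exact message and position reported depend on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: shows that a second XML declaration is rejected.
class SecondDeclarationDemo {

    /** Returns the parse error, or null if the input parsed cleanly. */
    static SAXParseException tryParse(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample events invented for this sketch.
        String one = "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg</Event>\n";
        String two = one + one;

        System.out.println("single event error: " + tryParse(one));

        SAXParseException err = tryParse(two);
        if (err != null) {
            System.out.println("two events fail at line " + err.getLineNumber()
                    + ": " + err.getMessage());
        }
    }
}
```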

Answered by: Miranda160 | Posted: 01-03-2022

Answer 4

If you add this:

catch (SAXException saxErr) {
    System.out.println("ignore this error");
}

before the other catch, you will catch this particular error. You would then have to reopen the device or, for the static file case, keep track of where you are in the file.

Or, when you reach the end of an Event, close the device/file and then reopen it for the next event.

Answered by: Daryl721 | Posted: 01-03-2022

Answer 5

RE: Simon's suggestion of catching the SAXException to determine when you've come to the end of one XML document and reached the start of another, I think this would be a problematic approach. If another error occurred (for whatever reason), you wouldn't be able to tell whether the exception had been thrown due to erroneous XML or because you'd reached the end of a document.

The problem is that the parser is built to process a single XML document, not a stream of several XML documents. I would suggest writing some code to manually split the incoming data stream into individual streams, each containing a single XML document, and then passing these streams to the XML parser serially (which guarantees the order of your events).
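
A minimal sketch of that splitting approach, assuming every document starts with the literal text `<?xml ` and that this text never appears inside an event (for example in a CDATA section or comment). The class EventStreamSplitter, its methods split and parseAll, and the sample events are names I made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: cuts a stream of concatenated XML documents apart.
class EventStreamSplitter {
    // Assumption: each document begins with exactly this text.
    private static final String DECL_START = "<?xml ";

    /** Reads until EOF and hands each complete document to sink, in arrival order. */
    static void split(Reader in, Consumer<String> sink) throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            int cut = buf.length() - DECL_START.length();
            // A declaration arriving after earlier content marks a document boundary.
            if (cut > 0 && buf.indexOf(DECL_START, cut) == cut) {
                emit(buf.substring(0, cut), sink);
                buf.delete(0, cut); // keep the new declaration in the buffer
            }
        }
        emit(buf.toString(), sink); // whatever is left at EOF is the last document
    }

    private static void emit(String doc, Consumer<String> sink) {
        String trimmed = doc.trim(); // the declaration must be the first thing parsed
        if (!trimmed.isEmpty()) {
            sink.accept(trimmed);
        }
    }

    /** Parses every document serially with SAX and returns a simple event log. */
    static List<String> parseAll(Reader in) throws IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                log.add("start " + qName);
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }
        };
        split(in, doc -> {
            try {
                // A fresh parser per document keeps each parse independent.
                factory.newSAXParser().parse(new InputSource(new StringReader(doc)), handler);
            } catch (Exception e) {
                // Here an exception really does mean a bad document.
                throw new IllegalStateException("could not parse event document", e);
            }
        });
        return log;
    }

    public static void main(String[] args) throws IOException {
        // Sample stream invented for this sketch.
        String stream = "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg<Param1>a</Param1></Event>\n"
                      + "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg<Param1>b</Param1></Event>\n";
        parseAll(new StringReader(stream)).forEach(System.out::println);
    }
}
```

On a live device stream the last event is only emitted once the next declaration arrives or the stream closes, so if latency matters you would probably want to detect the closing Event tag instead. Reading one character at a time is also slow on an unbuffered source, so wrap the input in a BufferedReader.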

Answered by: Carlos967 | Posted: 01-03-2022

