How to Parse "Event XML" in Java?

I'm looking to use Java to parse an ongoing stream of event-driven XML generated by a remote device. Here's a simplified sample of two events:

<?xml version="1.0"?>
<Event> DeviceEventMsg
<?xml version="1.0"?>
<Event> DeviceEventMsg

It seems like SAX is better suited to this than DOM because it is an ongoing stream, though I'm not as familiar with SAX. Please don't yell at me about the structure of the XML; I know about it already and can't change it.

And yes, the device DOES send the XML declaration before every event. My first problem is that the second XML declaration is croaking the SAX parser.

Can anyone suggest a way to get around that?

The code I'm using so far, which is croaking on the second XML declaration, is:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

// HandlerBase and AttributeList are the (deprecated) SAX1 API used in the question.
public class TestMe extends HandlerBase {
    public void startDocument () throws SAXException {
        System.out.println("got startDocument");
    }

    public void endDocument () throws SAXException {
        System.out.println("got endDocument");
    }

    public void startElement (String name, AttributeList attrs) throws SAXException {
        System.out.println("got startElement");
    }

    public void endElement (String name) throws SAXException {
        System.out.println("got endElement");
    }

    public void characters (char buf [], int offset, int len) throws SAXException {
        System.out.println("found characters");
    }

    public void processingInstruction (String target, String data) throws SAXException {
        System.out.println("got processingInstruction");
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            // using a file as test input for now
            saxParser.parse( new File("devmodule.xml"), new TestMe() );

        } catch (Throwable err) {
            err.printStackTrace ();
        }
    }
}
Asked by: Lily324 | Posted: 28-01-2022

Answer 1

Try using StAX instead of SAX. StAX allows much more flexibility and is a better solution for streaming XML. There are a few implementations of StAX; I am very happy with the Codehaus one, but there is also one from Sun. It might solve your problems.
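
For readers who have not used StAX, here is a minimal sketch of pull-parsing a single event with the standard javax.xml.stream API. The class name StaxEventDemo, the method trace, and the well-formed <Event>DeviceEventMsg</Event> input are assumptions made for illustration, not part of the question. Note that a conforming StAX reader would still be expected to reject a second XML declaration in the same input, so this only shows the per-event handling, not a fix for the multi-document problem.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxEventDemo {

    /** Pull-parses ONE XML document and returns a trace of the events seen. */
    public static String trace(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask for adjacent text to be delivered as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        try {
            // The caller drives the loop and asks for the next event when ready.
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append("start ").append(reader.getLocalName()).append('\n');
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            out.append("text ").append(reader.getText()).append('\n');
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("end ").append(reader.getLocalName()).append('\n');
                        break;
                    default:
                        break; // ignore comments, processing instructions, etc.
                }
            }
        } finally {
            reader.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical single event; the real device payload may differ.
        System.out.print(trace("<?xml version=\"1.0\"?><Event>DeviceEventMsg</Event>"));
    }
}
```

Because the application pulls events rather than receiving callbacks, it can decide for itself where one event ends and when to start reading the next one.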

Answered by: Grace966 | Posted: 01-03-2022

Answer 2

One more suggestion, specifically regarding multiple XML declarations. Yes, this is ILLEGAL XML, so proper parsers will barf on it in their default modes. But some parsers have alternate "multi-document" modes. For example, Woodstox has this, so you can check out:

Basically, you have to tell the parser (via the input factory) that the input is in the form of "multiple XML documents" (ParsingMode.PARSING_MODE_DOCUMENTS).

If so, it will accept multiple XML declarations, each one indicating the start of a new document.

Answered by: Catherine256 | Posted: 01-03-2022

Answer 3

If you print out the element name in the start and end element handlers with System.out.println(), you will get something like this:

got startDocument
got startElement Event
found characters
found characters
got startElement Param1
found characters
got endElement Param1
found characters
got endElement Event
org.xml.sax.SAXParseException: The processing instruction target matching "[xX][mM][lL]" is not allowed. ...

So I think the second

<?xml version="1.0"?>

without getting an endDocument is causing a parser problem.
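
A small sketch that reproduces this behaviour with the JDK's built-in SAX parser is shown here. The class name TraceNames, the method trace, and the well-formed two-event input are assumptions for illustration; it uses the SAX2 DefaultHandler rather than the HandlerBase from the question, and the exact exception message depends on the parser implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class TraceNames {

    /** Parses the input and records each callback, including element names. */
    public static List<String> trace(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("got startDocument");
            }

            @Override
            public void endDocument() {
                log.add("got endDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.add("got startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("got endElement " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // The second declaration is reported as a fatal parse error.
            log.add("SAXParseException at line " + e.getLineNumber());
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical two-event input modelled on the question's sample.
        String twoEvents =
                "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg</Event>\n"
              + "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg</Event>\n";
        for (String line : trace(twoEvents)) {
            System.out.println(line);
        }
    }
}
```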

Answered by: Miranda160 | Posted: 01-03-2022

Answer 4

If you add this:

catch (SAXException saxErr) {
    System.out.println("ignore this error");
}

before the other catch, you will catch this particular error. You would then have to reopen the device or, for the static file case, you may have to keep track of where you are in the file.

Or, at the end of each Event, close the device/file and then reopen it for the next event.

Answered by: Daryl721 | Posted: 01-03-2022

Answer 5

RE: Simon's suggestion of catching the SAXException to determine when you've come to the end of one XML document and reached the start of another: I think this would be a problematic approach. If another error occurred (for whatever reason), you wouldn't be able to tell whether the exception had been thrown due to erroneous XML or because you'd reached the end of a document.

The problem is that the parser is designed to process a single XML document, not a stream of several XML documents. I would suggest writing some code to manually split the incoming data stream into individual streams, each containing a single XML document, and then passing these streams to the XML parser serially (thereby guaranteeing the order of your events).
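
A minimal sketch of that splitting approach follows. It assumes each XML declaration starts at the beginning of a line (as in the question's sample) and that a new declaration marks the end of the previous document, which means an event is only handed to the parser once the next declaration (or end of stream) arrives. The names EventStreamSplitter, DocumentSink, and split, and the well-formed sample input, are hypothetical and not from the original post.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventStreamSplitter {

    /** Receives the text of each complete, single-document chunk in order. */
    public interface DocumentSink {
        void accept(String xmlDocument) throws Exception;
    }

    /** Splits the input at every line that begins with an XML declaration. */
    public static void split(Reader in, DocumentSink sink) throws Exception {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            // Assumption: "<?xml " at the start of a line begins a new document.
            if (line.trim().startsWith("<?xml ")) {
                flush(current, sink);
            }
            current.append(line).append('\n');
        }
        flush(current, sink); // last document, delivered at end of stream
    }

    private static void flush(StringBuilder current, DocumentSink sink) throws Exception {
        // Trim so the declaration is the very first thing the parser sees.
        String doc = current.toString().trim();
        current.setLength(0);
        if (!doc.isEmpty()) {
            sink.accept(doc);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical well-formed stand-in for the device output.
        String stream =
                "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg</Event>\n"
              + "<?xml version=\"1.0\"?>\n<Event>DeviceEventMsg</Event>\n";
        SAXParserFactory factory = SAXParserFactory.newInstance();
        split(new StringReader(stream), doc -> {
            // Each chunk is an ordinary single XML document, parsed serially.
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(doc)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    System.out.println("got startElement " + qName);
                }
            });
        });
    }
}
```

With this arrangement a SAXException can only mean that one particular event was malformed, so it no longer doubles as an end-of-document signal. If the one-event delay matters, splitting on the closing tag of the root element instead would be an alternative, assuming the root element name is known.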

Answered by: Carlos967 | Posted: 01-03-2022
