OpenEXI




Example B - Working with XML Fragments

This example shows how to work with XML fragments rather than well-formed XML documents.

What Example B Demonstrates

The W3C XML Specification explicitly states that a well-formed XML document must have exactly one root element. For some use cases, it can be more efficient to transfer a subset of the well-formed XML document, creating a file that has multiple elements at the root level. For example, a catalog application might be updated on a regular basis with current inventory. The catalog data would change infrequently (item name, description, cost, etc.), while the inventory might be updated daily or even hourly. To reduce the size of the transfer file and maximize performance, an application could send only the current inventory numbers in EXI format. The inventory numbers can be parsed and merged with existing catalog information by an application on the receiving end.
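To make the single-root rule concrete, the following minimal sketch feeds a two-element fragment to the JDK's built-in SAX parser, first as-is and then wrapped in a temporary root element. The class name MultiRootCheck and the sample inventory markup are hypothetical and not part of the OpenEXI sample code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the OpenEXI sample application.
class MultiRootCheck {

    // Returns true if the text parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // DefaultHandler rethrows fatal errors instead of printing them.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface as SAXParseException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up inventory fragment with two top-level elements.
        String fragment = "<item id=\"1\"><stock>12</stock></item>"
                        + "<item id=\"2\"><stock>0</stock></item>";
        System.out.println(isWellFormed(fragment));                       // false
        System.out.println(isWellFormed("<root>" + fragment + "</root>")); // true
    }
}
```

The second call succeeds only because of the added wrapper element, which is the idea that the rest of this example builds on.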

This example demonstrates how you can create a filter and input stream that will accept and process XML fragments into EXI format for use in your custom applications.

Note: This example demonstrates a workaround that enables you to encode and decode XML fragments. A later release of the Transmogrifier will incorporate the strategy used by FragmentInputStream and FragmentFilter using similar arguments. Also note that this example works with input in an ASCII-compatible encoding (here, UTF-8), but not with UTF-16 input.
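A likely reason for the encoding restriction is that the temporary wrapper tags are inserted as UTF-8 bytes, so they only line up with the fragment when the fragment uses an ASCII-compatible encoding. The short sketch below compares the byte sequences for the same tag in three encodings. The class name WrapperEncodingCheck is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the OpenEXI sample application.
class WrapperEncodingCheck {

    static byte[] tagBytesUtf8() {
        return "<root>".getBytes(StandardCharsets.UTF_8);
    }

    static byte[] tagBytesUtf16() {
        // Big-endian UTF-16 without a byte order mark: two bytes per character.
        return "<root>".getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(tagBytesUtf8().length);   // 6
        System.out.println(tagBytesUtf16().length);  // 12
        // For this tag, UTF-8 and US-ASCII produce identical bytes.
        byte[] ascii = "<root>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(tagBytesUtf8(), ascii)); // true
    }
}
```

A parser that has decided the stream is UTF-16 would not read the six UTF-8 bytes as the characters of a <root> tag.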

How to Use Example B

    To install and run Example B:
  1. If you haven't already done so, download and expand a local copy of OpenEXISupportingJARs.zip. This archive contains three JARs, nagasena.jar, xercesImpl.jar and xml-apis.jar.
  2. Download and expand OpenEXI_Example6.zip. This archive contains the compiled example classes, Java source code, a batch file for running the application in a Windows environment, and an empty /lib directory.
  3. Copy nagasena.jar and xercesImpl.jar to the /lib directory.
  4. From the root directory, enter the following command:

    java openexi.sample.OpenEXISampleApplication

    Note: For convenience, Windows users can use the included batch file, RunOpenEXISample.bat, to set the classpath and launch the sample application.
    To encode an XML fragment to EXI:
  1. Click Browse... to the right of the Source File field to select an XML file to encode. The selected file name appears in the Source File field. A suggested name is displayed in the Destination File field, but you can edit the location or file name according to your needs.
  2. Select the File is a Fragment checkbox.
  3. Click Encode.
    To decode an EXI file to XML:
  1. Click Browse... to the right of the Source File field to select an EXI file to decode. The selected file name appears in the Source File field. A suggested name is displayed in the Destination File field, but you can edit the location or file name according to your needs.
  2. Choose the same settings used to encode the file, including File is a Fragment.
  3. Click Decode.

Code Highlights

Complete, commented source code is included in the src directory in OpenEXI_Example6.zip. This section highlights the important updates in each iteration as the examples build on one another.

FragmentInputStream

The SAX XMLReader throws an error if an XML file has more than one root element. To work around that limitation, FragmentInputStream adds a temporary <root> start tag before the XML fragment and a matching </root> end tag after it, creating a stream that the XMLReader will accept.

package openexi.sample;

import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

public class FragmentInputStream extends InputStream{

Static byte arrays store the encoded start and end tags of the temporary root element.

    static byte[] pre;
    static byte[] post;

Instance fields track progress as the file is read.

  
    boolean done = false;
    int pos = 0;
    int len = 0;
    final byte[] buffer;

This class accepts any type of InputStream, most often a FileInputStream.

    final InputStream in;

Populate the pre and post variables with the UTF-8 bytes of the tag strings.

   
    static
    {
        try
        {
            pre = "<root>".getBytes("UTF-8");
            post = "</root>".getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            throw new RuntimeException(e);
        }
    }

The constructor creates a 1024-byte buffer, copies the <root> tag to the beginning of it, and assigns the input stream to the field in.

    
    public  FragmentInputStream(InputStream in)
    {
        this.buffer = new byte[1024];
        System.arraycopy(pre, 0, buffer, 0, pre.length);
        len = pre.length;
        this.in = in;
    }

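Override both of the read() methods. When the buffer has been fully consumed, each method checks whether the closing tag has already been delivered. If it has, the stream has ended and the method returns -1. If not, the method calls fill() to load the next chunk of bytes.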

    @Override
    public int read() throws IOException
    {
        if (pos == len)
        {
                // Check whether the post tag has been appended to the file.
                if (done)
                        return -1;
                // If not, fill the buffer again.
                fill();
        }
        return buffer[pos++] & 0xFF;
    }
    
    @Override
    public int read(byte[] buf, int offset, int bufLen) throws IOException
    {
        if (pos == len)
        {
                // Check whether the post tag has been appended to the file.
                if (done)
                        return -1;
                // If not, fill the buffer again.
                fill();
        }

Compute the number of bytes to copy (currentByte): the lesser of the bytes remaining in the internal buffer (len - pos) and the number of bytes the caller requested (bufLen).

        int currentByte = Math.min(len - pos, bufLen);

Copy that many bytes from the internal buffer into the caller's array, starting at the requested offset.

        System.arraycopy(buffer, pos, buf, offset, currentByte);

Advance the current position by the number of bytes copied, and return that count.

        pos += currentByte;
        return currentByte;
    }

The fill() method is where most of the work is done, reading up to 1024 bytes into the buffer on each pass.

    protected void fill() throws IOException
    {
        pos = 0;
        len = in.read(buffer, 0, buffer.length);

When the end of the underlying stream has been reached, in.read() returns -1, so len is negative. The method then copies the post variable ("</root>") into the buffer, sets len to its length, and sets the done variable to true.

  
        if (len < 0)
        {
            System.arraycopy(post, 0, buffer, 0, post.length);
            len = post.length;
            done = true;
        }
    }
}
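For comparison, the same wrapping effect can be sketched with the JDK's java.io.SequenceInputStream, which concatenates several streams into one. This is an alternative technique shown only for illustration. It is not how FragmentInputStream is implemented, and the class name RootWrapSketch is hypothetical. Like FragmentInputStream, it assumes the fragment is in an ASCII-compatible encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

// Hypothetical helper, not part of the OpenEXI sample application.
class RootWrapSketch {

    // Returns a stream that yields "<root>", then the fragment, then "</root>".
    static InputStream wrap(InputStream fragment) {
        InputStream pre = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream post = new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(pre, fragment, post)));
    }

    // Convenience method for demonstration: wraps a string and reads it back.
    static String wrapToString(String fragment) {
        byte[] body = fragment.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = wrap(new ByteArrayInputStream(body))) {
            byte[] chunk = new byte[1024];
            int n;
            // read() may return fewer bytes than requested, so loop until -1.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(wrapToString("<a/><b/>")); // <root><a/><b/></root>
    }
}
```

Either approach has the same limitation: if the fragment begins with an XML declaration, the declaration ends up after the <root> tag, where a parser will not accept it.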

FragmentFilter

The FragmentInputStream class adds <root> tags to fool the XMLReader into parsing a fragment file as if it were a well-formed XML document with a single root element. These tags need to be removed as the fragment file is encoded to EXI.

The FragmentFilter class intercepts startElement and endElement events. The m_depthCounter starts at -1, is incremented after every startElement, and is decremented before every endElement, so it equals -1 only while the events of the temporary <root> element itself are being handled. The filter swallows those two events and continues. Child elements and all other events are handled normally by the superclass.

package openexi.sample;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class FragmentFilter extends XMLFilterImpl {
    int m_depthCounter = -1;

    public FragmentFilter(XMLReader xmlReader) {
        super(xmlReader);
    }

    public FragmentFilter() {
        super();
    }

Override the startElement method. If the m_depthCounter is 0 or higher, pass the event to the superclass. Otherwise, swallow the event and move on. This prevents the <root> element from being included in the output stream.

    
    public void startElement(String uri, String localName, String qname, Attributes atts) {

        if (m_depthCounter > -1)
        {
            try {
                super.startElement(uri,localName,qname,atts);
            } catch (SAXException e) {
                e.printStackTrace();
            }     
        }

Increment the depth counter after processing each starting element. If the next event is a startElement, the depth counter will increase to the next level of nesting.

        m_depthCounter++;
    }
    public void endElement(String uri, String localName, String qname) {

Decrement the depth counter before processing each ending element. If the next event is also an endElement, the depth counter will decrease one level of nesting. If the nesting level is less than 0, swallow the event so that the </root> tag is not included in the output stream.

        --m_depthCounter;
        if (m_depthCounter >-1)
        {
            try {
                super.endElement(uri, localName, qname);
            } catch (SAXException e) {
                e.printStackTrace();
            }
        }
    }
}
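The effect of the depth counter can be checked without the EXI libraries by attaching a handler that simply records the events it receives. The sketch below is a compact variant of the idea, not the FragmentFilter source. It counts from 0 instead of -1 and propagates SAXException instead of printing it. The names DepthFilterSketch and DropOutermost are hypothetical, and the recording handler stands in for the handler returned by transmogrifier.getSAXTransmogrifier().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class, not part of the OpenEXI sample application.
class DepthFilterSketch {

    // Drops the start and end events of the outermost element only.
    static class DropOutermost extends XMLFilterImpl {
        private int depth = 0; // number of currently open elements

        DropOutermost(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0) {
                super.startElement(uri, local, qName, atts);
            }
            depth++;
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            depth--;
            if (depth > 0) {
                super.endElement(uri, local, qName);
            }
        }
    }

    // Parses already-wrapped XML and returns the element events the filter forwards.
    static List<String> forwardedEvents(String wrappedXml) {
        final List<String> events = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            DropOutermost filter = new DropOutermost(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    events.add("start:" + l);
                }

                @Override
                public void endElement(String u, String l, String q) {
                    events.add("end:" + l);
                }
            });
            filter.parse(new InputSource(new StringReader(wrappedXml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Prints [start:a, start:b, end:b, end:a, start:c, end:c]
        System.out.println(forwardedEvents("<root><a><b/></a><c/></root>"));
    }
}
```

The recorded list contains no events for root, while the nested elements a, b, and c pass through in document order.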

EncodeEXI

The method signature for encodeEXI is expanded once again with a single Boolean value to indicate when an input file is an XML fragment rather than a well-formed XML document.

    public void encodeEXI(
        String sourceFile, 
        String destinationFile,
        String alignment,
//Preservation options
        Boolean preserveComments,
        Boolean preservePIs,
        Boolean preserveDTD,
        Boolean preserveNamespace,
        Boolean preserveLexicalValues,
        Boolean preserveWhitespace,
        int blockSize,
        int maxValueLength,
        int maxValuePartitions,
// Schema options
        String schemaFileName,
        String exiSchemaFileName,
        Boolean strict,
        String useSchema,
// Datatype Representation Map Options
        Boolean useDTRM,
        String datatypeRepresentationMap,
// Fragment
        Boolean fragment
    )

EncodeEXI requires a few extra lines of code during step 6, encoding the input stream, in order to allow for XML fragments. If fragment is true, the method uses the FragmentInputStream and FragmentFilter classes to encode the file. Otherwise, it uses the Transmogrifier directly, as before.

// 6. Encode the input stream.
            if (fragment) {

If the file is a fragment, let the Transmogrifier know. The Transmogrifier still handles encoding, but it is invoked after the input stream has passed through the fragment filter.

                transmogrifier.setFragment(true);

Instantiate a new XMLReader.

                SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
                saxParserFactory.setNamespaceAware(true);
                XMLReader xmlReader = saxParserFactory.newSAXParser().getXMLReader();

Instantiate a FragmentInputStream that adds the temporary <root> tags, so that the XMLReader will process the stream as well-formed XML.

                InputSource is = new InputSource(in);
                FragmentInputStream fragmentInputStream = 
                    new FragmentInputStream(is.getByteStream());

Instantiate a filter to consume and ignore the <root> events.

                FragmentFilter fragmentFilter = new FragmentFilter(xmlReader);
                fragmentFilter.setContentHandler(transmogrifier.getSAXTransmogrifier());

Parse the file using the fragment filter. The filter intercepts all of the events as they are processed, and removes the <root> events from the output.

                fragmentFilter.parse(new InputSource(fragmentInputStream));
            }
            else {
                transmogrifier.encode(new InputSource(in));
            }
        }

DecodeEXI

The method signature requires just one more Boolean value, to indicate that a file is a fragment rather than a well-formed XML file.

    public void decodeEXI(
            String sourceFile, 
            String destinationFile,
            String alignment,
// Preservation options.
            Boolean preserveComments,
            Boolean preservePIs,
            Boolean preserveDTD,
            Boolean preserveNamespace,
            Boolean preserveLexicalValues,
            int blockSize,
            int maxValueLength,
            int maxValuePartitions,
// Schema options.
            String schemaFileName,
            String exiSchemaFileName,
            Boolean strict,
            String useSchema,
// Datatype Representation Map options.
            Boolean useDTRM,
            String datatypeRepresentationMap,
// Fragment
            Boolean fragment
   )

On the decoding side, the OpenEXI interface is able to transparently process an XML fragment. Any time after you have instantiated the EXIReader, but prior to invoking the EXIReader.parse() method, configure the fragment setting by passing the fragment variable.

            reader.setFragment(fragment);

The EXIReader processes the fragment file and returns it to its original (logical) form.


This example demonstrated how to encode and decode XML fragments rather than well-formed XML documents. This concludes this introductory OpenEXI Tutorial.




Updated March 21, 2012.
Tutorial by Dennis Dawson with Takuki Kamiya of Fujitsu Corporation.