Back to OpenEXI Tutorial
Example B - Working with XML Fragments
How to Use Example B
Code Highlights
Complete, commented source code is included in the
FragmentInputStream
The SAX XMLReader throws an error if an XML file has more than one root element. To work around that limitation, FragmentInputStream wraps an XML fragment in temporary <root> and </root> tags, producing a stream that the XMLReader will accept.
package openexi.sample;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
public class FragmentInputStream extends InputStream{
Static byte arrays hold the bytes of the temporary opening and closing tags.
static byte[] pre;
static byte[] post;
Instance fields track progress as the input is read.
boolean done = false;
int pos = 0;
int len = 0;
final byte[] buffer;
This class accepts any type of InputStream, most often a FileInputStream.
final InputStream in;
A static initializer populates the pre and post variables with the UTF-8 bytes of the opening and closing tags.
static
{
try
{
pre = "<root>".getBytes("UTF-8");
post = "</root>".getBytes("UTF-8");
}
catch (UnsupportedEncodingException e)
{
throw new RuntimeException(e);
}
}
The constructor creates a 1024-byte buffer, copies the <root> tag to the start of it, and stores the input stream in the field in.
public FragmentInputStream(InputStream in)
{
this.buffer = new byte[1024];
System.arraycopy(pre, 0, buffer, 0, pre.length);
len = pre.length;
this.in = in;
}
Override both read() methods. When the buffer is exhausted, each method returns -1 if the closing tag has already been delivered; otherwise, it calls fill() to load the next chunk of input.
@Override
public int read() throws IOException
{
if (pos == len)
{
// Check whether the post tag has been appended to the file.
if (done)
return -1;
// If not, fill the buffer again.
fill();
}
return buffer[pos++] & 0xFF;
}
@Override
public int read(byte[] buf, int offset, int bufLen) throws IOException
{
if (pos == len)
{
// Check whether the post tag has been appended to the file.
if (done)
return -1;
// If not, fill the buffer again.
fill();
}
Compute the number of bytes to copy (currentByte): the lesser of the bytes remaining in the internal buffer (len - pos) and the number of bytes the caller requested (bufLen).
int currentByte = Math.min(len - pos, bufLen);
Copy that many bytes from the internal buffer into the caller's array, starting at offset.
System.arraycopy(buffer, pos, buf, offset, currentByte);
Advance the current position by the number of bytes copied, and return that count.
pos += currentByte;
return currentByte;
}
The fill() method is where most of the work is done, reading up to 1024 bytes into the buffer on each pass.
protected void fill() throws IOException
{
pos = 0;
len = in.read(buffer, 0, buffer.length);
If in.read() returns -1, the end of the input has been reached. The method then copies the post variable ("</root>") into the buffer, sets len to its length, and sets the done variable to true.
if (len < 0)
{
System.arraycopy(post, 0, buffer, 0, post.length);
len = post.length;
done = true;
}
}
}
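For comparison, the wrapping effect described above can be sketched with the JDK's java.io.SequenceInputStream, which concatenates several streams. This is not how the tutorial's FragmentInputStream is implemented, and the class name FragmentWrapSketch and its methods wrap and readAll are hypothetical names used only for illustration. The sketch shows the byte sequence that the XMLReader ends up seeing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

public class FragmentWrapSketch {

    // Hypothetical helper: surrounds a fragment stream with temporary <root> tags.
    public static InputStream wrap(InputStream fragment) {
        InputStream pre = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream post = new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(pre, fragment, post)));
    }

    // Drains a stream into a UTF-8 string, 1024 bytes at a time.
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // A fragment with two top-level elements and no single root.
        InputStream fragment =
                new ByteArrayInputStream("<a/><b/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(wrap(fragment))); // prints <root><a/><b/></root>
    }
}
```

The hand-written buffer in FragmentInputStream achieves the same result without allocating additional stream objects; either approach hands the parser a document with exactly one root element.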
FragmentFilter
The FragmentInputStream class adds <root> tags so that the XMLReader parses a fragment file as if it were a well-formed XML document with a single root element. These tags must be removed as the fragment file is encoded to EXI. The FragmentFilter class intercepts startElement and endElement events. By incrementing m_depthCounter for every startElement and decrementing it for every endElement, the methods can identify when an event belongs to the temporary root element (the counter is -1), swallow the event, and continue. Constructors, child elements, and all other events are handled normally by the superclass.
package openexi.sample;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
public class FragmentFilter extends XMLFilterImpl {
int m_depthCounter = -1;
public FragmentFilter(XMLReader xmlReader) {
super(xmlReader);
}
public FragmentFilter() {
super();
}
Override the startElement method. If the m_depthCounter is 0 or higher, pass the event to the superclass. Otherwise, swallow the event and move on. This prevents the <root> element from being included in the output stream.
@Override
public void startElement(String uri, String localName, String qname, Attributes atts) throws SAXException {
if (m_depthCounter > -1)
{
// Propagate parser errors to the caller instead of swallowing them.
super.startElement(uri, localName, qname, atts);
}
Increment the depth counter after processing each starting element. If the next event is a startElement, the depth counter will increase to the next level of nesting.
m_depthCounter++;
}
@Override
public void endElement(String uri, String localName, String qname) throws SAXException {
Decrement the depth counter before processing each ending element. If the next event is also an endElement, the depth counter will decrease one level of nesting. If the nesting level is less than 0, swallow the event so that the </root> tag is not included in the output stream.
--m_depthCounter;
if (m_depthCounter > -1)
{
// Propagate parser errors to the caller instead of swallowing them.
super.endElement(uri, localName, qname);
}
}
}
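The depth-counter logic can be checked in isolation with a small, self-contained sketch. The classes DepthFilterSketch and RootStrippingFilter and the method forwardedEvents are hypothetical names introduced for this illustration; the filter repeats the counter logic of FragmentFilter, and a plain DefaultHandler stands in for the EXI encoder so that the forwarded events can be listed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class DepthFilterSketch {

    // Hypothetical stand-in for FragmentFilter, using the same counter logic.
    static class RootStrippingFilter extends XMLFilterImpl {
        private int depth = -1;

        RootStrippingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (depth > -1) {
                super.startElement(uri, localName, qName, atts);
            }
            depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            --depth;
            if (depth > -1) {
                super.endElement(uri, localName, qName);
            }
        }
    }

    // Parses wrapped XML and returns the element events that pass through the filter.
    public static List<String> forwardedEvents(String wrappedXml) throws Exception {
        final List<String> events = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        RootStrippingFilter filter = new RootStrippingFilter(reader);
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + localName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + localName);
            }
        });
        filter.parse(new InputSource(new StringReader(wrappedXml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        // prints [start:a, start:b, end:b, end:a, start:c, end:c]
        System.out.println(forwardedEvents("<root><a><b/></a><c/></root>"));
    }
}
```

Neither the start nor the end of the temporary root element reaches the handler, while the nested and sibling elements pass through in document order.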
EncodeEXI
The method signature for encodeEXI is expanded once again with a single Boolean value that indicates whether the input file is an XML fragment rather than a well-formed XML document.
public void encodeEXI(
String sourceFile,
String destinationFile,
String alignment,
//Preservation options
Boolean preserveComments,
Boolean preservePIs,
Boolean preserveDTD,
Boolean preserveNamespace,
Boolean preserveLexicalValues,
Boolean preserveWhitespace,
int blockSize,
int maxValueLength,
int maxValuePartitions,
// Schema options
String schemaFileName,
String exiSchemaFileName,
Boolean strict,
String useSchema,
// Datatype Representation Map Options
Boolean useDTRM,
String datatypeRepresentationMap,
// Fragment
Boolean fragment
)
EncodeEXI requires a few extra lines of code during step 6, encoding the input stream, in order to allow for file fragments. If fragment is true, the method uses the FragmentInputStream and FragmentFilter classes to encode the file. Otherwise, it uses the transmogrifier directly, as before.
// 6. Encode the input stream.
if (fragment) {
If the file is a fragment, let the Transmogrifier know. The Transmogrifier still handles encoding, but it is invoked after the input stream has passed through the fragment filter.
transmogrifier.setFragment(true);
Instantiate a new XMLReader.
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
saxParserFactory.setNamespaceAware(true);
XMLReader xmlReader = saxParserFactory.newSAXParser().getXMLReader();
Instantiate a FragmentInputStream that adds the temporary <root> tags, so that the XMLReader will process the stream as well-formed XML.
InputSource is = new InputSource(in);
FragmentInputStream fragmentInputStream =
new FragmentInputStream(is.getByteStream());
Instantiate a filter to consume and ignore the <root> events.
FragmentFilter fragmentFilter = new FragmentFilter(xmlReader);
fragmentFilter.setContentHandler(transmogrifier.getSAXTransmogrifier());
Parse the file using the fragment filter. The filter intercepts all of the events as they are processed, and removes the <root> events from the output.
fragmentFilter.parse(new InputSource(fragmentInputStream));
}
else {
transmogrifier.encode(new InputSource(in));
}
}
DecodeEXI
The method signature requires just one more Boolean value, to indicate that a file is a fragment rather than a well-formed XML file.
public void decodeEXI(
String sourceFile,
String destinationFile,
String alignment,
// Preservation options.
Boolean preserveComments,
Boolean preservePIs,
Boolean preserveDTD,
Boolean preserveNamespace,
Boolean preserveLexicalValues,
int blockSize,
int maxValueLength,
int maxValuePartitions,
// Schema options.
String schemaFileName,
String exiSchemaFileName,
Boolean strict,
String useSchema,
// Datatype Representation Map options.
Boolean useDTRM,
String datatypeRepresentationMap,
// Fragment
Boolean fragment
)
On the decoding side, the OpenEXI interface is able to transparently process an XML fragment. Any time after you have instantiated the EXIReader, but prior to invoking the EXIReader.parse() method, configure the fragment setting by passing the fragment variable.
reader.setFragment(fragment);
The EXIReader processes the fragment file and returns it to its original (logical) form.
This example demonstrated how to encode and decode XML fragments rather than well-formed XML documents. This concludes this introductory OpenEXI Tutorial.
Updated March 21, 2012. Tutorial by Dennis Dawson with Takuki Kamiya of Fujitsu Corporation.