Example 4 - Working with Schemas
This example shows how to use a schema to provide explicit structural information that can further improve EXI file compaction.
- What Example 4 Demonstrates
- How to Use Example 4
- Code Highlights
What Example 4 Demonstrates
The Nagasena Transmogrifier can consume any XML file, infer its structure, and optimize the file for storage and transmission. To handle the vast array of possible values, the EXI specification is intentionally conservative in its rules for deciding which information can be converted to more efficient datatypes and which should be left intact to prevent loss of significant data.
Providing the Transmogrifier with an XML schema along with XML data assists the decision-making process. If an element is supposed to contain integer values, for example, then the Transmogrifier can trim whitespace within the element and convert the string input to the integer datatype.
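To make the integer case concrete, here is a toy sketch of the decision a schema enables. It is not Nagasena code: the class TypedValueSketch, the enum DeclaredType, and the method normalize are hypothetical names invented for this illustration, and a real encoder works on EXI event streams rather than on plain strings.

```java
import java.math.BigInteger;

/** Toy illustration only; none of these names exist in Nagasena. */
class TypedValueSketch {

    /** Hypothetical stand-in for the datatype a schema declares for an element. */
    enum DeclaredType { STRING, INTEGER }

    /**
     * Returns the value an encoder could store for the given element text.
     * With an INTEGER declaration, surrounding whitespace carries no meaning,
     * so it is trimmed and the digits are parsed into a number.
     * Without that knowledge, the text is returned exactly as written.
     */
    static Object normalize(String text, DeclaredType type) {
        if (type == DeclaredType.INTEGER) {
            return new BigInteger(text.trim());
        }
        return text;
    }

    public static void main(String[] args) {
        String raw = "  \n 12345 ";
        // Schema says integer: whitespace is dropped and a number is kept.
        System.out.println(normalize(raw, DeclaredType.INTEGER));
        // No type information: every character is preserved.
        System.out.println("[" + normalize(raw, DeclaredType.STRING) + "]");
    }
}
```

The point of the sketch is only that the trimming step is safe when the schema has declared the element's type, and unsafe otherwise.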
By default, the Transmogrifier uses the schema to help create the correct structure for the generated EXI file. If data in the input file are not in sync with the schema, the Transmogrifier infers the new structure on the fly so that no data are lost. If you are confident that the XML input file is fully compliant with the schema, you can encode and decode the EXI file using the STRICT option. With STRICT set, Nagasena throws an error during compaction if the file is not compliant, and the only preservation option available is "Preserve Lexical Values." Using a schema with the STRICT option ensures that your file will be compacted (and optionally compressed) as efficiently as possible using the Nagasena utilities.
Before using a schema, Nagasena preprocesses an XML Schema Definition (XSD) to an EXI Grammar (EXIG). You can save a converted EXIG to a file rather than process the XSD every time you convert an XML file to EXI format.
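One way to benefit from a saved EXIG is to recompile only when the XSD has changed since the EXIG was written. The following is a minimal sketch of that check using only the JDK. The class GrammarFreshnessSketch and its methods are hypothetical helpers, not part of Nagasena or of the example application, and comparing file timestamps is just one possible policy.

```java
import java.io.File;

/** Hypothetical helper; not part of Nagasena or the example application. */
class GrammarFreshnessSketch {

    /** True when the EXIG timestamp is older than the XSD timestamp. */
    static boolean isStale(long xsdModifiedMillis, long exigModifiedMillis) {
        return exigModifiedMillis < xsdModifiedMillis;
    }

    /**
     * True when the XSD should be compiled again: either no EXIG file
     * has been saved yet, or the saved one predates the current XSD.
     */
    static boolean needsRecompile(File xsd, File exig) {
        if (!exig.isFile()) {
            return true;
        }
        return isStale(xsd.lastModified(), exig.lastModified());
    }
}
```

A caller would run the XSD-to-EXIG conversion only when needsRecompile returns true, and otherwise read the saved EXIG directly.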
Example 4 adds text fields for a schema (XSD) source file and a target destination file for the EXI Grammar (EXIG), a browse button for the source file, and a button to convert an XSD to an EXIG. Radio buttons let you choose whether to use an XSD, EXIG, or No Schema when generating an EXI file. A checkbox lets you choose whether to use Strict interpretation of the schema to convert and restore the XML source.
How to Use Example 4
To install and run Example 4:
- Download and expand OpenEXI_Example4.zip. This zip archive contains the compiled example application classes and Java source code. Expanding the file creates a directory named "OpenEXI_Example4".
- From the command line, move into the "OpenEXI_Example4" directory.
- Enter the command:
java -jar OpenEXI_Example4.jar
To encode an XSD file to EXIG:
- Click the Browse... button to the right of the XSD File Name and select an XSD file to process.
- A suggested destination file name appears in the EXIG File Name field. You can accept this value or modify it as needed.
- Click Serialize EXIG.
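The tutorial does not say how the suggested EXIG file name is derived. A plausible rule is to keep the XSD path and swap the extension, which can be sketched as follows. The class SuggestedNameSketch, the method suggestExigName, and the ".exig" extension are assumptions made for illustration, not the example application's actual logic.

```java
/** Hypothetical naming rule; the example application may use a different one. */
class SuggestedNameSketch {

    /**
     * Replaces the final extension of the file name with ".exig".
     * A dot that appears only in a directory name is ignored, and a
     * name without any extension simply gets ".exig" appended.
     */
    static String suggestExigName(String xsdFileName) {
        int lastSeparator = Math.max(xsdFileName.lastIndexOf('/'),
                                     xsdFileName.lastIndexOf('\\'));
        int lastDot = xsdFileName.lastIndexOf('.');
        String base = (lastDot > lastSeparator)
                ? xsdFileName.substring(0, lastDot)
                : xsdFileName;
        return base + ".exig";
    }
}
```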
To encode an XML file to EXI:
- Click Browse... to the right of the Source File field to select an XML file to encode. The selected file name appears in the Source File field. A suggested name is displayed in the Destination File field, but you can edit the location or file name according to your needs.
- Use the radio buttons to choose an alignment type (byte-aligned documents are the easiest to examine with a text editor, if you want to peek inside).
- Select checkboxes to try different preservation settings.
- Enter a new integer value for Element/Attribute Block Size.
- Enter new integer values for String Table Max Value Length and String Table Max Value Partitions.
- Optionally enter an XML schema (XSD) or EXI Grammar (EXIG) file name in the appropriate field. Set the radio button to None, XSD, or EXIG.
- Optionally select the Strict checkbox, if you are confident that the XML source file is 100% compliant with the schema.
- Click Encode.
To decode an EXI file to XML:
- Click Browse... to the right of the Source File field to select an EXI file to decode. The selected file name appears in the Source File field. A suggested name is displayed in the Destination File field, but you can edit the location or file name according to your needs.
- Set the Alignment, preservation checkboxes, integer values, and schema settings to the same values used to encode the file.
- Click Decode.
Code Highlights
Complete, commented source code is included in the src directory in OpenEXI_Example4.zip. This section highlights the important updates in each iteration as the examples build on one another.
SerializeEXISchema
Serializing a schema is a black box operation. The EXISchemaFactory accepts an input stream and converts it to an EXISchema. The results can be captured and written to a file.
EXISchemaFactory factory;
EXISchema newSchema;
FileInputStream fis;
InputSource is;
FileOutputStream fos;
. . .
fis = new FileInputStream(xsdFileName);
is = new InputSource(fis);
// Process a new schema.
newSchema = factory.compile(is);
// Write the results to a file.
fos = new FileOutputStream(exigFileName);
new EXISchemaWriter().serialize(newSchema, fos);
Using the preprocessed EXIG file saves the step of processing the XSD every time a corresponding XML file is converted to EXI.
EncodeEXI
The method signature for encodeEXI is expanded once more to accept four new options for using a schema to encode EXI.
public void encodeEXI(
    String sourceFile,
    String destinationFile,
    String alignment,
    // Preservation options
    Boolean preserveComments,
    Boolean preservePIs,
    Boolean preserveDTD,
    Boolean preserveNamespace,
    Boolean preserveLexicalValues,
    Boolean preserveWhitespace,
    int blockSize,
    int maxValueLength,
    int maxValuePartitions,
    // Schema options
    String schemaFileName,
    String exiSchemaFileName,
    Boolean strict,
    String useSchema
)
The method creates the short integer variable options and sets it to GrammarOptions.DEFAULT_OPTIONS (equal to 2).
short options = GrammarOptions.DEFAULT_OPTIONS;
The Boolean value strict, if true, indicates that the generated EXI file will strictly adhere to the XML Schema Definition. The only preservation setting allowed with the strict option is "Preserve Lexical Values." All others are ignored. If strict is true, the method skips the preservation arguments and sets the GrammarOptions value to STRICT_OPTIONS (1). If strict is false, the preservation options are processed one at a time, and each selected option is added to the running options value.
if (strict) {
    options = GrammarOptions.STRICT_OPTIONS;
}
else {
    if (preserveComments) options = GrammarOptions.addCM(options);
    if (preservePIs) options = GrammarOptions.addPI(options);
    if (preserveDTD) options = GrammarOptions.addDTD(options);
    if (preserveNamespace) options = GrammarOptions.addNS(options);
}
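The pattern above, in which several yes/no settings are folded into one short value, can be tried without the Nagasena library using the stand-in below. Only the values 1 (STRICT_OPTIONS) and 2 (DEFAULT_OPTIONS) come from this tutorial. The class OptionFlagsSketch and the bit values for CM, PI, DTD, and NS are hypothetical and are not the real GrammarOptions constants.

```java
/** Stand-in for illustration; the CM, PI, DTD, and NS bit values are made up. */
class OptionFlagsSketch {

    static final short STRICT_OPTIONS = 1;   // value stated in the tutorial
    static final short DEFAULT_OPTIONS = 2;  // value stated in the tutorial

    // Hypothetical flags; the real GrammarOptions values may differ.
    static final short CM = 0x04;
    static final short PI = 0x08;
    static final short DTD = 0x10;
    static final short NS = 0x20;

    /** Sets one flag in the options value without disturbing the others. */
    static short add(short options, short flag) {
        return (short) (options | flag);
    }

    /** Mirrors the if/else above: strict wins, otherwise flags accumulate. */
    static short compute(boolean strict, boolean comments, boolean pis,
                         boolean dtd, boolean namespaces) {
        if (strict) {
            return STRICT_OPTIONS;
        }
        short options = DEFAULT_OPTIONS;
        if (comments) options = add(options, CM);
        if (pis) options = add(options, PI);
        if (dtd) options = add(options, DTD);
        if (namespaces) options = add(options, NS);
        return options;
    }
}
```

Because each flag occupies its own bit in this sketch, the order in which the options are added does not matter, and adding the same option twice has no further effect.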
The method declares an EXISchema variable named schema and initializes it to null. If the useSchema string is set to None, schema remains null.
EXISchema schema = null;
If useSchema is set to XSD, the XSD file must be read and converted to an EXISchema.
if (useSchema.equals("XSD")) {
    try {
        InputSource is = new InputSource(schemaFileName);
        EXISchemaFactory factory = new EXISchemaFactory();
        schema = factory.compile(is, false);
    }
    finally {
    }
}
If useSchema is set to EXIG, the method reads the EXIG file to make an EXISchema.
else if (useSchema.equals("EXIG")) {
    try {
        fis = new FileInputStream(exiSchemaFileName);
        schema = new EXISchemaReader().parse(fis);
    }
    finally {
        if (fis != null) fis.close();
    }
}
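With the if/else-if chain above, a useSchema value that is neither XSD nor EXIG leaves schema as null, whether the caller passed None or simply mistyped the value. The sketch below shows one way to make that distinction explicit before any file is opened. The names SchemaChoiceSketch, SchemaSource, parse, and fileToLoad are hypothetical and do not appear in the example source.

```java
import java.util.Optional;

/** Hypothetical helper that validates the useSchema string up front. */
class SchemaChoiceSketch {

    enum SchemaSource { NONE, XSD, EXIG }

    /** Maps the radio-button string to an enum, rejecting anything unexpected. */
    static SchemaSource parse(String useSchema) {
        switch (useSchema) {
            case "None": return SchemaSource.NONE;
            case "XSD":  return SchemaSource.XSD;
            case "EXIG": return SchemaSource.EXIG;
            default:
                throw new IllegalArgumentException("Unknown schema mode: " + useSchema);
        }
    }

    /** Returns the file that should be read, or empty when no schema is used. */
    static Optional<String> fileToLoad(String useSchema,
                                       String schemaFileName,
                                       String exiSchemaFileName) {
        switch (parse(useSchema)) {
            case XSD:  return Optional.of(schemaFileName);
            case EXIG: return Optional.of(exiSchemaFileName);
            default:   return Optional.empty();
        }
    }
}
```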
The schema and options can then be passed to the Grammar Cache, and the Grammar Cache passed to the Transmogrifier to set the schema.
grammarCache = new GrammarCache(schema, options);
transmogrifier.setGrammarCache(grammarCache);
The schema options are now set, and the method can continue as before to process the EXI file.
DecodeEXI
The arguments to decode an EXI file must match the arguments used to encode the file. The method signature reflects these additional settings for working with schemas.
public void decodeEXI(
    String sourceFile,
    String destinationFile,
    String alignment,
    // Preservation options.
    Boolean preserveComments,
    Boolean preservePIs,
    Boolean preserveDTD,
    Boolean preserveNamespace,
    Boolean preserveLexicalValues,
    int blockSize,
    int maxValueLength,
    int maxValuePartitions,
    // Schema options.
    String schemaFileName,
    String exiSchemaFileName,
    Boolean strict,
    String useSchema
)
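Because decoding only works when these arguments match the ones used for encoding, it can help to record the encode settings and compare them before decoding. The following sketch does that with plain maps. The class SettingsCheckSketch, the method mismatches, and the idea of storing settings in a map are assumptions for illustration; the example application does not do this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-decode check; not part of the example application. */
class SettingsCheckSketch {

    /**
     * Returns the names of all settings whose values differ between the
     * encode and decode configurations, in alphabetical order. A setting
     * that is present on one side only counts as a mismatch.
     */
    static List<String> mismatches(Map<String, ?> encode, Map<String, ?> decode) {
        Set<String> names = new TreeSet<>(encode.keySet());
        names.addAll(decode.keySet());
        List<String> differing = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(encode.get(name), decode.get(name))) {
                differing.add(name);
            }
        }
        return differing;
    }
}
```

An empty result means the decode call uses the same alignment, preservation, size, and schema settings as the encode call.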
Grammar Options are determined exactly as they are determined for encodeEXI.
short options = GrammarOptions.DEFAULT_OPTIONS;
// If using strict interpretation of the schema, set STRICT_OPTIONS and continue.
if (strict) {
    options = GrammarOptions.STRICT_OPTIONS;
}
// Otherwise, check for preservation settings.
else {
    if (preserveComments) options = GrammarOptions.addCM(options);
    if (preservePIs) options = GrammarOptions.addPI(options);
    if (preserveDTD) options = GrammarOptions.addDTD(options);
    if (preserveNamespace) options = GrammarOptions.addNS(options);
}
Create an EXISchema (the variable schema) with a null value. If useSchema is set to None, schema will remain null.
EXISchema schema = null;
If useSchema is set to XSD, the XML Schema Definition must be read and converted to an EXISchema object.
if (useSchema.equals("XSD")) {
    try {
        InputSource is = new InputSource(schemaFileName);
        EXISchemaFactory factory = new EXISchemaFactory();
        schema = factory.compile(is, false);
    }
    finally {
    }
}
If useSchema is set to EXIG, the file can be read directly to make an EXISchema object.
else if (useSchema.equals("EXIG")) {
    try {
        fis = new FileInputStream(exiSchemaFileName);
        schema = new EXISchemaReader().parse(fis);
    }
    finally {
        if (fis != null) fis.close();
    }
}
The method sets the Grammar Cache with the schema and options variables, then uses the Grammar Cache to set the EXISchema in the EXIReader instance.
grammarCache = new GrammarCache(schema, options);
reader.setGrammarCache(grammarCache);
With the schema settings in place, the method can continue as before to convert the EXI file back to XML format.
Extra Credit — EXISchemaFactoryErrorHandler
By default, Nagasena does not report schema compilation errors. You can add a handler to help troubleshoot problems when compiling your custom XSDs.
To capture error messages, implement the interface EXISchemaFactoryErrorHandler and its three methods, which report warnings, errors, and fatal errors. The following is a sample implementation of EXISchemaFactoryErrorHandler.
package openexi.sample;

import org.openexi.scomp.EXISchemaFactoryErrorHandler;
import org.openexi.scomp.EXISchemaFactoryException;

public class EXISchemaFactoryExceptionHandlerSample
        implements EXISchemaFactoryErrorHandler {

    public EXISchemaFactoryExceptionHandlerSample() {
        super();
    }

    // Each method prints the stack trace of the reported problem.
    public void warning(EXISchemaFactoryException eXISchemaFactoryException)
            throws EXISchemaFactoryException {
        eXISchemaFactoryException.printStackTrace();
    }

    public void error(EXISchemaFactoryException eXISchemaFactoryException)
            throws EXISchemaFactoryException {
        eXISchemaFactoryException.printStackTrace();
    }

    public void fatalError(EXISchemaFactoryException eXISchemaFactoryException)
            throws EXISchemaFactoryException {
        eXISchemaFactoryException.printStackTrace();
    }
}
In EncodeEXI.java, DecodeEXI.java, and SerializeEXISchema.java, add two lines of code after you instantiate the EXISchemaFactory but prior to the compile command.
EXISchemaFactory factory = new EXISchemaFactory();
EXISchemaFactoryExceptionHandlerSample esfe = new EXISchemaFactoryExceptionHandlerSample();
factory.setCompilerErrorHandler(esfe);
schema = factory.compile(is);
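The same three-method handler pattern exists in the JDK's own XML Schema support, so an XSD can also be checked independently of Nagasena. The sketch below uses javax.xml.validation.SchemaFactory with an org.xml.sax.ErrorHandler that collects messages instead of printing stack traces. This is a swapped-in technique using JDK classes, not the Nagasena handler, and the class XsdPrecheckSketch and method check are hypothetical names. A schema that passes this check is not guaranteed to compile in Nagasena, or the reverse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** JDK-only pre-check of an XSD; independent of the Nagasena schema compiler. */
class XsdPrecheckSketch {

    /** Returns every warning, error, and fatal error found in the XSD text. */
    static List<String> check(String xsdText) {
        final List<String> problems = new ArrayList<>();
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Collect messages rather than throwing, so all problems are gathered.
        factory.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) {
                problems.add("fatal: " + e.getMessage());
            }
        });
        try {
            factory.newSchema(new StreamSource(new StringReader(xsdText)));
        } catch (SAXException e) {
            // Record the exception only if the handler has not already seen it.
            if (problems.isEmpty()) {
                problems.add("fatal: " + e.getMessage());
            }
        }
        return problems;
    }
}
```

Collecting messages into a list makes it easy to display them in a user interface or a log, which a printed stack trace does not.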
While most users are working with well-vetted legacy XML files, those who are still adjusting the parameters of their XML schema will benefit from the additional feedback provided by the error handler.
Updated March 30, 2015.
Tutorial by Dennis Dawson with Takuki Kamiya of Fujitsu Laboratories of America.