XML Manipulation Using JAXB – Part 1

In older XML parsing approaches based on DOM (Document Object Model), the parser creates a tree of objects in memory that represents how the data was organized in the XML document. The application then traverses these objects to access or update the appropriate nodes and properties. Manipulating XML documents this way takes considerable effort and a lot of Java code.
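To make that effort concrete, here is a minimal sketch of the DOM style using only the JDK's built-in parser. The products/product/price vocabulary and the DomPriceReader class are invented for illustration and are not part of the application discussed in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: the element names below are assumptions made for this sketch.
class DomPriceReader {

    // Parses the XML into an in-memory tree and sums the text of every <price> element.
    static double totalPrice(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            double total = 0;
            NodeList products = doc.getDocumentElement().getElementsByTagName("product");
            for (int i = 0; i < products.getLength(); i++) {
                Element product = (Element) products.item(i);
                // Every value arrives as text and has to be located and converted by hand.
                String price = product.getElementsByTagName("price").item(0).getTextContent();
                total += Double.parseDouble(price.trim());
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<products>"
                + "<product><name>Pen</name><price>1.50</price></product>"
                + "<product><name>Book</name><price>8.25</price></product>"
                + "</products>";
        System.out.println(totalPrice(xml)); // prints 9.75
    }
}
```

Even for this tiny document the code has to navigate node lists, cast nodes and convert strings itself, which is the kind of plumbing a binding framework takes over.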

The major difference with JAXB (Java Architecture for XML Binding) is that, unlike DOM parsers, it works with annotation-based POJOs and supports both:

  • Marshalling – transforming Java objects into XML
  • Unmarshalling – transforming XML into Java objects.

This article walks step by step through what is required to integrate JAXB into your application.

Defining XML Schema

The XSD (XML Schema Definition) defines the structure and semantics of XML documents: the allowed elements, shared vocabularies, rules and restrictions that a document must follow. XML Schema supports elements of complex type and of simple type.

Complex elements are the ones which contain other elements and/or attributes and usually correspond to objects in Java.

Simple elements can contain only text and can have data types such as string, date, boolean, decimal, etc. These elements correspond to object properties.

Restrictions in XSD confine the allowed values for elements and attributes, which makes them crucial for validating XML documents. There are several types of restrictions, such as length, min-max, set-of-values, restricted-values and regular expressions. For further information about the allowed regular expression syntax for XSD, this tutorial is very handy.

An easy to follow XML Schema tutorial can be found here.
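As a rough illustration of complex types, simple types and a pattern restriction working together, the sketch below embeds a small schema in a Java string and checks two documents against it with the JDK's javax.xml.validation API. The price/amount/currency elements and the CurrencySchemaDemo class are assumptions made for this example; only the three-letter pattern for a currencyType is borrowed from the validation log shown later in this article.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class CurrencySchemaDemo {

    // Hypothetical schema: a complex <price> element containing two simple elements.
    // <currency> uses a simple type restricted to exactly three letters.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:simpleType name=\"currencyType\">"
        + "<xs:restriction base=\"xs:string\">"
        + "<xs:pattern value=\"[a-zA-Z]{3}\"/>"
        + "</xs:restriction>"
        + "</xs:simpleType>"
        + "<xs:element name=\"price\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"amount\" type=\"xs:decimal\"/>"
        + "<xs:element name=\"currency\" type=\"currencyType\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true when the document satisfies the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // The default validator throws a SAXException on the first violation.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<price><amount>9.99</amount><currency>CAD</currency></price>"));  // prints true
        System.out.println(isValid("<price><amount>9.99</amount><currency>CAD1</currency></price>")); // prints false
    }
}
```

The second document is rejected only because of the pattern facet: CAD1 has four characters and contains a digit.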

XML Bindings

Once the XSD is finalized, the next step is to compile it into binding classes. JAXB provides a compiler called xjc for generating XML bindings.

xjc -xmlschema "<xmlschema.xsd>" -p "<package.detailed.name>"

The compiler generates the annotated XML binding Java classes. Alternatively, an Eclipse plugin is available for generating Java classes from an XML Schema.

Marshalling

Once we have the XML binding Java classes, the next step is to marshal the Java objects, i.e. to transform them into the corresponding XML elements.

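import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;

// ProductExtract is the root XML binding class
JAXBContext jaxbContext = JAXBContext.newInstance(ProductExtract.class);

StringWriter writer = new StringWriter();

Marshaller marshaller = jaxbContext.createMarshaller();
// set property for formatted (indented) output
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

// extract is the populated ProductExtract instance to be written as XML
marshaller.marshal(extract, writer);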

Sometimes the text of an XML element contains special characters such as < or &, and we may want to wrap it in a CDATA section. An XML parser also parses the text in a document, so special characters normally have to be replaced with escape sequences (for example &lt; and &amp;); text inside a CDATA section, however, is not interpreted as markup.
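The difference between escaped text and a CDATA section can be seen without JAXB at all. The sketch below uses the JDK's StAX writer (XMLStreamWriter), a different API from the one used in this article, purely to show the two forms side by side; the CdataDemo class and the comment element are made up for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class CdataDemo {

    // Writes <comment>text</comment>, either escaped or wrapped in a CDATA section.
    static String write(String text, boolean asCdata) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("comment");
            if (asCdata) {
                xml.writeCData(text);       // text is emitted verbatim inside <![CDATA[ ... ]]>
            } else {
                xml.writeCharacters(text);  // special characters are replaced by entities
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write("a < b & c", false));
        System.out.println(write("a < b & c", true));
    }
}
```

The first call produces the text with &lt; and &amp; in place of the special characters, while the second keeps the original characters inside a CDATA section. Both forms yield the same text once the document is parsed again.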

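To emit CDATA, the first step is to configure the Marshaller with an implementation of CharacterEscapeHandler that overrides the default mechanism for adding escape characters. CharacterEscapeHandler belongs to the JAXB reference implementation (package com.sun.xml.bind.marshaller), so this step is implementation-specific.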

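marshaller.setProperty(CharacterEscapeHandler.class.getName(), new CharacterEscapeHandler() {
    @Override
    public void escape(char[] ch, int start, int length, boolean isAttVal, Writer out) throws IOException {
        // do not escape: write the characters exactly as they are
        // (note that this disables escaping for all text written by this marshaller)
        out.write(ch, start, length);
    }
});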

The second step is to implement an XmlAdapter and apply it, using the @XmlJavaTypeAdapter annotation, to the property that should contain CDATA.

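import javax.xml.bind.annotation.adapters.XmlAdapter;

public class CommentAdapter extends XmlAdapter<String, String> {

    // global switch: when true, marshalled strings are wrapped in CDATA
    public static boolean addCDATA;

    @Override
    public String marshal(String str) throws Exception {
        // wrap non-empty strings in a CDATA section when the switch is on
        if (str != null && !str.isEmpty() && addCDATA) {
            // "]]>" would end the section early, so split it across two sections
            return "<![CDATA[" + str.replace("]]>", "]]]]><![CDATA[>") + "]]>";
        }
        return str;
    }

    @Override
    public String unmarshal(String str) throws Exception {
        // the parser has already resolved any CDATA section; return the text unchanged
        return str;
    }
}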

Here is how the adapter is applied to the property using the annotation:

@XmlElement(name = "CustomerComment")
@XmlJavaTypeAdapter(value=CommentAdapter.class)   
protected String customerComment;

Unmarshalling

When an application wants to read an already generated XML document, it has to unmarshal it, which means reading the document and initializing objects of the XML binding Java classes. Unmarshalling with JAXB is very similar to marshalling.

JAXBContext jaxbContext = JAXBContext.newInstance(ProductExtract.class);

// create unmarshaller
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

// load the XSD; SchemaFile holds the path to the schema file
SchemaFactory sf = SchemaFactory.newInstance(javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = sf.newSchema(new File(SchemaFile));

// set properties before unmarshalling
unmarshaller.setSchema(schema);
unmarshaller.setEventHandler(new ProductValidationEventHandler());

// note: an InputStreamReader created without a charset uses the platform default encoding
Reader reader = new InputStreamReader(new FileInputStream(getFileName()));

unmarshalledDataExtract = (ProductExtract) unmarshaller.unmarshal(reader);

Validation

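Validation is very important for verifying the data, and hence the quality, of the XML document. It is enabled by setting a Schema on the unmarshaller; registering an event handler of type ValidationEventHandler then lets you decide how validation problems are reported and whether unmarshalling should continue.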

private boolean validateXML(StringWriter writer)
{
	try {
		// create unmarshaller
		Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

		SchemaFactory sf = SchemaFactory.newInstance(javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI);
		Schema schema = sf.newSchema(new File(SchemaFile));
		// set properties before unmarshalling
		unmarshaller.setSchema(schema);
		unmarshaller.setEventHandler(new ProductValidationEventHandler());

		// unmarshal only to trigger validation; the resulting object is discarded
		unmarshaller.unmarshal(new StringReader(writer.toString()));
		// reached whenever unmarshal did not throw; errors for which the event
		// handler returned true are reported to the handler but do not throw here
		return true;
	}
	catch (Exception e) {
		// thrown e.g. when the XML is not well-formed or the handler returns false
		e.printStackTrace();
		return false;
	}
}

Here is the validation event handler class:

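import javax.xml.bind.ValidationEvent;
import javax.xml.bind.ValidationEventHandler;
import javax.xml.bind.ValidationEventLocator;
// Log and LogFactory are assumed to come from Apache Commons Logging
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ProductValidationEventHandler implements ValidationEventHandler {

    private static final Log logger = LogFactory.getLog(ProductValidationEventHandler.class);

    @Override
    public boolean handleEvent(ValidationEvent ve) {
        if (ve.getSeverity() == ValidationEvent.FATAL_ERROR
                || ve.getSeverity() == ValidationEvent.ERROR) {
            ValidationEventLocator locator = ve.getLocator();
            // print message from validation event
            logger.info("Error: " + ve.getMessage());
            logger.info("Invalid xml document: " + locator.getURL());
            // output line and column number
            logger.info("Error at column " + locator.getColumnNumber()
                    + ", line " + locator.getLineNumber());
        }
        // returning true asks JAXB to continue processing, so later errors can be logged too;
        // return false instead to abort at the first error
        return true;
    }
}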

When an error occurs while validating the generated XML document, the ValidationEventHandler writes the validation errors to the log:

INFO: Error: cvc-pattern-valid: Value 'CAD1' is not facet-valid with respect to pattern '[a-zA-Z]{3}' for type 'currencyType'.
INFO: Error at column 125, line 3
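The same idea of reporting every problem together with its location, instead of stopping at the first one, can be sketched with the JDK's own validation API. Note that this uses a SAX ErrorHandler on a javax.xml.validation.Validator rather than JAXB's ValidationEventHandler, and that the CollectingValidator class and its sample schema are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CollectingValidator {

    // Hypothetical schema: a list of <currency> elements, each exactly three letters.
    static final String SAMPLE_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:simpleType name=\"currencyType\">"
        + "<xs:restriction base=\"xs:string\"><xs:pattern value=\"[a-zA-Z]{3}\"/></xs:restriction>"
        + "</xs:simpleType>"
        + "<xs:element name=\"currencies\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"currency\" type=\"currencyType\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Validates xml against xsd and returns one message per reported problem.
    static List<String> validate(String xsd, String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Recording the error without rethrowing lets validation continue.
                public void error(SAXParseException e) { problems.add(describe(e)); }
                // Fatal errors (for example malformed XML) abort validation.
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add("fatal: " + e.getMessage());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        String xml = "<currencies><currency>CAD1</currency><currency>US</currency></currencies>";
        for (String problem : validate(SAMPLE_XSD, xml)) {
            System.out.println(problem);
        }
    }
}
```

Because the error callback does not throw, the validator keeps going after the first invalid value, much as a JAXB event handler that returns true lets unmarshalling continue.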

Conclusion

In this article we have presented a step-by-step approach, from defining the XML Schema and generating the Java XML binding classes to marshalling, unmarshalling and validating XML documents. For mapping XML to Java objects, JAXB is a more convenient and, in terms of the code you have to write, more efficient solution than working with DOM or SAX directly.

Several third-party implementations of JAXB exist, but in this article we stick to the default implementation. A nice comparison of JAXB, SAX and DOM performance is given in this article.

In the next article we will explain how to use XPath with JAXB, which does not support it directly.
