DOM and SAX Parsing using JAVA

I am not a very proficient Java developer, but I can usually find my way after spending some time. This might save some time for anyone who, like me, is not a real pro at Java 🙂. I am posting a sample XML file and a parser class that reads it with both DOM and SAX in Java.

Here is the xml:

<?xml version="1.0"?>
<!DOCTYPE Syllabus SYSTEM "Syllabus.dtd">
<Syllabus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <Lecture SerialNo="1">
    <Week>1</Week>
    <Date>Jun 3</Date>
    <Material>Sample Material</Material>
    <Reading>Sample Reading </Reading>
    <AssignedHW>Sample Text</AssignedHW>
    <DueHW></DueHW>
  </Lecture>
  <Lecture SerialNo="2">
    <Week>2</Week>
    <Date>Jun 10</Date>
    <Material>Some Text</Material>
    <Reading>Some Boring Text </Reading>
    <AssignedHW>HW2</AssignedHW>
    <DueHW>HW1</DueHW>
  </Lecture>
  <Lecture SerialNo="3">
    <Week>3</Week>
    <Date>Jun 17</Date>
    <Material>Some Lame material</Material>
    <Reading>Lame reading</Reading>
    <AssignedHW>HW4</AssignedHW>
    <DueHW>HW3</DueHW>
  </Lecture>
</Syllabus>

Here is the code:

import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class Parser
{   
   
    public Parser()
    {
    }   
   
    public void doDOMParsing(String strFileName)
    {
        try
        {
            File file = new File(strFileName);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(file);

            NodeList nodes = doc.getElementsByTagName("Lecture");
            for (int i = 0; i < nodes.getLength(); i++)
            {
                System.out.println("PARSED LECTURE DATA");
                System.out.println("-------------");

                Element element = (Element) nodes.item(i);

                NodeList weekData = element.getElementsByTagName("Week");
                Element line = (Element) weekData.item(0);
                String strWeekData = getCharacterDataFromElement(line);
                System.out.println("Week : " + strWeekData);

                NodeList dateData = element.getElementsByTagName("Date");
                line = (Element) dateData.item(0);
                String strDateData = getCharacterDataFromElement(line);
                System.out.println("Date : " + strDateData);

                NodeList materialData = element.getElementsByTagName("Material");
                line = (Element) materialData.item(0);
                String strMaterial = getCharacterDataFromElement(line);
                System.out.println("Material : " + strMaterial);

                NodeList readingData = element.getElementsByTagName("Reading");
                line = (Element) readingData.item(0);
                String strReading = getCharacterDataFromElement(line);
                System.out.println("Reading : " + strReading);

                NodeList assignedHWData = element.getElementsByTagName("AssignedHW");
                line = (Element) assignedHWData.item(0);
                String strAssignedHW = getCharacterDataFromElement(line);
                System.out.println("Assigned HomeWork : " + strAssignedHW);

                NodeList dueHWData = element.getElementsByTagName("DueHW");
                line = (Element) dueHWData.item(0);
                String strDueHW = getCharacterDataFromElement(line);
                System.out.println("Due HomeWork : " + strDueHW);

                System.out.println("--------------------------------");
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
   
    /* This is the helper function to get Element Data in DOM Parsing */
    private String getCharacterDataFromElement(Element e)
    {
        try
        {
            Node child = e.getFirstChild();
            if (child instanceof CharacterData)
            {
                CharacterData cd = (CharacterData) child;
                return cd.getData();
            }
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }

        return "";
    }
   
    /* This is the inner class that inherits from DefaultHandler for SAX Parsing */
    class SAXHandler extends DefaultHandler
    {
        // Text of the element currently being read. The parser may deliver it in
        // several characters() calls, or in none at all for an empty element such
        // as <DueHW></DueHW>, so it is buffered here and printed in endElement().
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String nsURI, String strippedName, String tagName, Attributes attributes) throws SAXException
        {
            // Discard whitespace collected between elements
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length)
        {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String nsURI, String strippedName, String tagName) throws SAXException
        {
            String value = text.toString();
            if (tagName.equalsIgnoreCase("Week"))
                System.out.println("Week : " + value);
            else if (tagName.equalsIgnoreCase("Date"))
                System.out.println("Date :  " + value);
            else if (tagName.equalsIgnoreCase("Material"))
                System.out.println("Material :  " + value);
            else if (tagName.equalsIgnoreCase("Reading"))
                System.out.println("Reading :  " + value);
            else if (tagName.equalsIgnoreCase("AssignedHW"))
                System.out.println("Assigned Homework :  " + value);
            else if (tagName.equalsIgnoreCase("DueHW"))
                System.out.println("Due Homework  :  " + value);
            text.setLength(0);
        }
    }
   
    /* This is the entry point where SAX Parsing starts */
    public void doSAXParsing(String strFileName )
    {
        try
        {
            SAXParserFactory saxFactory = SAXParserFactory.newInstance();
            SAXParser saxParser = saxFactory.newSAXParser();   
            SAXHandler handler = new SAXHandler();
            saxParser.parse(new File(strFileName),handler);
       
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
   
    /* This is the driver function from which DOM or SAX parsing is called.
       The parsing mode (DOM or SAX) is passed in as the only argument;
       the path to Syllabus.xml is hard-coded below.
     */
    public static void main(String[] args)
    {
        try
        {
            String strFileName = "..\\resources\\Syllabus.xml";
            String strParsingMode = "";

            if (args.length != 1)
            {
                System.out.println("The valid values of the parameter passed in are DOM and SAX. Press Enter to continue:");
                System.in.read();
                System.exit(0);
            }

            Parser driverParser = new Parser();

            strParsingMode = args[0];
            if (strParsingMode.equalsIgnoreCase("dom"))
            {
                System.out.println("DOM PARSING");
                driverParser.doDOMParsing(strFileName);
            }
            else if (strParsingMode.equalsIgnoreCase("sax"))
            {
                System.out.println("SAX PARSING");
                driverParser.doSAXParsing(strFileName);
            }
            else
            {
                System.out.println("The parsing mode can be DOM or SAX");
                System.exit(0);
            }
        }
        catch (Exception ex)
        {
            System.out.println("Error occurred while processing");
            ex.printStackTrace();
        }
    }
}
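
As a side note, the DOM helper can probably be shortened with Node.getTextContent(), which returns the text inside an element without casting its first child. What follows is only a minimal sketch and not the code from this post: the class name DomTextContentSketch and the methods childText and summarize are made up for illustration, and the XML is an inline string without a DOCTYPE so that it runs without Syllabus.dtd on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class, not part of the Parser class in this post.
class DomTextContentSketch
{
    // Returns the trimmed text of the first <tagName> descendant, or "" if there is none.
    static String childText(Element parent, String tagName)
    {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0)
        {
            return "";
        }
        return matches.item(0).getTextContent().trim();
    }

    // Parses XML held in a string and returns one summary line per <Lecture>.
    static List<String> summarize(String xml)
    {
        List<String> lines = new ArrayList<>();
        try
        {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList lectures = doc.getElementsByTagName("Lecture");
            for (int i = 0; i < lectures.getLength(); i++)
            {
                Element lecture = (Element) lectures.item(i);
                lines.add("Lecture " + lecture.getAttribute("SerialNo")
                        + ": week " + childText(lecture, "Week")
                        + ", due " + childText(lecture, "DueHW"));
            }
        }
        catch (Exception ex)
        {
            // Rethrown unchecked to keep the sketch short
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return lines;
    }

    public static void main(String[] args)
    {
        String xml = "<Syllabus>"
                + "<Lecture SerialNo=\"1\"><Week>1</Week><DueHW></DueHW></Lecture>"
                + "<Lecture SerialNo=\"2\"><Week>2</Week><DueHW>HW1</DueHW></Lecture>"
                + "</Syllabus>";
        summarize(xml).forEach(System.out::println);
    }
}
```

Unlike the first-child approach, getTextContent() also returns an empty string for an empty element, so the empty DueHW in the first lecture needs no special handling.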

To me, the way SAX parsing is implemented using an inner class was really interesting. I hope this helps someone who is looking for it 🙂
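
To show the handler idea in isolation, here is a small sketch of a nested handler class that collects values into a list instead of printing them, so the caller decides what to do with the result. It is an illustration under my own assumptions, not the class above: SaxCollectorSketch, LectureHandler and collect are hypothetical names, only Week and DueHW are collected, and the XML comes from an inline string without a DOCTYPE.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, written only to illustrate the nested handler pattern.
class SaxCollectorSketch
{
    // Static nested handler: it needs no reference to an outer instance.
    static class LectureHandler extends DefaultHandler
    {
        final List<String> lines = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
        {
            text.setLength(0); // start a fresh buffer for this element
            if (qName.equals("Lecture"))
            {
                lines.add("Lecture " + attributes.getValue("SerialNo"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
        {
            // May be called more than once per element, so append rather than replace
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
        {
            if (qName.equals("Week") || qName.equals("DueHW"))
            {
                lines.add(qName + " : " + text.toString().trim());
            }
            text.setLength(0);
        }
    }

    // Runs the handler over XML held in a string and returns what it collected.
    static List<String> collect(String xml)
    {
        LectureHandler handler = new LectureHandler();
        try
        {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        }
        catch (Exception ex)
        {
            // Rethrown unchecked to keep the sketch short
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return handler.lines;
    }

    public static void main(String[] args)
    {
        String xml = "<Syllabus>"
                + "<Lecture SerialNo=\"1\"><Week>1</Week><DueHW></DueHW></Lecture>"
                + "<Lecture SerialNo=\"2\"><Week>2</Week><DueHW>HW1</DueHW></Lecture>"
                + "</Syllabus>";
        collect(xml).forEach(System.out::println);
    }
}
```

Making the handler static is a design choice for this sketch: a non-static inner class works just as well when the handler needs fields of the enclosing Parser object.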


2 responses to “DOM and SAX Parsing using JAVA”

  1. barriers

    DOM and SAX are relatively outdated technologies; you might also want to check out pull parsing and VTD-XML.

    http://vtd-xml.sf.net
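
For the pull style mentioned in the comment above, the JDK ships StAX in the javax.xml.stream package. The following is a rough sketch of how reading the Material of each lecture might look with it, assuming an inline XML string without a DOCTYPE; StaxPullSketch and readMaterials are hypothetical names, and VTD-XML is not shown because it is a separate library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class, written only to illustrate pull parsing with StAX.
class StaxPullSketch
{
    // Pulls events from the reader and returns one line per <Material> element.
    static List<String> readMaterials(String xml)
    {
        List<String> lines = new ArrayList<>();
        try
        {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String serialNo = "";
            while (reader.hasNext())
            {
                // The caller asks for the next event; nothing is pushed to a handler
                if (reader.next() != XMLStreamConstants.START_ELEMENT)
                {
                    continue;
                }
                String name = reader.getLocalName();
                if (name.equals("Lecture"))
                {
                    serialNo = reader.getAttributeValue(null, "SerialNo");
                }
                else if (name.equals("Material"))
                {
                    // getElementText() reads up to the matching end tag
                    lines.add("Lecture " + serialNo + " : " + reader.getElementText().trim());
                }
            }
            reader.close();
        }
        catch (XMLStreamException ex)
        {
            // Rethrown unchecked to keep the sketch short
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return lines;
    }

    public static void main(String[] args)
    {
        String xml = "<Syllabus>"
                + "<Lecture SerialNo=\"1\"><Week>1</Week><Material>Sample Material</Material></Lecture>"
                + "<Lecture SerialNo=\"2\"><Week>2</Week><Material>Some Text</Material></Lecture>"
                + "</Syllabus>";
        readMaterials(xml).forEach(System.out::println);
    }
}
```

The main difference from SAX is the direction of control: here the loop asks the reader for the next event, whereas the SAX parser calls back into the handler.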
