I am not a very proficient Java developer, but I can usually find my way after spending some time. This might save some time for anyone who, like me, is not a real pro at Java. I am posting a sample XML file and a parser that reads it with both DOM and SAX in Java.
Here is the XML:
<?xml version="1.0"?>
<!DOCTYPE Syllabus SYSTEM "Syllabus.dtd">
<Syllabus xmlns:xsi="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.w3schools.com note.xsd">
    <Lecture SerialNo="1">
        <Week>1</Week>
        <Date>Jun 3</Date>
        <Material>Sample Material</Material>
        <Reading>Sample Reading </Reading>
        <AssignedHW>Sample Text</AssignedHW>
        <DueHW></DueHW>
    </Lecture>
    <Lecture SerialNo="2">
        <Week>2</Week>
        <Date>Jun 10</Date>
        <Material>Some Text</Material>
        <Reading>Some Boring Text </Reading>
        <AssignedHW>HW2</AssignedHW>
        <DueHW>HW1</DueHW>
    </Lecture>
    <Lecture SerialNo="3">
        <Week>3</Week>
        <Date>Jun 17</Date>
        <Material>Some Lame material</Material>
        <Reading>Lame reading</Reading>
        <AssignedHW>HW4</AssignedHW>
        <DueHW>HW3</DueHW>
    </Lecture>
</Syllabus>
Here is the code:
import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class Parser
{
    public Parser()
    {
    }

    /* DOM parsing: the whole document is loaded into memory, then each Lecture is walked */
    public void doDOMParsing(String strFileName)
    {
        try
        {
            File file = new File(strFileName);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(file);
            NodeList nodes = doc.getElementsByTagName("Lecture");
            for (int i = 0; i < nodes.getLength(); i++)
            {
                System.out.println("PARSED LECTURE DATA");
                System.out.println("-------------");
                Element element = (Element) nodes.item(i);
                NodeList weekData = element.getElementsByTagName("Week");
                Element line = (Element) weekData.item(0);
                String strWeekData = getCharacterDataFromElement(line);
                System.out.println("Week : " + strWeekData);
                NodeList dateData = element.getElementsByTagName("Date");
                line = (Element) dateData.item(0);
                String strDateData = getCharacterDataFromElement(line);
                System.out.println("Date : " + strDateData);
                NodeList materialData = element.getElementsByTagName("Material");
                line = (Element) materialData.item(0);
                String strMaterial = getCharacterDataFromElement(line);
                System.out.println("Material : " + strMaterial);
                NodeList readingData = element.getElementsByTagName("Reading");
                line = (Element) readingData.item(0);
                String strReading = getCharacterDataFromElement(line);
                System.out.println("Reading : " + strReading);
                NodeList assignedHWData = element.getElementsByTagName("AssignedHW");
                line = (Element) assignedHWData.item(0);
                String strAssignedHW = getCharacterDataFromElement(line);
                System.out.println("Assigned HomeWork : " + strAssignedHW);
                NodeList dueHWData = element.getElementsByTagName("DueHW");
                line = (Element) dueHWData.item(0);
                String strDueHW = getCharacterDataFromElement(line);
                System.out.println("Due HomeWork : " + strDueHW);
                System.out.println("--------------------------------");
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    /* Helper that returns the text of an element's first child in DOM parsing,
       or an empty string when the element has no text (e.g. an empty DueHW) */
    private String getCharacterDataFromElement(Element e)
    {
        try
        {
            Node child = e.getFirstChild();
            if (child instanceof CharacterData)
            {
                CharacterData cd = (CharacterData) child;
                return cd.getData();
            }
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
        return "";
    }

    /* Inner class that extends DefaultHandler for SAX parsing.
       Text is collected in a buffer and printed in endElement, because the
       parser may deliver one element's text in several characters() calls
       and makes no characters() call at all for an empty element such as DueHW
       in the first Lecture. */
    class SAXHandler extends DefaultHandler
    {
        private final StringBuilder text = new StringBuilder();

        public void startElement(String nsURI, String strippedName, String tagName, Attributes attributes) throws SAXException
        {
            // Start collecting text afresh for every element
            text.setLength(0);
        }

        public void characters(char[] ch, int start, int length)
        {
            text.append(ch, start, length);
        }

        public void endElement(String nsURI, String strippedName, String tagName) throws SAXException
        {
            String value = text.toString();
            if (tagName.equalsIgnoreCase("Week"))
                System.out.println("Week : " + value);
            else if (tagName.equalsIgnoreCase("Date"))
                System.out.println("Date : " + value);
            else if (tagName.equalsIgnoreCase("Material"))
                System.out.println("Material : " + value);
            else if (tagName.equalsIgnoreCase("Reading"))
                System.out.println("Reading : " + value);
            else if (tagName.equalsIgnoreCase("AssignedHW"))
                System.out.println("Assigned Homework : " + value);
            else if (tagName.equalsIgnoreCase("DueHW"))
                System.out.println("Due Homework : " + value);
            // Discard the text so whitespace between elements is never printed
            text.setLength(0);
        }
    }

    /* This is the entry point where SAX parsing starts */
    public void doSAXParsing(String strFileName)
    {
        try
        {
            SAXParserFactory saxFactory = SAXParserFactory.newInstance();
            SAXParser saxParser = saxFactory.newSAXParser();
            SAXHandler handler = new SAXHandler();
            saxParser.parse(new File(strFileName), handler);
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }

    /* This is the driver function from where the DOM and SAX parsing is called.
       The parsing mode (DOM or SAX) is passed as the single command-line argument;
       the path to Syllabus.xml is hard-coded below.
    */
    public static void main(String[] args)
    {
        try
        {
            String strFileName = "..\\resources\\Syllabus.xml";
            String strParsingMode = "";
            if (args.length != 1)
            {
                System.out.println("The valid values of the parameter passed in are DOM and SAX. Press any key to continue:");
                System.in.read();
                System.exit(0);
            }
            Parser driverParser = new Parser();
            strParsingMode = args[0];
            if (strParsingMode.equalsIgnoreCase("dom"))
            {
                System.out.println("DOM PARSING");
                driverParser.doDOMParsing(strFileName);
            }
            else if (strParsingMode.equalsIgnoreCase("sax"))
            {
                System.out.println("SAX PARSING");
                driverParser.doSAXParsing(strFileName);
            }
            else
            {
                System.out.println("The parsing mode can be DOM or SAX");
                System.exit(0);
            }
        }
        catch (Exception ex)
        {
            System.out.println("Error occurred while processing");
            ex.printStackTrace();
        }
    }
}
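One thing the parser above does not touch is the SerialNo attribute on each Lecture element. As a rough sketch (the class name SerialNoDemo and its method names are made up for illustration, and the XML is a cut-down inline copy of the sample without the DOCTYPE), this is how the attribute could be read in both styles: DOM asks the Element with getAttribute, while SAX receives it through the Attributes argument of startElement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the Parser class above.
class SerialNoDemo
{
    // Cut-down copy of the sample XML (no DOCTYPE, so no Syllabus.dtd is needed).
    static final String SAMPLE =
        "<Syllabus>"
        + "<Lecture SerialNo=\"1\"><Week>1</Week></Lecture>"
        + "<Lecture SerialNo=\"2\"><Week>2</Week></Lecture>"
        + "</Syllabus>";

    // DOM: the attribute is read from the Element once the whole tree is built.
    static List<String> serialNosWithDom(String xml) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList lectures = doc.getElementsByTagName("Lecture");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < lectures.getLength(); i++)
        {
            Element lecture = (Element) lectures.item(i);
            result.add(lecture.getAttribute("SerialNo"));
        }
        return result;
    }

    // SAX: the attribute arrives in the Attributes argument of startElement.
    static List<String> serialNosWithSax(String xml) throws Exception
    {
        final List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler()
        {
            @Override
            public void startElement(String nsURI, String localName, String qName, Attributes attributes)
            {
                if (qName.equals("Lecture"))
                {
                    result.add(attributes.getValue("SerialNo"));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("DOM: " + serialNosWithDom(SAMPLE));
        System.out.println("SAX: " + serialNosWithSax(SAMPLE));
    }
}
```

Both methods return the same list for the inline sample; the difference is only in when the attribute becomes available, after the full parse for DOM and during the parse for SAX.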
To me, the way SAX parsing is implemented using an inner class was really interesting. I hope this helps anyone who comes looking for it.
DOM and SAX are relatively outdated technologies; you might also want to check out pull parsing and VTD-XML:
http://vtd-xml.sf.net
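For comparison, a pull parser lets the calling code ask for the next event instead of receiving callbacks. Here is a minimal sketch using StAX (javax.xml.stream), the pull API bundled with the JDK, rather than VTD-XML itself; the class name PullDemo and the inline XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class showing pull-style parsing with StAX.
class PullDemo
{
    // Cut-down inline copy of the sample XML, without the DOCTYPE.
    static final String SAMPLE =
        "<Syllabus>"
        + "<Lecture SerialNo=\"1\"><Week>1</Week><Date>Jun 3</Date></Lecture>"
        + "<Lecture SerialNo=\"2\"><Week>2</Week><Date>Jun 10</Date></Lecture>"
        + "</Syllabus>";

    // Collects "Name : value" lines for the Week and Date elements.
    static List<String> readLectures(String xml) throws Exception
    {
        List<String> lines = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try
        {
            // The loop pulls one event at a time; nothing is called back.
            while (reader.hasNext())
            {
                if (reader.next() == XMLStreamConstants.START_ELEMENT)
                {
                    String name = reader.getLocalName();
                    if (name.equals("Week") || name.equals("Date"))
                    {
                        // getElementText() reads up to the matching end tag.
                        lines.add(name + " : " + reader.getElementText());
                    }
                }
            }
        }
        finally
        {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws Exception
    {
        for (String line : readLectures(SAMPLE))
        {
            System.out.println(line);
        }
    }
}
```

Because the loop owns the control flow, no flags or handler state are needed to remember which element is currently open.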
Thank you. This looks pretty interesting.