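Working with XML through the DOM API means working with a DOM tree. To build an XML file, you first build a DOM tree and then write that tree out as an XML file; to parse an XML file, you first parse the source file into a DOM tree and then traverse the tree or search it for the information you need.
For the node types in a DOM tree and the interfaces, properties and constraints of each node type, see the article "DOM Tree Node Analysis"; this article only covers how to build and parse XML files. Both the building and the parsing examples use the content of the books.xml file from w3school: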
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="web" cover="paperback">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
    <book category="web">
        <title lang="en">XQuery Kick Start</title>
        <author>James McGovern</author>
        <author>Per Bothner</author>
        <author>Kurt Cagle</author>
        <author>James Linn</author>
        <author>Vaidyanathan Nagarajan</author>
        <year>2003</year>
        <price>49.99</price>
    </book>
</bookstore>
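Java is an object-oriented language, so we should write code in an object-oriented style wherever possible. One important trait of object-oriented programming is that it is built around objects, so this test code also works with objects as far as possible, rather than acting on each piece of information as soon as it is read, the way procedural code would.
Here, each book element defined in the XML file is represented by a Book object: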
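import java.util.List;

public class Book {
    private String category;
    private String cover;
    private TitleInfo title;
    private List<String> authors;
    private int year;
    private double price;
    ... // getters, setters and toString() omitted

    public static class TitleInfo {
        private String title;
        private String lang;
        ... // constructor TitleInfo(String title, String lang), getters and setters omitted
    }
}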
The Book instances are built according to the content of the XML file:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class W3CBooksBuilder {
    public static List<Book> buildBooks() {
        List<Book> books = new ArrayList<Book>();
        books.add(buildHarryPotter());
        books.add(buildEverydayItalian());
        books.add(buildLearningXml());
        books.add(buildXQueryKickStart());
        return books;
    }

    public static Book buildHarryPotter() {
        Book book = new Book();
        book.setCategory("children");
        book.setTitle(new Book.TitleInfo("Harry Potter", "en"));
        book.setAuthors(Arrays.asList("J K. Rowling"));
        book.setYear(2005);
        book.setPrice(29.99);
        return book;
    }

    public static Book buildEverydayItalian() {
        ...
    }

    public static Book buildLearningXml() {
        ...
    }

    public static Book buildXQueryKickStart() {
        ...
    }
}
DOM uses the DocumentBuilder class to parse XML files. Its parse methods parse an XML file into a DOM tree and return a Document instance:
public Document parse(InputStream is);
public Document parse(InputStream is, String systemId);
public Document parse(String uri);
public Document parse(File f);
public abstract Document parse(InputSource is);
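As a minimal sketch of two of these overloads, the code below parses the same string once through parse(InputStream) and once through parse(InputSource) wrapping a StringReader. The class name ParseOverloadsDemo and the sample string are illustrative assumptions, not part of the original example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseOverloadsDemo {
    // hypothetical sample input, a trimmed-down version of books.xml
    static final String XML = "<bookstore><book category=\"web\">"
            + "<title lang=\"en\">Learning XML</title></book></bookstore>";

    // parse(InputStream): the parser works out the character encoding from the bytes
    static Document fromStream(DocumentBuilder builder) throws Exception {
        return builder.parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }

    // parse(InputSource): wraps a character stream, handy when the XML is already a String
    static Document fromReader(DocumentBuilder builder) throws Exception {
        return builder.parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document a = fromStream(builder);
        Document b = fromReader(builder);
        System.out.println(a.getDocumentElement().getTagName());                      // bookstore
        System.out.println(b.getElementsByTagName("title").item(0).getTextContent()); // Learning XML
    }
}
```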
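DocumentBuilder also lets you check whether the current parser is namespace-aware or validating, and lets you set an EntityResolver and an ErrorHandler. These two interfaces are simply reused from the SAX API; that does not mean a DOM parser must be implemented on top of SAX, although the DOM parser shipped with the JDK does appear to share its underlying parsing engine with the SAX parser.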
public abstract boolean isNamespaceAware();
public abstract boolean isValidating();
public abstract void setEntityResolver(EntityResolver er);
public abstract void setErrorHandler(ErrorHandler eh);
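A sketch of both hooks, assuming you want parse problems collected rather than printed to stderr and external DTDs never fetched: the hypothetical CollectingErrorHandler records warnings and errors and rethrows fatal errors, and the EntityResolver lambda resolves every external entity to empty content. The DTD URL is a placeholder; because of the resolver it is never downloaded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ResolverHandlerDemo {
    // Collects warnings and recoverable errors; fatal errors abort the parse
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<String>();
        public void warning(SAXParseException e) { messages.add("warning: " + e.getMessage()); }
        public void error(SAXParseException e) { messages.add("error: " + e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    static DocumentBuilder newBuilder(CollectingErrorHandler handler) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(handler);
        // Resolve every external entity (such as a DTD) to empty content instead of fetching it
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder;
    }

    public static void main(String[] args) throws Exception {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        DocumentBuilder builder = newBuilder(handler);
        // hypothetical DTD location; never contacted thanks to the resolver
        String xml = "<!DOCTYPE bookstore SYSTEM \"http://example.com/books.dtd\"><bookstore/>";
        builder.parse(new InputSource(new StringReader(xml)));
        System.out.println("parsed, messages = " + handler.messages);
        try {
            builder.parse(new InputSource(new StringReader("<bookstore>"))); // not well-formed
        } catch (SAXException ex) {
            System.out.println("fatal error rethrown: " + ex.getMessage());
        }
    }
}
```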
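DocumentBuilder also provides the factory method for creating a Document instance. When building a DOM tree programmatically, you first create a Document instance and then use it to create all the other node types; the Document itself is created through DocumentBuilder: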
public abstract Document newDocument();
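To illustrate, here is a small sketch that builds part of the books.xml structure by hand, starting from newDocument(). Every node is created through the Document (createElement, createTextNode) and then attached explicitly with appendChild; the class name NewDocumentDemo is only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NewDocumentDemo {
    // Builds <bookstore><book category="children"><title lang="en">Harry Potter</title></book></bookstore>
    static Document buildDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("bookstore"); // created by the Document...
        doc.appendChild(root);                          // ...then attached explicitly
        Element book = doc.createElement("book");
        book.setAttribute("category", "children");
        root.appendChild(book);
        Element title = doc.createElement("title");
        title.setAttribute("lang", "en");
        title.appendChild(doc.createTextNode("Harry Potter"));
        book.appendChild(title);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDocument();
        System.out.println(doc.getDocumentElement().getTagName());                      // bookstore
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent()); // Harry Potter
    }
}
```

Nodes belong to the Document that created them: appending a node from one Document into another fails with a DOMException (WRONG_DOCUMENT_ERR) unless it is first copied with importNode or moved with adoptNode.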
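Finally, DocumentBuilder provides a few additional methods: resetting the DocumentBuilder's state so that the instance can be reused, obtaining the DOMImplementation instance, obtaining the Schema instance, and checking whether XInclude processing is enabled.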
public void reset();
public abstract DOMImplementation getDOMImplementation();
public Schema getSchema();
public boolean isXIncludeAware();
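A sketch of two of these extras, assuming a single-threaded caller: reset() lets one DocumentBuilder be reused across several parses, and getDOMImplementation().createDocument(...) is an alternative way to get a Document that already has its root element. The method names countBooks and emptyBookstore are illustrative. Because reset() returns the builder to the state it had right after newDocumentBuilder(), any EntityResolver or ErrorHandler set afterwards would have to be set again.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BuilderExtrasDemo {
    // Reuses one DocumentBuilder for several documents, calling reset() before each parse
    static int countBooks(String... xmlDocs) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        int total = 0;
        for (String xml : xmlDocs) {
            builder.reset(); // back to the configuration it had after newDocumentBuilder()
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            total += doc.getElementsByTagName("book").getLength();
        }
        return total;
    }

    // createDocument(namespaceURI, qualifiedName, doctype) also creates the document element
    static Document emptyBookstore() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        return impl.createDocument(null, "bookstore", null);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBooks("<bookstore><book/></bookstore>",
                "<bookstore><book/><book/></bookstore>"));                  // 3
        System.out.println(emptyBookstore().getDocumentElement().getTagName()); // bookstore
    }
}
```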
DocumentBuilder is an abstract class; to obtain an instance you use DocumentBuilderFactory, which supports several ways of locating the implementation class to use. DocumentBuilderFactory is itself abstract and provides two static methods for creating a DocumentBuilderFactory instance:
public static DocumentBuilderFactory newInstance();
public static DocumentBuilderFactory newInstance(String factoryClassName, ClassLoader classLoader);
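The no-argument newInstance() method locates the DocumentBuilderFactory implementation class in the following steps:
1. Check whether the system properties define the key javax.xml.parsers.DocumentBuilderFactory; if so, use its value as the DocumentBuilderFactory implementation class.
2. Check whether the properties file ${java.home}/lib/jaxp.properties defines the key javax.xml.parsers.DocumentBuilderFactory; if so, use the value defined for that key in the file as the implementation class. (From JDK 9 on, this file is located at ${java.home}/conf/jaxp.properties.)
3. Check whether the current classpath (including JAR files) contains a META-INF/services/javax.xml.parsers.DocumentBuilderFactory file (a service provider definition); if so, read the first line of that file as the implementation class.
4. If none of the above yields a class, use the default DocumentBuilderFactory implementation class:
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl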
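The priority of step 1 can be observed with a small experiment, sketched below: pointing the system property at a class that does not exist (com.example.NoSuchFactory is a deliberately bogus, hypothetical name) makes newInstance() fail with FactoryConfigurationError, and once the property is cleared the lookup falls through to the remaining steps. The printed implementation class depends on the JDK and the classpath, so it is not fixed here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

public class FactoryLookupDemo {
    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";

    // Returns true if newInstance() honors the system property (and fails on the missing class)
    static boolean systemPropertyWins() {
        System.setProperty(KEY, "com.example.NoSuchFactory"); // hypothetical, intentionally missing
        try {
            DocumentBuilderFactory.newInstance();
            return false;
        } catch (FactoryConfigurationError expected) {
            return true;
        } finally {
            System.clearProperty(KEY); // restore normal lookup
        }
    }

    public static void main(String[] args) {
        System.out.println("system property consulted first: " + systemPropertyWins());
        // with the property cleared: jaxp.properties, then META-INF/services, then the JDK default
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("implementation: " + factory.getClass().getName());
    }
}
```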
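Once the implementation class has been found, it is instantiated and returned as the DocumentBuilderFactory instance. This lookup mechanism closely resembles the way XMLReaderFactory locates an XMLReader implementation and the way commons-logging locates its LogFactory.
The newInstance() overload that takes arguments skips the lookup and directly uses the given implementation class name and ClassLoader to create the DocumentBuilderFactory instance (if the ClassLoader is null, the current thread's context ClassLoader is used).
Finally, setting the system property jaxp.debug to true turns on debug output.
Once you have a DocumentBuilderFactory instance, it is used, as its name suggests, to obtain DocumentBuilder instances; it also provides methods for configuring the parser: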
public abstract DocumentBuilder newDocumentBuilder();
public void setNamespaceAware(boolean awareness);
public boolean isNamespaceAware();
public void setValidating(boolean validating);
public boolean isValidating();
public void setIgnoringElementContentWhitespace(boolean whitespace);
public boolean isIgnoringElementContentWhitespace();
public void setExpandEntityReferences(boolean expandEntityRef);
public boolean isExpandEntityReferences();
public void setIgnoringComments(boolean ignoreComments);
public boolean isIgnoringComments();
public void setCoalescing(boolean coalescing);
public boolean isCoalescing();
public void setXIncludeAware(final boolean state);
public boolean isXIncludeAware();
public abstract void setAttribute(String name, Object value);
public abstract Object getAttribute(String name);
public abstract void setFeature(String name, boolean value);
public abstract boolean getFeature(String name);
public void setSchema(Schema schema);
public Schema getSchema();
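The effect of some of these switches is easiest to see on a tiny input. In the sketch below (the XML string and the class name FactoryConfigDemo are assumptions for illustration), a default factory keeps the comment, the CDATA section and the following text as three separate child nodes of title, while setIgnoringComments(true) plus setCoalescing(true) should leave a single Text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FactoryConfigDemo {
    // hypothetical input: a comment, a CDATA section and plain text inside one element
    static final String XML = "<title><!-- note --><![CDATA[Harry ]]>Potter</title>";

    // Returns the number of child nodes of <title> under the given configuration
    static int childCount(boolean ignoreCommentsAndCoalesce) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setIgnoringComments(ignoreCommentsAndCoalesce); // drop Comment nodes
        factory.setCoalescing(ignoreCommentsAndCoalesce);       // CDATA becomes Text, merged with neighbors
        Element title = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        return title.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default:    " + childCount(false)); // comment, CDATA, text
        System.out.println("configured: " + childCount(true));  // one merged text node
    }
}
```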
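With a DocumentBuilderFactory created and a DocumentBuilder obtained from it, you can use the DocumentBuilder to parse an XML file into a Document instance, and through that Document you can traverse and search the DOM tree to extract the information you need. The following example traverses the DOM tree and creates a Book instance for each book element: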
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

public class W3CBooksDomReader {
    // one shared factory is fine here because the example is single-threaded
    private static DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    private String booksXmlFile;

    public W3CBooksDomReader(String booksXmlFile) {
        this.booksXmlFile = booksXmlFile;
    }

    public List<Book> parse() {
        Document doc = parseXmlFile();
        Element root = doc.getDocumentElement();
        NodeList nodes = root.getElementsByTagName("book");
        List<Book> books = new ArrayList<Book>();
        for (int i = 0; i < nodes.getLength(); i++) {
            books.add(parseBookElement((Element) nodes.item(i)));
        }
        return books;
    }

    private Document parseXmlFile() {
        File xmlFile = new File(booksXmlFile);
        if (!xmlFile.exists()) {
            throw new RuntimeException("Cannot find xml file: " + booksXmlFile);
        }
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(xmlFile);
        } catch (Exception ex) {
            throw new RuntimeException("Failed to parse xml file: " + booksXmlFile, ex);
        }
    }

    private Book parseBookElement(Element bookElement) {
        String category = bookElement.getAttribute("category");
        String cover = bookElement.getAttribute("cover"); // "" when the attribute is absent

        NodeList nodes = bookElement.getElementsByTagName("title");
        String lang = ((Element) nodes.item(0)).getAttribute("lang");
        // first way to get content of an element: read the data of its first Text child
        String title = ((Text) ((Element) nodes.item(0)).getFirstChild()).getData().trim();

        List<String> authors = new ArrayList<String>();
        nodes = bookElement.getElementsByTagName("author");
        for (int i = 0; i < nodes.getLength(); i++) { // a book may have several authors
            // second way to get content of an element: getTextContent()
            authors.add(nodes.item(i).getTextContent().trim());
        }

        nodes = bookElement.getElementsByTagName("year");
        int year = nodes.getLength() > 0 ? Integer.parseInt(nodes.item(0).getTextContent().trim()) : 0;
        nodes = bookElement.getElementsByTagName("price");
        double price = Double.parseDouble(nodes.item(0).getTextContent().trim());

        Book book = new Book();
        book.setCategory(category);
        book.setCover(cover);
        book.setTitle(new Book.TitleInfo(title, lang));
        book.setAuthors(authors);
        book.setYear(year);
        book.setPrice(price);
        return book;
    }

    public String getBooksXmlFile() {
        return booksXmlFile;
    }

    public static void main(String[] args) {
        W3CBooksDomReader reader = new W3CBooksDomReader("resources/xmlfiles/w3c_books.xml");
        List<Book> books = reader.parse();
        System.out.println("result:");
        for (Book book : books) {
            System.out.println(book);
        }
    }
}
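The reader above uses two ways to read an element's text. They differ when the text is split across several child nodes, as in this sketch with a hypothetical comment inside title: getFirstChild() only sees the first Text node, while getTextContent() concatenates all descendant text and skips comments. getFirstChild() also returns null for an empty element such as an empty title, so getTextContent() is usually the safer choice. The class TextContentDemo is illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class TextContentDemo {
    static Element parseTitle(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // "first way": data of the first child only, assumed to be a Text node
    static String firstChildData(Element e) {
        return ((Text) e.getFirstChild()).getData().trim();
    }

    // "second way": concatenated text of all descendants, comments excluded
    static String textContent(Element e) {
        return e.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        Element title = parseTitle("<title lang=\"en\">Harry <!-- split -->Potter</title>");
        System.out.println(firstChildData(title)); // Harry
        System.out.println(textContent(title));    // Harry Potter
    }
}
```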
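Serializing objects into an XML file starts with building a DOM tree, i.e. a Document instance, and then writing that Document out to the XML file. As described above, a Document instance can be created with the DocumentBuilder class; after that you simply create the nodes and arrange them according to the object instances (here, Book instances) and the required XML format, so the details are not repeated here.
The other problem to solve when serializing objects into an XML file is how to write the Document instance to a given XML file. Java provides the Transformer class for this. It belongs to XSLT (Extensible Stylesheet Language Transformations), which this article does not cover in detail; the focus here is only on how to output a Document instance as an XML file.
Transformer's transform method writes the Document instance, wrapped as a Source, to the given output target: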
public abstract void transform(Source xmlSource, Result outputTarget);
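A minimal sketch of the most common use: TransformerFactory.newTransformer() without a stylesheet returns an identity transformer, which simply copies the DOMSource into the StreamResult, here backed by a StringWriter. The class name TransformDemo and the tiny document are illustrative.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class TransformDemo {
    static String toXmlString(Document doc) throws Exception {
        // no stylesheet: an identity transformer that copies the tree unchanged
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("bookstore")).appendChild(doc.createElement("book"));
        System.out.println(toXmlString(doc)); // starts with the <?xml ...?> declaration
    }
}
```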
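Here the Source interface represents the input. It can be a DOMSource, a SAXSource or some other (possibly custom) Source implementation; only DOMSource is covered here. The Source interface defines a systemId property that gives the location of the XML source; for a source that was not obtained from a URL, it is null. The definition is: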
public interface Source {
    public void setSystemId(String systemId);
    public String getSystemId();
}
DOMSource is a concrete implementation of Source; it holds a Node and a systemId:
public class DOMSource implements Source {
    private Node node;
    private String systemId;

    public DOMSource() { }

    public DOMSource(Node n) {
        setNode(n);
    }

    public DOMSource(Node node, String systemId) {
        setNode(node);
        setSystemId(systemId);
    }
    // setters and getters omitted
}
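Since DOMSource accepts any Node, not just a Document, a single subtree can be serialized on its own. The sketch below (the class name DomSourceSubtreeDemo and the sample string are assumptions) writes only the first title element, with the XML declaration turned off.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomSourceSubtreeDemo {
    static final String XML = "<bookstore><book category=\"web\">"
            + "<title lang=\"en\">Learning XML</title></book></bookstore>";

    // Serializes only the given node and its descendants
    static String serialize(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    static Node firstTitle() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return doc.getElementsByTagName("title").item(0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(firstTitle())); // just the <title> element
    }
}
```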
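Result is the abstraction of the output target, i.e. where the transformed source ends up. Like Source, the Result interface defines a systemId property giving the location of the target; if the target is not a URL, its value is null: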
public interface Result {
    public void setSystemId(String systemId);
    public String getSystemId();
}
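The JDK provides several Result implementations, such as DOMResult and StreamResult. Only StreamResult is covered here: its output target is a stream, and you can supply a Writer, an OutputStream and so on to receive the output: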
public class StreamResult implements Result {
    public StreamResult() {
    }

    public StreamResult(OutputStream outputStream) {
        setOutputStream(outputStream);
    }

    public StreamResult(Writer writer) {
        setWriter(writer);
    }

    public StreamResult(String systemId) {
        this.systemId = systemId;
    }

    public StreamResult(File f) {
        setSystemId(f.toURI().toASCIIString());
    }
    // setters and getters omitted

    private String systemId;
    private OutputStream outputStream;
    private Writer writer;
}
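A sketch of two StreamResult variants, using a hypothetical one-element document whose text contains non-ASCII characters: with an OutputStream target the Transformer itself encodes the characters, using the OutputKeys.ENCODING property, whereas StreamResult(File) merely records the file's URI as the systemId (the file is only opened when transform runs). With a Writer target, by contrast, ENCODING mainly affects the encoding named in the XML declaration, and the Writer decides the actual bytes. The class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StreamResultDemo {
    // hypothetical sample: a single element whose text contains non-ASCII characters
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element title = doc.createElement("title");
        title.setTextContent("Cr\u00e8me Br\u00fbl\u00e9e");
        doc.appendChild(title);
        return doc;
    }

    // StreamResult(OutputStream): the Transformer encodes characters using OutputKeys.ENCODING
    static byte[] toBytes(Document doc, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // StreamResult(File) only stores the file as a URI string in systemId
    static String systemIdOf(File file) {
        return new StreamResult(file).getSystemId();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new String(toBytes(sampleDoc(), "UTF-8"), StandardCharsets.UTF_8));
        System.out.println(systemIdOf(new File("books.xml"))); // a file: URI
    }
}
```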
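Besides transform, the Transformer class provides other methods for configuring the information it uses during the transformation (only the declarations are listed here, without further detail):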
public abstract void setParameter(String name, Object value);
public abstract Object getParameter(String name);
public abstract void clearParameters();
public abstract void setURIResolver(URIResolver resolver);
public abstract URIResolver getURIResolver();
public abstract void setOutputProperties(Properties oformat);
public abstract Properties getOutputProperties();
public abstract void setOutputProperty(String name, String value);
public abstract String getOutputProperty(String name);
public abstract void setErrorListener(ErrorListener listener);
public abstract ErrorListener getErrorListener();
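As a sketch of the output properties, the code below sets METHOD, INDENT and OMIT_XML_DECLARATION through setOutputProperty and reads one back with getOutputProperty. The exact indentation width and line breaks are implementation-dependent, so only the presence or absence of the declaration is relied on here; OutputPropertiesDemo and its methods are illustrative names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OutputPropertiesDemo {
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("bookstore");
        root.appendChild(doc.createElement("book"));
        doc.appendChild(root);
        return doc;
    }

    static Transformer configuredTransformer(boolean omitDeclaration) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
        return transformer;
    }

    static String serialize(Document doc, boolean omitDeclaration) throws Exception {
        StringWriter out = new StringWriter();
        configuredTransformer(omitDeclaration).transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDoc();
        System.out.println(serialize(doc, false)); // with <?xml ...?> declaration
        System.out.println(serialize(doc, true));  // without it
        System.out.println(configuredTransformer(true).getOutputProperty(OutputKeys.INDENT)); // yes
    }
}
```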
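Like DocumentBuilder, a Transformer is created through a TransformerFactory, and TransformerFactory instances are created and located in the same way as DocumentBuilderFactory instances. The differences are that the property name for TransformerFactory is javax.xml.transform.TransformerFactory and its default implementation class is com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl. It likewise provides two methods for obtaining a TransformerFactory instance, which are not described further here: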
public static TransformerFactory newInstance();
public static TransformerFactory newInstance(String factoryClassName, ClassLoader classLoader);
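TransformerFactory provides methods for creating Transformer and Templates instances, along with some configuration methods that apply when these two kinds of instances are created: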
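public abstract Transformer newTransformer(Source source);
public abstract Transformer newTransformer();
public abstract Templates newTemplates(Source source);
public abstract Source getAssociatedStylesheet(Source source, String media,
        String title, String charset);
public abstract void setURIResolver(URIResolver resolver);
public abstract URIResolver getURIResolver();
public abstract void setFeature(String name, boolean value);
public abstract boolean getFeature(String name);
public abstract void setAttribute(String name, Object value);
public abstract Object getAttribute(String name);
public abstract void setErrorListener(ErrorListener listener);
public abstract ErrorListener getErrorListener();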
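newTemplates(Source) compiles a stylesheet once into a Templates object, which is thread-safe and can hand out a fresh Transformer for each use (a single Transformer must not be used by concurrent threads). The sketch below uses a small hypothetical XSLT stylesheet that lists book titles, and also uses Transformer.setParameter to pass a separator into an xsl:param. StreamSource, the stylesheet and the TemplatesDemo names are assumptions added for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesDemo {
    // hypothetical stylesheet: prints each book title followed by the "sep" parameter
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:param name=\"sep\" select=\"';'\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"bookstore/book\">"
        + "<xsl:value-of select=\"title\"/><xsl:value-of select=\"$sep\"/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // compiled once and shared; Templates is thread-safe
    static final Templates TEMPLATES = compile();

    static Templates compile() {
        try {
            return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (Exception ex) {
            throw new IllegalStateException("cannot compile stylesheet", ex);
        }
    }

    static String titles(String xml, String separator) throws Exception {
        Transformer transformer = TEMPLATES.newTransformer(); // one Transformer per use
        transformer.setParameter("sep", separator);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bookstore><book><title>A</title></book><book><title>B</title></book></bookstore>";
        System.out.println(titles(xml, "|")); // A|B|
    }
}
```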
Finally, here is a complete example that serializes the List<Book> instance built at the beginning of this article into an XML file:
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class W3CBooksDomWriter {
    // shared instances: fine for this single-threaded example, but neither class is thread-safe
    private static DocumentBuilder docBuilder;
    private static Transformer transformer;

    static {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            docBuilder = factory.newDocumentBuilder();
        } catch (Exception ex) {
            throw new RuntimeException("create DocumentBuilder instance failed.", ex);
        }
        try {
            TransformerFactory transFactory = TransformerFactory.newInstance();
            transformer = transFactory.newTransformer();
        } catch (Exception ex) {
            throw new RuntimeException("create Transformer instance failed.", ex);
        }
        // with a Writer target, ENCODING mainly sets the encoding named in the XML declaration
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    }

    private List<Book> books;

    public W3CBooksDomWriter(List<Book> books) {
        this.books = books;
    }

    public void toXml(Writer writer) throws Exception {
        Document doc = buildDomTree();
        writeToXmlFile(writer, doc);
    }

    public Document buildDomTree() {
        Document doc = docBuilder.newDocument();
        Element root = doc.createElement("bookstore");
        doc.appendChild(root);
        for (Book book : books) {
            Element bookElement = buildBookElement(doc, book);
            root.appendChild(bookElement);
        }
        return doc;
    }

    public Element buildBookElement(Document doc, Book book) {
        Element bookElement = doc.createElement("book");
        bookElement.setAttribute("category", book.getCategory());
        // only some books (e.g. "Learning XML") have a cover; skip the attribute otherwise
        if (book.getCover() != null && !book.getCover().isEmpty()) {
            bookElement.setAttribute("cover", book.getCover());
        }

        Book.TitleInfo title = book.getTitle();
        Element titleElement = doc.createElement("title");
        titleElement.setAttribute("lang", title.getLang());
        titleElement.setTextContent(title.getTitle());
        bookElement.appendChild(titleElement);

        for (String author : book.getAuthors()) {
            Element authorElement = doc.createElement("author");
            authorElement.setTextContent(author);
            bookElement.appendChild(authorElement);
        }

        Element yearElement = doc.createElement("year");
        yearElement.setTextContent(String.valueOf(book.getYear()));
        bookElement.appendChild(yearElement);

        Element priceElement = doc.createElement("price");
        priceElement.setTextContent(String.valueOf(book.getPrice()));
        bookElement.appendChild(priceElement);
        return bookElement;
    }

    public void writeToXmlFile(Writer writer, Document doc) throws Exception {
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(writer);
        transformer.transform(source, result);
    }

    public static void main(String[] args) throws Exception {
        StringWriter writer = new StringWriter();
        List<Book> books = W3CBooksBuilder.buildBooks();
        W3CBooksDomWriter domWriter = new W3CBooksDomWriter(books);
        domWriter.toXml(writer);
        System.out.println(writer.toString());
    }
}