Parsing XML in Java
Tags: Java basics
There are two XML parsing techniques:
DOM
SAX
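- DOM approach
The parser builds a tree in memory that mirrors the hierarchy of the XML document, wrapping tags, attributes, and text as node objects of that tree.
- Advantage: adding, deleting, updating, and querying nodes are all easy to implement.
- Disadvantage: a very large XML file may exhaust memory.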
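- SAX approach
The parser uses an event-driven model and parses while it reads: it works through the document from top to bottom and calls the corresponding handler method whenever it reaches an element.
- Advantage: it does not exhaust memory.
- Disadvantage: querying is inconvenient, and adding, deleting, and updating are not possible.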
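Various companies and organizations provide parsers for the DOM and SAX approaches:
- SUN: jaxp
- The Dom4j organization: dom4j (the most commonly used, for example by Spring)
- The JDom organization: jdom
For the history of these three parsers, see "Four ways to parse XML files in Java".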
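JAXP parsing
JAXP is part of Java SE and lives in the javax.xml.parsers package. It provides the following parser classes for DOM and SAX:
- DOM
  - DocumentBuilder
  - DocumentBuilderFactory
- SAX
  - SAXParser
  - SAXParserFactory
The sample XML is shown below. The following sections use JAXP to add, delete, update, and query it.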
- config.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE beans SYSTEM "constraint.dtd">
<beans>
    <bean id="id1" class="com.fq.domain.Bean">
        <property name="isUsed" value="true"/>
    </bean>
    <bean id="id2" class="com.fq.domain.ComplexBean">
        <property name="refBean" ref="id1"/>
    </bean>
</beans>
- constraint.dtd
<!ELEMENT beans (bean*) >
<!ELEMENT bean (property*)>
<!ATTLIST bean
        id CDATA #REQUIRED
        class CDATA #REQUIRED
        >
<!ELEMENT property EMPTY>
<!ATTLIST property
        name CDATA #REQUIRED
        value CDATA #IMPLIED
        ref CDATA #IMPLIED>
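The DTD above only takes effect if the parser is asked to validate. The following is a minimal sketch, assuming the constraints are inlined as an internal DTD subset so that no files are needed; the class name DtdValidationSketch and the method validate are hypothetical and not part of the original article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {
    // Same rules as constraint.dtd, inlined (an assumption made to keep the sketch self-contained).
    static final String DOCTYPE =
            "<!DOCTYPE beans [\n"
          + "<!ELEMENT beans (bean*)>\n"
          + "<!ELEMENT bean (property*)>\n"
          + "<!ATTLIST bean id CDATA #REQUIRED class CDATA #REQUIRED>\n"
          + "<!ELEMENT property EMPTY>\n"
          + "<!ATTLIST property name CDATA #REQUIRED value CDATA #IMPLIED ref CDATA #IMPLIED>\n"
          + "]>\n";

    /** Parses DOCTYPE + xmlBody with validation switched on and returns the reported problems. */
    static List<String> validate(String xmlBody) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { problems.add(e.getMessage()); }
                // Validity violations arrive here as recoverable errors.
                @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
                // Well-formedness failures are fatal and abort the parse.
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + xmlBody)));
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return problems;
    }
}
```

An empty list means the body satisfies the DTD; a bean without an id attribute should produce at least one message.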
JAXP-Dom
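import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;

/**
 * @author jifang
 * @since 16/1/13 11:24 PM.
 */
public class XmlRead {
    @Test
    public void client() throws ParserConfigurationException, IOException, SAXException {
        // Create a DOM parser
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parse the XML file
        Document document = builder.parse(ClassLoader.getSystemResourceAsStream("config.xml"));
        // ...
    }
}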
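The parse(String/File/InputSource/InputStream param) method of DocumentBuilder parses an XML file into a Document object that represents the whole document. Document (in the org.w3c.dom package) is an interface whose parent interface is Node; other sub-interfaces of Node include Element, Attr, and Text.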
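Note that the String overload of parse takes a URI, not XML content. To parse XML held in memory, one option is to wrap it in an InputSource. A minimal sketch follows; the class name ParseFromStringSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParseFromStringSketch {
    /** Parses XML text held in a String into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    // InputSource(Reader) feeds characters directly to the parser
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }
}
```

The returned Document can then be queried exactly like one parsed from config.xml.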
- Node
Common method | Description |
---|---|
Node appendChild(Node newChild) | Adds the node newChild to the end of the list of children of this node. |
Node removeChild(Node oldChild) | Removes the child node indicated by oldChild from the list of children, and returns it. |
NodeList getChildNodes() | A NodeList that contains all children of this node. |
NamedNodeMap getAttributes() | A NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise. |
String getTextContent() | This attribute returns the text content of this node and its descendants. |
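- Document
Common method | Description |
---|---|
NodeList getElementsByTagName(String tagname) | Returns a NodeList of all the Elements in document order with a given tag name and are contained in the document. |
Element createElement(String tagName) | Creates an element of the type specified. |
Text createTextNode(String data) | Creates a Text node given the specified string. |
Attr createAttribute(String name) | Creates an Attr of the given name. |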
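As a quick illustration of how the Node and Document methods in these tables fit together, here is a minimal sketch that builds a small tree from scratch instead of parsing a file. The class name BuildDocumentSketch and the sample content are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class BuildDocumentSketch {
    /** Builds the tree for: <beans><bean id="id1">hello</bean></beans> */
    static Document build() {
        try {
            // newDocument() returns an empty Document that acts as the node factory
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element beans = doc.createElement("beans");
            doc.appendChild(beans);                         // becomes the root element
            Element bean = doc.createElement("bean");
            bean.setAttribute("id", "id1");
            bean.appendChild(doc.createTextNode("hello"));  // text is a child node too
            beans.appendChild(bean);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build document", e);
        }
    }
}
```

Nodes must be created through the Document that will own them, which is why every create call goes through doc.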
Querying with DOM
- Read all attributes on the <bean/> tags
public class XmlRead {
    private Document document;

    @Before
    public void setUp() throws ParserConfigurationException, IOException, SAXException {
        document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(ClassLoader.getSystemResourceAsStream("config.xml"));
    }

    @Test
    public void client() throws ParserConfigurationException, IOException, SAXException {
        NodeList beans = document.getElementsByTagName("bean");
        for (int i = 0; i < beans.getLength(); ++i) {
            NamedNodeMap attributes = beans.item(i).getAttributes();
            scanNameNodeMap(attributes);
        }
    }

    // Print every attribute in the map as "name -> value"
    private void scanNameNodeMap(NamedNodeMap attributes) {
        for (int i = 0; i < attributes.getLength(); ++i) {
            Attr attribute = (Attr) attributes.item(i);
            System.out.printf("%s -> %s%n", attribute.getName(), attribute.getValue());
            // System.out.println(attribute.getNodeName() + " -> " + attribute.getTextContent());
        }
    }
}
- Print the names of all tags in the XML file
@Test
public void client() {
    list(document, 0);
}

// Recursively print element names, indented by their depth in the tree
private void list(Node node, int depth) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        for (int i = 0; i < depth; ++i)
            System.out.print("\t");
        System.out.println("<" + node.getNodeName() + ">");
    }
    NodeList childNodes = node.getChildNodes();
    for (int i = 0; i < childNodes.getLength(); ++i) {
        list(childNodes.item(i), depth + 1);
    }
}
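The node-type check in list matters because getChildNodes() also returns the whitespace between tags as Text nodes. A minimal sketch of a helper that keeps only element children, assuming a non-validating parser with default settings; the names ElementChildrenSketch, parseRoot, and elementChildren are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ElementChildrenSketch {
    /** Parses the XML text and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    /** Returns only the direct children that are elements, skipping text and comments. */
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }
}
```

For an indented document with two bean children, getChildNodes() reports five nodes (three whitespace Text nodes and two elements), while elementChildren returns two.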
Adding a node with DOM
- Add a <property/> tag under the first <bean/> tag. The final result looks like this:
<bean id="id1" class="com.fq.domain.Bean">
    <property name="isUsed" value="true"/>
    <property name="name" value="simple-bean">新添加的</property>
</bean>
/**
 * @author jifang
 * @since 16/1/17 5:56 PM.
 */
public class XmlAppend {
    // Writes the document back to disk
    private Transformer transformer;
    // The XML document
    private Document document;

    @Before
    public void setUp() throws ParserConfigurationException, IOException, SAXException {
        document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(ClassLoader.getSystemResourceAsStream("config.xml"));
    }

    @Test
    public void client() {
        // Get the first bean tag
        Node firstBean = document.getElementsByTagName("bean").item(0);
        /** Create a property tag **/
        Element property = document.createElement("property");
        // Add attributes to the property tag
        // property.setAttribute("name", "name");
        // property.setAttribute("value", "simple-bean");
        Attr name = document.createAttribute("name");
        name.setValue("name");
        property.setAttributeNode(name);
        Attr value = document.createAttribute("value");
        value.setValue("simple-bean");
        property.setAttributeNode(value);
        // Add text content to the property tag
        //property.setTextContent("新添加的");
        property.appendChild(document.createTextNode("新添加的"));
        // Append the property tag to the bean tag
        firstBean.appendChild(property);
    }

    @After
    public void tearDown() throws TransformerException {
        transformer = TransformerFactory.newInstance().newTransformer();
        // Write the DOM back to the XML file
        transformer.transform(new DOMSource(document),
                new StreamResult("src/main/resources/config.xml"));
    }
}
Note: changes only take effect once the in-memory DOM has been written back to the XML document.
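When writing back, it can help to look at the serialized text before overwriting the file. The following minimal sketch serializes a DOM to a String with indentation and, optionally, a DOCTYPE line; the class name DomWriteBackSketch and the helper sample are hypothetical, and whether the DOCTYPE should be kept is an assumption about the desired output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomWriteBackSketch {
    /** Serializes the DOM; pass null for doctypeSystem when no DOCTYPE line is wanted. */
    static String serialize(Document document, String doctypeSystem) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            if (doctypeSystem != null) {
                // Emits <!DOCTYPE root SYSTEM "..."> ahead of the root element
                transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctypeSystem);
            }
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    /** Builds a tiny document so the sketch can run without config.xml. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element beans = doc.createElement("beans");
            Element bean = doc.createElement("bean");
            bean.setAttribute("id", "id1");
            beans.appendChild(bean);
            doc.appendChild(beans);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build document", e);
        }
    }
}
```

Replacing the StringWriter with a StreamResult that points at a file gives the same behavior as tearDown above.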
Updating a node with DOM
- Change the <property/> that was just added to the following:
<property name="name" value="new-simple-bean">simple-bean是新添加的</property>
@Test
public void client() {
    NodeList properties = document.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); ++i) {
        Element property = (Element) properties.item(i);
        if (property.getAttribute("value").equals("simple-bean")) {
            property.setAttribute("value", "new-simple-bean");
            property.setTextContent("simple-bean是新添加的");
            break;
        }
    }
}
Deleting a node with DOM
- Delete the <property/> tag that was just modified
@Test
public void client() {
    NodeList properties = document.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); ++i) {
        Element property = (Element) properties.item(i);
        if (property.getAttribute("value").equals("new-simple-bean")) {
            // A node is removed through its parent
            property.getParentNode().removeChild(property);
            break;
        }
    }
}
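The loop above stops after the first match. To remove every match, keep in mind that the NodeList returned by getElementsByTagName is live, so removing a node shifts the items that follow it. One way around this is to walk the list backwards, as in this minimal sketch (the class name RemoveMatchingSketch and its methods are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RemoveMatchingSketch {
    /** Parses XML text into a Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    /** Removes every element named tag whose attribute attr equals value; returns the count. */
    static int removeByAttribute(Document doc, String tag, String attr, String value) {
        NodeList nodes = doc.getElementsByTagName(tag);
        int removed = 0;
        // Iterate from the end: removing item i never changes the indexes below i.
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Element element = (Element) nodes.item(i);
            if (value.equals(element.getAttribute(attr))) {
                element.getParentNode().removeChild(element);
                removed++;
            }
        }
        return removed;
    }
}
```

A forward loop that removes nodes without adjusting the index would skip the element that slides into the freed position.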
JAXP-SAX
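A SAXParser instance is obtained from the newSAXParser() method of a SAXParserFactory instance. Its parse(String uri, DefaultHandler dh) method, which parses the XML file, has no return value, but compared with the DOM method it takes one extra argument, the event handler DefaultHandler:
- when a start tag is parsed, the startElement() method of DefaultHandler is called automatically;
- when tag content (text) is parsed, the characters() method of DefaultHandler is called automatically;
- when an end tag is parsed, the endElement() method of DefaultHandler is called automatically.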
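The callback order described above can be made visible by recording each event. This is a minimal sketch under the assumption that text should be buffered, since a parser is allowed to deliver one run of text through several characters() calls; the class name SaxEventOrderSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventOrderSketch {
    /** Parses the XML and returns the callbacks in the order they fired. */
    static List<String> events(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0);              // start collecting text for this element
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once per text run
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String content = text.toString().trim();
                if (!content.isEmpty()) {
                    events.add("text:" + content);
                }
                text.setLength(0);
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return events;
    }
}
```

Because the handler only sees a stream of events, any state it needs (such as the current element's text) has to be tracked by the handler itself.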
Querying with SAX
- Print the whole XML document
/**
 * @author jifang
 * @since 16/1/17 9:16 PM.
 */
public class SaxRead {
    @Test
    public void client() throws ParserConfigurationException, IOException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(ClassLoader.getSystemResourceAsStream("config.xml"), new SaxHandler());
    }

    private class SaxHandler extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            System.out.print("<" + qName);
            for (int i = 0; i < attributes.getLength(); ++i) {
                String attrName = attributes.getQName(i);
                String attrValue = attributes.getValue(i);
                System.out.print(" " + attrName + "=" + attrValue);
            }
            System.out.print(">");
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            System.out.print(new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            System.out.print("</" + qName + ">");
        }
    }
}
- A Handler that prints the content of every property tag
private class SaxHandler extends DefaultHandler {
    // Guard the isProperty flag with a mutex
    private boolean isProperty = false;
    private Lock mutex = new ReentrantLock();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equals("property")) {
            mutex.lock();
            isProperty = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // The flag can only be true after the lock has been taken
        if (isProperty) {
            System.out.println(new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("property")) {
            try {
                isProperty = false;
            } finally {
                mutex.unlock();
            }
        }
    }
}
Note: the SAX approach cannot add, delete, or update nodes.
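Dom4j parsing
Dom4j is an intelligent fork of JDom. It split off from the original JDom organization and provides a parser that is more powerful and performs better than JDom (for example, it supports XPath).
To use Dom4j, add the following dependency to the pom: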
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
<version>1.6.1</version>
</dependency>
The sample XML is shown below. The following sections use Dom4j to add, delete, update, and query it:
- config.xml
<?xml version="1.0" encoding="utf-8"?>
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns="http://www.fq.me/context"
       xsi:schemaLocation="http://www.fq.me/context http://www.fq.me/context/context.xsd">
    <bean id="id1" class="com.fq.benz">
        <property name="name" value="benz"/>
    </bean>
    <bean id="id2" class="com.fq.domain.Bean">
        <property name="isUsed" value="true"/>
        <property name="complexBean" ref="id1"/>
    </bean>
</beans>
- context.xsd
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.fq.me/context"
        elementFormDefault="qualified">
    <element name="beans">
        <complexType>
            <sequence>
                <element name="bean" maxOccurs="unbounded">
                    <complexType>
                        <sequence>
                            <element name="property" maxOccurs="unbounded">
                                <complexType>
                                    <attribute name="name" type="string" use="required"/>
                                    <attribute name="value" type="string" use="optional"/>
                                    <attribute name="ref" type="string" use="optional"/>
                                </complexType>
                            </element>
                        </sequence>
                        <attribute name="id" type="string" use="required"/>
                        <attribute name="class" type="string" use="required"/>
                    </complexType>
                </element>
            </sequence>
        </complexType>
    </element>
</schema>
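import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;
import org.junit.Test;

/**
 * @author jifang
 * @since 16/1/18 4:02 PM.
 */
public class Dom4jRead {
    @Test
    public void client() throws DocumentException {
        SAXReader reader = new SAXReader();
        // read(URL) parses the resource into a Dom4j Document
        Document document = reader.read(ClassLoader.getSystemResource("config.xml"));
        // ...
    }
}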
As with JAXP, Document is an interface (here in the org.dom4j package). It extends Node (through Branch), and the sub-interfaces of Node also include Element, Attribute, Document, Text, CDATA, and Branch.
- Node
Common method | Description |
---|---|
Element getParent() | getParent returns the parent Element if this node supports the parent relationship or null if it is the root element or does not support the parent relationship. |
- Document
Common method | Description |
---|---|
Element getRootElement() | Returns the root Element for this document. |
- Element
Common method | Description |
---|---|
void add(Attribute/Text param) | Adds the given Attribute/Text to this element. |
Element addAttribute(String name, String value) | Adds the attribute value of the given local name. |
Attribute attribute(int index) | Returns the attribute at the specified index. |
Attribute attribute(String name) | Returns the attribute with the given name. |
Element element(String name) | Returns the first element for the given local name and any namespace. |
Iterator elementIterator() | Returns an iterator over all this element's child elements. |
Iterator elementIterator(String name) | Returns an iterator over the elements contained in this element which match the given local name and any namespace. |
List elements() | Returns the elements contained in this element. |
List elements(String name) | Returns the elements contained in this element with the given local name and any namespace. |
- Branch
Common method | Description |
---|---|
Element addElement(String name) | Adds a new Element node with the given name to this branch and returns a reference to the new node. |
boolean remove(Node node) | Removes the given Node if the node is an immediate child of this branch. |
Querying with Dom4j
- Print all attribute information:
/**
 * @author jifang
 * @since 16/1/18 4:02 PM.
 */
public class Dom4jRead {
    private Document document;

    @Before
    public void setUp() throws DocumentException {
        document = new SAXReader()
                .read(ClassLoader.getSystemResource("config.xml"));
    }

    @Test
    @SuppressWarnings("unchecked")
    public void client() {
        Element beans = document.getRootElement();
        for (Iterator iterator = beans.elementIterator(); iterator.hasNext(); ) {
            Element bean = (Element) iterator.next();
            String id = bean.attributeValue("id");
            String clazz = bean.attributeValue("class");
            System.out.println("id: " + id + ", class: " + clazz);
            scanProperties(bean.elements());
        }
    }

    // Print the name plus whichever of value/ref is present on each property
    public void scanProperties(List<? extends Element> properties) {
        for (Element property : properties) {
            System.out.print("name: " + property.attributeValue("name"));
            Attribute value = property.attribute("value");
            if (value != null) {
                System.out.println("," + value.getName() + ": " + value.getValue());
            }
            Attribute ref = property.attribute("ref");
            if (ref != null) {
                System.out.println("," + ref.getName() + ": " + ref.getValue());
            }
        }
    }
}
Adding a node with Dom4j
- Append a <property/> tag at the end of the first <bean/> tag
<bean id="id1" class="com.fq.benz">
    <property name="name" value="benz"/>
    <property name="refBean" ref="id2">新添加的标簽</property>
</bean>
/**
 * @author jifang
 * @since 16/1/19 9:50 AM.
 */
public class Dom4jAppend {
    //...
    @Test
    public void client() {
        Element beans = document.getRootElement();
        Element firstBean = beans.element("bean");
        // addElement creates the child and appends it in one step
        Element property = firstBean.addElement("property");
        property.addAttribute("name", "refBean");
        property.addAttribute("ref", "id2");
        property.setText("新添加的标簽");
    }

    @After
    public void tearDown() throws IOException {
        // Write the document back to the XML file
        OutputFormat format = OutputFormat.createPrettyPrint();
        XMLWriter writer = new XMLWriter(new FileOutputStream("src/main/resources/config.xml"), format);
        writer.write(document);
        // Close the writer so the output is flushed and the file handle released
        writer.close();
    }
}
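We can wrap reading and writing XML in a small utility class so that later calls are more convenient:
/**
 * @author jifang
 * @since 16/1/19 2:12 PM.
 */
public class XmlUtils {
    // Load a classpath resource as a Dom4j Document
    public static Document getXmlDocument(String config) {
        try {
            return new SAXReader().read(ClassLoader.getSystemResource(config));
        } catch (DocumentException e) {
            throw new RuntimeException(e);
        }
    }

    // Pretty-print the document to the given path
    public static void writeXmlDocument(String path, Document document) {
        // try-with-resources closes the stream even if writing fails
        try (OutputStream out = new FileOutputStream(path)) {
            XMLWriter writer = new XMLWriter(out, OutputFormat.createPrettyPrint());
            writer.write(document);
            writer.flush();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}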
- Add a <property/> tag after the first <property/> of the first <bean/>
<bean id="id1" class="com.fq.benz">
    <property name="name" value="benz"/>
    <property name="rate" value="3.14"/>
    <property name="refBean" ref="id2">新添加的标簽</property>
</bean>
public class Dom4jAppend {
    private Document document;

    @Before
    public void setUp() {
        document = XmlUtils.getXmlDocument("config.xml");
    }

    @Test
    @SuppressWarnings("unchecked")
    public void client() {
        Element beans = document.getRootElement();
        Element firstBean = beans.element("bean");
        List<Element> properties = firstBean.elements();
        //Element property = DocumentHelper
        //        .createElement(QName.get("property", firstBean.getNamespaceURI()));
        // Create the element in the same namespace as its future parent
        Element property = DocumentFactory.getInstance()
                .createElement("property", firstBean.getNamespaceURI());
        property.addAttribute("name", "rate");
        property.addAttribute("value", "3.14");
        // Index 1 places the new element right after the first <property/>
        properties.add(1, property);
    }

    @After
    public void tearDown() {
        XmlUtils.writeXmlDocument("src/main/resources/config.xml", document);
    }
}
Updating a node with Dom4j
- Change the first <property/> of bean id1 to the following:
<property name="name" value="翡青"/>
@Test
public void client() {
    Element beans = document.getRootElement();
    Element firstBean = beans.element("bean");
    // element("property") returns the first <property/> child
    Element property = firstBean.element("property");
    // Overwrite the existing value attribute in place
    property.attribute("value").setValue("翡青");
}
Deleting a node with Dom4j
- Delete the node that was just modified
@Test
@SuppressWarnings("unchecked")
public void delete() {
    List<Element> beans = document.getRootElement().elements("bean");
    for (Element bean : beans) {
        if (bean.attributeValue("id").equals("id1")) {
            List<Element> properties = bean.elements("property");
            for (Element property : properties) {
                if (property.attributeValue("name").equals("name")) {
                    // Perform the deletion through the parent element
                    property.getParent().remove(property);
                    break;
                }
            }
            break;
        }
    }
}
Dom4j example
In the article on Java reflection we implemented an object pool that loads beans from a JSON configuration file. We can now add support for XML configuration (the XML file is the same as before):
/**
 * @author jifang
 * @since 16/1/18 9:18 PM.
 */
public class XmlParse {
    private static final ObjectPool POOL = ObjectPoolBuilder.init(null);

    public static Element parseBeans(String config) {
        try {
            return new SAXReader().read(ClassLoader.getSystemResource(config)).getRootElement();
        } catch (DocumentException e) {
            throw new RuntimeException(e);
        }
    }

    public static void processObject(Element bean, List<? extends Element> properties)
            throws ClassNotFoundException, IllegalAccessException, InstantiationException, NoSuchFieldException {
        Class<?> clazz = Class.forName(bean.attributeValue(CommonConstant.CLASS));
        Object targetObject = clazz.newInstance();
        for (Element property : properties) {
            String fieldName = property.attributeValue(CommonConstant.NAME);
            Field field = clazz.getDeclaredField(fieldName);
            field.setAccessible(true);
            // The property has a value attribute: set a simple value
            if (property.attributeValue(CommonConstant.VALUE) != null) {
                SimpleValueSetUtils.setSimpleValue(field, targetObject, property.attributeValue(CommonConstant.VALUE));
            } else if (property.attributeValue(CommonConstant.REF) != null) {
                // The property has a ref attribute: look the referenced bean up in the pool
                String refId = property.attributeValue(CommonConstant.REF);
                Object object = POOL.getObject(refId);
                field.set(targetObject, object);
            } else {
                throw new RuntimeException("neither value nor ref");
            }
        }
        POOL.putObject(bean.attributeValue(CommonConstant.ID), targetObject);
    }
}
Note: the code above is only the XML-parsing part of the object pool project. For the complete project, see [email protected]:feiqing/commons-frame.git
XPath
XPath is a language for finding information in an XML document. It can be used to traverse the elements and attributes of an XML document.
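Expression | Description |
---|---|
/ | Selects from the root node (/beans: matches beans under the root; /beans/bean: matches bean under beans) |
// | Searches the whole current document regardless of position (//property: matches every property in the current document) |
* | Matches any element node (//*: matches all tags) |
@ | Matches attributes (e.g. //@id: matches all id attributes) |
[n], [last()] | Positional predicate (e.g. //bean[1]: matches the first bean; //bean[last()]: matches the last bean) |
[@attr], [@attr='value'] | Attribute predicate (e.g. //bean[@id]: matches all bean tags that have an id attribute; //bean[@id='id1']: matches all bean tags whose id attribute equals 'id1') |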
Predicate: a predicate is used to find a specific node or a node that contains a specified value.
For the details of XPath syntax, see the W3School XPath tutorial.
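To try the expressions from the table without extra dependencies, the sketch below evaluates them with the JDK's own javax.xml.xpath API rather than Dom4j. It runs against an inline copy of the DTD-based sample without the DOCTYPE; the class name XPathTableSketch and the method count are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTableSketch {
    // Inline copy of the sample document (no namespace, no DOCTYPE)
    static final String XML =
            "<beans>"
          + "<bean id=\"id1\" class=\"com.fq.domain.Bean\">"
          + "<property name=\"isUsed\" value=\"true\"/>"
          + "</bean>"
          + "<bean id=\"id2\" class=\"com.fq.domain.ComplexBean\">"
          + "<property name=\"refBean\" ref=\"id1\"/>"
          + "</bean>"
          + "</beans>";

    /** Returns how many nodes the expression selects in the sample document. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET asks for all matching nodes as a NodeList
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }
}
```

For instance, count("//bean[@id='id1']") selects a single bean, while count("//*") counts every element in the document.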
Dom4j support for XPath
By default Dom4j does not support XPath; the following dependency must be added to the pom:
<dependency>
<groupId>jaxen</groupId>
<artifactId>jaxen</artifactId>
<version>1.1.6</version>
</dependency>
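The Dom4j Node interface provides the following methods for XPath support:
Method |
---|
List selectNodes(String xpathExpression) |
List selectNodes(String xpathExpression, String comparisonXPathExpression) |
List selectNodes(String xpathExpression, String comparisonXPathExpression, boolean removeDuplicates) |
Object selectObject(String xpathExpression) |
Node selectSingleNode(String xpathExpression) |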
Querying with XPath
- Read the attribute values on all bean tags
/**
 * @author jifang
 * @since 16/1/20 9:28 AM.
 */
public class XPathRead {
    private Document document;

    @Before
    public void setUp() throws DocumentException {
        document = XmlUtils.getXmlDocument("config.xml");
    }

    @Test
    @SuppressWarnings("unchecked")
    public void client() {
        // Select every bean element anywhere in the document
        List<Element> beans = document.selectNodes("//bean");
        for (Element bean : beans) {
            System.out.println("id: " + bean.attributeValue("id") +
                    ", class: " + bean.attributeValue("class"));
        }
    }
}
Updating with XPath
- Delete the <bean/> whose id="id2"
@Test
public void client() {
    // Locate the bean by its id attribute, then remove it through its parent
    Node bean = document.selectSingleNode("//bean[@id=\"id2\"]");
    bean.getParent().remove(bean);
}
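One caveat worth checking: in XPath 1.0 an unprefixed name such as bean refers to an element in no namespace, so an expression like //bean may select nothing when the document declares a default namespace, as the Dom4j sample config.xml does. The sketch below demonstrates the effect with the JDK's javax.xml.xpath API rather than Dom4j; the prefix c and the class name NamespaceXPathSketch are arbitrary choices for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceXPathSketch {
    static final String NS = "http://www.fq.me/context";
    static final String XML =
            "<beans xmlns=\"" + NS + "\"><bean id=\"id1\"/><bean id=\"id2\"/></beans>";

    /** Counts the nodes selected by the expression; prefix "c" is bound to NS. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "c".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return NS.equals(uri) ? "c" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("c").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }
}
```

Here count("//bean") finds nothing, while count("//c:bean") finds both beans. If the Dom4j queries above return an empty list against the namespaced config.xml, binding a prefix to the namespace in a similar way is one thing to try.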