Parsing XML in Java

Tags: Java Basics

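There are two XML parsing techniques: DOM and SAX.

  • DOM

    The parser builds a tree in memory that mirrors the hierarchy of the XML document, wrapping tags, attributes, text, and other items as node objects of that tree.

    • Pros: create, read, update, and delete operations are all easy to implement.

    • Cons: a very large XML file can exhaust memory.
  • SAX

    The parser uses an event-driven model and parses while it reads: it works through the document from top to bottom, and whenever it reaches an element it calls the corresponding handler method.

    • Pros: it does not exhaust memory, because the document is never held in memory as a whole.
    • Cons: querying is inconvenient, and add, update, and delete operations are not possible.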
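Various companies and organizations provide parsers that support both the DOM and SAX approaches:

  • Sun: jaxp

  • The Dom4j project: dom4j (the most commonly used, for example by Spring)
  • The JDom project: jdom

    For the history behind these three parsers, see "Four ways to parse XML files in Java".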

Parsing with JAXP

JAXP is part of Java SE. Its javax.xml.parsers package provides the following parser classes for DOM and SAX:

  • DOM
    • DocumentBuilder

    • DocumentBuilderFactory

  • SAX
    • SAXParser

    • SAXParserFactory
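
The four classes above pair up as factory and parser. The following is a minimal sketch of that pairing, assuming an inline XML string (the XML constant and the class name JaxpFactoriesSketch are made up for illustration) instead of a file on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class JaxpFactoriesSketch {

    // Hypothetical inline sample so the sketch does not depend on a file.
    static final String XML =
            "<beans><bean id=\"id1\" class=\"com.fq.domain.Bean\"/></beans>";

    static ByteArrayInputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: the factory creates a builder, and the builder returns a whole-document tree.
    static String rootNameViaDom() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(input());
        return document.getDocumentElement().getNodeName();
    }

    // SAX: the factory creates a parser, and the parser pushes events into a handler.
    static void parseViaSax() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(input(), new DefaultHandler()); // DefaultHandler ignores every event
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootNameViaDom()); // prints "beans"
        parseViaSax();
    }
}
```

The DOM call hands back an object to work with afterwards, while the SAX call returns nothing: everything of interest has to happen inside the handler.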

The sample XML is shown below; the following sections use JAXP to create, read, update, and delete its content.

  • config.xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE beans SYSTEM "constraint.dtd">
<beans>
    <bean id="id1" class="com.fq.domain.Bean">
        <property name="isUsed" value="true"/>
    </bean>
    <bean id="id2" class="com.fq.domain.ComplexBean">
        <property name="refBean" ref="id1"/>
    </bean>
</beans>
           
  • constraint.dtd
<!ELEMENT beans (bean*) >
        <!ELEMENT bean (property*)>
        <!ATTLIST bean
                id CDATA #REQUIRED
                class CDATA #REQUIRED
                >

        <!ELEMENT property EMPTY>
        <!ATTLIST property
                name CDATA #REQUIRED
                value CDATA #IMPLIED
                ref CDATA #IMPLIED>
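
Declaring a DOCTYPE does not by itself make a JAXP parser enforce the DTD, because DocumentBuilderFactory is non-validating by default. The following sketch shows one way to turn validation on and collect the violations. It assumes the DTD is inlined as an internal subset (so no file lookup is needed), and the class name DtdValidationSketch is made up for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // The DTD from above, inlined as an internal subset (an assumption of this sketch).
    static final String DOCTYPE =
            "<!DOCTYPE beans [\n"
            + "<!ELEMENT beans (bean*)>\n"
            + "<!ELEMENT bean (property*)>\n"
            + "<!ATTLIST bean id CDATA #REQUIRED class CDATA #REQUIRED>\n"
            + "<!ELEMENT property EMPTY>\n"
            + "<!ATTLIST property name CDATA #REQUIRED value CDATA #IMPLIED ref CDATA #IMPLIED>\n"
            + "]>\n";

    // Parses DOCTYPE + body with validation on and returns the validity errors found.
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // off by default
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Valid: both required attributes are present.
        System.out.println(validate(
                "<beans><bean id=\"id1\" class=\"com.fq.domain.Bean\"/></beans>"));
        // Invalid: the required class attribute is missing.
        System.out.println(validate("<beans><bean id=\"id1\"/></beans>"));
    }
}
```

Validity errors are recoverable, so the parser reports them through ErrorHandler.error() and keeps going; only well-formedness problems arrive as fatal errors.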
           

JAXP-Dom

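import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/**
 * @author jifang
 * @since 16/1/13 11:24 PM.
 */
public class XmlRead {

    @Test
    public void client() throws ParserConfigurationException, IOException, SAXException {
        // Create a DOM parser
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Parse the XML file
        Document document = builder.parse(ClassLoader.getSystemResourceAsStream("config.xml"));

        // ...
    }
}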
           

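DocumentBuilder's parse(String/File/InputSource/InputStream param) method parses an XML file into a Document object that represents the whole document. Document (in the org.w3c.dom package) is an interface whose parent interface is Node; other sub-interfaces of Node include Element, Attr, and Text.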

  • Node

Commonly used Node methods | Description
--- | ---
Node appendChild(Node newChild) | Adds the node newChild to the end of the list of children of this node.
Node removeChild(Node oldChild) | Removes the child node indicated by oldChild from the list of children, and returns it.
NodeList getChildNodes() | A NodeList that contains all children of this node.
NamedNodeMap getAttributes() | A NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise.
String getTextContent() | This attribute returns the text content of this node and its descendants.

  • Document

Commonly used Document methods | Description
--- | ---
NodeList getElementsByTagName(String tagname) | Returns a NodeList of all the Elements in document order with a given tag name and are contained in the document.
Element createElement(String tagName) | Creates an element of the type specified.
Text createTextNode(String data) | Creates a Text node given the specified string.
Attr createAttribute(String name) | Creates an Attr of the given name.
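
To see how the Document factory methods and the Node methods fit together, here is a small sketch that builds a document from scratch rather than parsing one. The class name DomApiSketch and the sample values are made up for illustration:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomApiSketch {

    // Builds <beans><bean id="id1">hello</bean></beans> in memory.
    static Document build() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element beans = document.createElement("beans");
        document.appendChild(beans);

        Element bean = document.createElement("bean");
        Attr id = document.createAttribute("id");
        id.setValue("id1");
        bean.setAttributeNode(id);
        bean.appendChild(document.createTextNode("hello"));
        beans.appendChild(bean);
        return document;
    }

    public static void main(String[] args) throws Exception {
        Document document = build();
        Element root = document.getDocumentElement();
        Element bean = (Element) document.getElementsByTagName("bean").item(0);

        System.out.println(bean.getAttributes().getLength()); // 1
        System.out.println(bean.getTextContent());            // hello
        System.out.println(root.getChildNodes().getLength()); // 1

        // removeChild is called on the parent, not on the node being removed.
        bean.getParentNode().removeChild(bean);
        System.out.println(root.getChildNodes().getLength()); // 0
    }
}
```

Nodes are always created by the Document that will own them and only become part of the tree once appendChild attaches them.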

DOM queries

  • Read all attributes of the <bean/> tags
public class XmlRead {

    private Document document;

    @Before
    public void setUp() throws ParserConfigurationException, IOException, SAXException {
        document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(ClassLoader.getSystemResourceAsStream("config.xml"));
    }

    @Test
    public void client() throws ParserConfigurationException, IOException, SAXException {
        NodeList beans = document.getElementsByTagName("bean");
        for (int i = 0; i < beans.getLength(); ++i) {
            NamedNodeMap attributes = beans.item(i).getAttributes();
            scanNameNodeMap(attributes);
        }
    }

    // Print every attribute in the map as "name -> value"
    private void scanNameNodeMap(NamedNodeMap attributes) {
        for (int i = 0; i < attributes.getLength(); ++i) {
            Attr attribute = (Attr) attributes.item(i);
            System.out.printf("%s -> %s%n", attribute.getName(), attribute.getValue());
            // System.out.println(attribute.getNodeName() + " -> " + attribute.getTextContent());
        }
    }
}
           
  • Print the names of all tags in the XML file
@Test
public void client() {
    list(document, 0);
}

// Depth-first walk: print element names indented by their depth
private void list(Node node, int depth) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        for (int i = 0; i < depth; ++i)
            System.out.print("\t");
        System.out.println("<" + node.getNodeName() + ">");
    }

    NodeList childNodes = node.getChildNodes();
    for (int i = 0; i < childNodes.getLength(); ++i) {
        list(childNodes.item(i), depth + 1);
    }
}
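
The ELEMENT_NODE check in the listing above matters because getChildNodes() also returns the whitespace between tags as Text nodes. A minimal sketch of the difference, using a made-up inline document and class name (ChildNodesSketch):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodesSketch {

    // Returns {number of child nodes of the root, number of those that are elements}.
    static int[] counts(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList children = document.getDocumentElement().getChildNodes();
        int elements = 0;
        for (int i = 0; i < children.getLength(); ++i) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                ++elements;
            }
        }
        return new int[] {children.getLength(), elements};
    }

    public static void main(String[] args) throws Exception {
        int[] result = counts("<beans>\n    <bean/>\n    <bean/>\n</beans>");
        System.out.println(result[0]); // 5: text, bean, text, bean, text
        System.out.println(result[1]); // 2
    }
}
```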
           

Adding a node with DOM

  • Add a <property/> tag under the first <bean/> tag. The result should look like this:
<bean id="id1" class="com.fq.domain.Bean">
    <property name="isUsed" value="true"/>
    <property name="name" value="simple-bean">新添加的</property>
</bean>
           
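import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/**
 * @author jifang
 * @since 16/1/17 5:56 PM.
 */
public class XmlAppend {

    // Writes the document back to disk
    private Transformer transformer;

    // The XML document
    private Document document;

    @Before
    public void setUp() throws ParserConfigurationException, IOException, SAXException {
        document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(ClassLoader.getSystemResourceAsStream("config.xml"));
    }

    @Test
    public void client() {
        // Get the first bean tag
        Node firstBean = document.getElementsByTagName("bean").item(0);

        // Create a property tag
        Element property = document.createElement("property");

        // Add attributes to the property tag
        // property.setAttribute("name", "name");
        // property.setAttribute("value", "feiqing");
        Attr name = document.createAttribute("name");
        name.setValue("name");
        property.setAttributeNode(name);
        Attr value = document.createAttribute("value");
        value.setValue("simple-bean");
        property.setAttributeNode(value);

        // Add text content to the property tag
        //property.setTextContent("新添加的");
        property.appendChild(document.createTextNode("新添加的"));

        // Append the property tag to the bean tag
        firstBean.appendChild(property);
    }

    @After
    public void tearDown() throws TransformerException {
        transformer = TransformerFactory.newInstance().newTransformer();

        // Write the DOM back to the XML file
        transformer.transform(new DOMSource(document),
                new StreamResult("src/main/resources/config.xml"));
    }
}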
           
Note: the change only takes effect once the in-memory DOM has been written back to the XML document.
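
The write-back step can be tried without touching a file by pointing the StreamResult at a StringWriter. This is only a sketch under that assumption; the class name DomWriteBackSketch and the sample input are made up:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomWriteBackSketch {

    // Parses xml, appends a <property/> to the root, and serializes the tree to a string.
    static String addAndSerialize(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // This changes only the in-memory tree; the xml string is untouched.
        Element property = document.createElement("property");
        property.setAttribute("name", "name");
        document.getDocumentElement().appendChild(property);

        // An identity Transformer copies the DOM to the output unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String source = "<bean id=\"id1\"/>";
        System.out.println(addAndSerialize(source));
        System.out.println(source); // still the original text
    }
}
```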

Updating a node with DOM

  • Change the <property/> that was just added to the following:
<property name="name" value="new-simple-bean">simple-bean是新添加的</property>
           
@Test
public void client() {
    NodeList properties = document.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); ++i) {
        Element property = (Element) properties.item(i);
        if (property.getAttribute("value").equals("simple-bean")) {
            property.setAttribute("value", "new-simple-bean");
            property.setTextContent("simple-bean是新添加的");
            break;
        }
    }
}
           

Deleting a node with DOM

Delete the <property/> tag that was just modified:

@Test
public void client() {
    NodeList properties = document.getElementsByTagName("property");
    for (int i = 0; i < properties.getLength(); ++i) {
        Element property = (Element) properties.item(i);
        if (property.getAttribute("value").equals("new-simple-bean")) {
            property.getParentNode().removeChild(property);
            break;
        }
    }
}
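
The listing above stops at the first match, which sidesteps a subtlety: the NodeList returned by getElementsByTagName is live, so it shrinks as nodes are removed. If every match should go, one option is to iterate from the end. A sketch with a made-up helper removeAll and class name DomRemoveAllSketch:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomRemoveAllSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Removes every element with the given tag name and returns how many were removed.
    static int removeAll(Document document, String tag) {
        NodeList nodes = document.getElementsByTagName(tag);
        int removed = 0;
        // Walk backwards: removing item i never shifts the items before it.
        for (int i = nodes.getLength() - 1; i >= 0; --i) {
            Node node = nodes.item(i);
            node.getParentNode().removeChild(node);
            ++removed;
        }
        return removed;
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<beans><bean><property name=\"a\"/>"
                + "<property name=\"b\"/></bean><bean><property name=\"c\"/></bean></beans>");
        System.out.println(removeAll(document, "property"));                     // 3
        System.out.println(document.getElementsByTagName("property").getLength()); // 0
    }
}
```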
           

JAXP-SAX

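A SAXParser instance is obtained from the newSAXParser() method of a SAXParserFactory instance. Its parse(String uri, DefaultHandler dh) method, which parses the XML file, has no return value; compared with the DOM method it takes one extra parameter, the event handler DefaultHandler:

  • When the parser reaches a start tag, it calls the DefaultHandler's startElement() method.
  • When it reaches tag content (text), it calls the DefaultHandler's characters() method.
  • When it reaches an end tag, it calls the DefaultHandler's endElement() method.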
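
The order of these callbacks can be made visible by recording them. The sketch below does that for an inline document. It buffers text in a StringBuilder because a parser is allowed to deliver one run of text through several characters() calls. The class name SaxEventsSketch and the "start:/text:/end:" labels are made up for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {

    // Parses xml and returns the callbacks in the order the parser made them.
    static List<String> events(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0);
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once per text run
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String trimmed = text.toString().trim();
                if (!trimmed.isEmpty()) {
                    events.add("text:" + trimmed);
                }
                text.setLength(0);
                events.add("end:" + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // [start:beans, start:bean, text:hi, end:bean, end:beans]
        System.out.println(events("<beans><bean>hi</bean></beans>"));
    }
}
```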

SAX queries

  • Print the whole XML document
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.junit.Test;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * @author jifang
 * @since 16/1/17 9:16 PM.
 */
public class SaxRead {

    @Test
    public void client() throws ParserConfigurationException, IOException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(ClassLoader.getSystemResourceAsStream("config.xml"), new SaxHandler());
    }

    private class SaxHandler extends DefaultHandler {

        // Start tag: print the tag name followed by its attributes
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            System.out.print("<" + qName);
            for (int i = 0; i < attributes.getLength(); ++i) {
                String attrName = attributes.getQName(i);
                String attrValue = attributes.getValue(i);
                System.out.print(" " + attrName + "=" + attrValue);
            }
            System.out.print(">");
        }

        // Text content: print it as is
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            System.out.print(new String(ch, start, length));
        }

        // End tag
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            System.out.print("</" + qName + ">");
        }
    }
}
           
  • A handler that prints the text content of every property tag

private class SaxHandler extends DefaultHandler {
    // A mutex guards the isProperty flag
    private boolean isProperty = false;
    private Lock mutex = new ReentrantLock();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equals("property")) {
            mutex.lock();
            isProperty = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // The flag can only be true while the lock is held
        if (isProperty) {
            System.out.println(new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("property")) {
            try {
                isProperty = false;
            } finally {
                mutex.unlock();
            }
        }
    }
}
           
Note: SAX cannot perform add, update, or delete operations.

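Parsing with Dom4j

Dom4j is a fork of JDom: it split off from the original JDom project and provides a parser that is more powerful and faster than JDom (it supports XPath, for example).

To use Dom4j, add the following dependency to the pom: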

<dependency>
    <groupId>dom4j</groupId>
    <artifactId>dom4j</artifactId>
    <version>1.6.1</version>
</dependency>
           

The sample XML is shown below; the following sections use Dom4j to create, read, update, and delete its content:

  • config.xml
<?xml version="1.0" encoding="utf-8"?>
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns="http://www.fq.me/context"
       xsi:schemaLocation="http://www.fq.me/context http://www.fq.me/context/context.xsd">
    <bean id="id1" class="com.fq.benz">
        <property name="name" value="benz"/>
    </bean>
    <bean id="id2" class="com.fq.domain.Bean">
        <property name="isUsed" value="true"/>
        <property name="complexBean" ref="id1"/>
    </bean>
</beans>
           
  • context.xsd
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.fq.me/context"
        elementFormDefault="qualified">
    <element name="beans">
        <complexType>
            <sequence>
                <element name="bean" maxOccurs="unbounded">
                    <complexType>
                        <sequence>
                            <element name="property" maxOccurs="unbounded">
                                <complexType>
                                    <attribute name="name" type="string" use="required"/>
                                    <attribute name="value" type="string" use="optional"/>
                                    <attribute name="ref" type="string" use="optional"/>
                                </complexType>
                            </element>
                        </sequence>
                        <attribute name="id" type="string" use="required"/>
                        <attribute name="class" type="string" use="required"/>
                    </complexType>
                </element>
            </sequence>
        </complexType>
    </element>
</schema>
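
As with a DTD, referencing a schema through xsi:schemaLocation does not by itself cause any checking. One way to check a document against an XSD with the JDK alone is the javax.xml.validation API. The sketch below assumes a cut-down, inlined version of the schema above (only the required id attribute on bean), and the class name XsdValidationSketch is made up for illustration:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationSketch {

    // Simplified stand-in for context.xsd (an assumption of this sketch).
    static final String XSD =
            "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\""
            + " targetNamespace=\"http://www.fq.me/context\""
            + " elementFormDefault=\"qualified\">"
            + "<element name=\"beans\"><complexType><sequence>"
            + "<element name=\"bean\" maxOccurs=\"unbounded\"><complexType>"
            + "<attribute name=\"id\" type=\"string\" use=\"required\"/>"
            + "</complexType></element>"
            + "</sequence></complexType></element>"
            + "</schema>";

    // Returns true if xml is valid against XSD, false if validation reports an error.
    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, validity errors are thrown
        }
    }

    public static void main(String[] args) throws Exception {
        String ns = " xmlns=\"http://www.fq.me/context\"";
        System.out.println(isValid("<beans" + ns + "><bean id=\"id1\"/></beans>")); // true
        System.out.println(isValid("<beans" + ns + "><bean/></beans>"));            // false
    }
}
```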
           
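import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;
import org.junit.Test;

/**
 * @author jifang
 * @since 16/1/18 4:02 PM.
 */
public class Dom4jRead {

    @Test
    public void client() throws DocumentException {
        // SAXReader builds a Dom4j tree from the classpath resource
        SAXReader reader = new SAXReader();
        Document document = reader.read(ClassLoader.getSystemResource("config.xml"));
        // ...
    }
}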
           

As in JAXP, Document is an interface (in the org.dom4j package). It descends from Node (through Branch); other sub-interfaces of Node include Element, Attribute, Text, CDATA, and Branch.

  • Node

Commonly used Node methods | Description
--- | ---
Element getParent() | getParent returns the parent Element if this node supports the parent relationship or null if it is the root element or does not support the parent relationship.

  • Document

Commonly used Document methods | Description
--- | ---
Element getRootElement() | Returns the root Element for this document.

  • Element

Commonly used Element methods | Description
--- | ---
void add(Attribute/Text param) | Adds the given Attribute/Text to this element.
Element addAttribute(String name, String value) | Adds the attribute value of the given local name.
Attribute attribute(int index) | Returns the attribute at the specified index.
Attribute attribute(String name) | Returns the attribute with the given name.
Element element(String name) | Returns the first element for the given local name and any namespace.
Iterator elementIterator() | Returns an iterator over all this element's child elements.
Iterator elementIterator(String name) | Returns an iterator over the elements contained in this element which match the given local name and any namespace.
List elements() | Returns the elements contained in this element.
List elements(String name) | Returns the elements contained in this element with the given local name and any namespace.

  • Branch

Commonly used Branch methods | Description
--- | ---
Element addElement(String name) | Adds a new Element node with the given name to this branch and returns a reference to the new node.
boolean remove(Node node) | Removes the given Node if the node is an immediate child of this branch.

Dom4j queries

  • Print all attribute information:
/**
 * @author jifang
 * @since 16/1/18 4:02 PM.
 */
public class Dom4jRead {

    private Document document;

    @Before
    public void setUp() throws DocumentException {
        document = new SAXReader()
                .read(ClassLoader.getSystemResource("config.xml"));
    }

    @Test
    @SuppressWarnings("unchecked")
    public void client() {
        Element beans = document.getRootElement();

        for (Iterator iterator = beans.elementIterator(); iterator.hasNext(); ) {
            Element bean = (Element) iterator.next();
            String id = bean.attributeValue("id");
            String clazz = bean.attributeValue("class");
            System.out.println("id: " + id + ", class: " + clazz);

            scanProperties(bean.elements());
        }
    }

    public void scanProperties(List<? extends Element> properties) {
        for (Element property : properties) {
            System.out.print("name: " + property.attributeValue("name"));
            Attribute value = property.attribute("value");
            if (value != null) {
                System.out.println("," + value.getName() + ": " + value.getValue());
            }
            Attribute ref = property.attribute("ref");
            if (ref != null) {
                System.out.println("," + ref.getName() + ": " + ref.getValue());
            }
        }
    }
}
           

Adding a node with Dom4j

Append a <property/> tag at the end of the first <bean/> tag:

<bean id="id1" class="com.fq.benz"> 
    <property name="name" value="benz"/>  
    <property name="refBean" ref="id2">新添加的标簽</property>
</bean>  
           
/**
 * @author jifang
 * @since 16/1/19 9:50 AM.
 */
public class Dom4jAppend {

    //...

    @Test
    public void client() {
        Element beans = document.getRootElement();
        Element firstBean = beans.element("bean");
        Element property = firstBean.addElement("property");
        property.addAttribute("name", "refBean");
        property.addAttribute("ref", "id2");
        property.setText("新添加的标簽");
    }

    @After
    public void tearDown() throws IOException {
        // Write the document back to the XML file
        OutputFormat format = OutputFormat.createPrettyPrint();
        XMLWriter writer = new XMLWriter(new FileOutputStream("src/main/resources/config.xml"), format);
        writer.write(document);
        // Flush buffered output and release the file handle
        writer.close();
    }
}
           
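The XML read and write operations can be wrapped in a utility class, which makes later calls more convenient:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

/**
 * @author jifang
 * @since 16/1/19 2:12 PM.
 */
public class XmlUtils {

    // Load an XML document from the classpath
    public static Document getXmlDocument(String config) {
        try {
            return new SAXReader().read(ClassLoader.getSystemResource(config));
        } catch (DocumentException e) {
            throw new RuntimeException(e);
        }
    }

    // Pretty-print the document to the given path; the stream is closed even on failure
    public static void writeXmlDocument(String path, Document document) {
        try (OutputStream out = new FileOutputStream(path)) {
            XMLWriter writer = new XMLWriter(out, OutputFormat.createPrettyPrint());
            writer.write(document);
            writer.flush();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}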
           
  • Insert a <property/> tag after the first <property/> of the first <bean/>:
<bean id="id1" class="com.fq.benz"> 
    <property name="name" value="benz"/>  
    <property name="rate" value="3.14"/>
    <property name="refBean" ref="id2">新添加的标簽</property> 
</bean>  
           
public class Dom4jAppend {

    private Document document;

    @Before
    public void setUp() {
        document = XmlUtils.getXmlDocument("config.xml");
    }

    @Test
    @SuppressWarnings("unchecked")
    public void client() {
        Element beans = document.getRootElement();
        Element firstBean = beans.element("bean");
        List<Element> properties = firstBean.elements();

        //Element property = DocumentHelper
        // .createElement(QName.get("property", firstBean.getNamespaceURI()));
        Element property = DocumentFactory.getInstance()
                .createElement("property", firstBean.getNamespaceURI());
        property.addAttribute("name", "rate");
        property.addAttribute("value", "3.14");
        // Index 1 places the new tag right after the first <property/>
        properties.add(1, property);
    }

    @After
    public void tearDown() {
        XmlUtils.writeXmlDocument("src/main/resources/config.xml", document);
    }
}
           

Modifying a node with Dom4j

  • Change the first <property/> of the id1 bean to the following:
<property name="name" value="翡青"/>  
           
@Test
public void client() {
    Element beans = document.getRootElement();
    Element firstBean = beans.element("bean");

    // element(name) returns the first matching child, here the first <property/>
    Element firstProperty = firstBean.element("property");
    // Overwrite the value attribute in place
    firstProperty.attribute("value").setValue("翡青");
}
           

Deleting a node with Dom4j

  • Delete the node that was just modified
@Test
@SuppressWarnings("unchecked")
public void delete() {
    List<Element> beans = document.getRootElement().elements("bean");
    for (Element bean : beans) {
        if (bean.attributeValue("id").equals("id1")) {
            List<Element> properties = bean.elements("property");
            for (Element property : properties) {
                if (property.attributeValue("name").equals("name")) {
                    // Perform the removal
                    property.getParent().remove(property);
                    break;
                }
            }
            break;
        }
    }
}
           

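A Dom4j example

The Java reflection article implemented an object pool that loads beans from a JSON configuration file. We can now add support for loading them from an XML configuration (the XML file is the same as above):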

/**
 * @author jifang
 * @since 16/1/18 9:18 PM.
 */
public class XmlParse {

    private static final ObjectPool POOL = ObjectPoolBuilder.init(null);

    public static Element parseBeans(String config) {
        try {
            return new SAXReader().read(ClassLoader.getSystemResource(config)).getRootElement();
        } catch (DocumentException e) {
            throw new RuntimeException(e);
        }
    }

    public static void processObject(Element bean, List<? extends Element> properties)
            throws ClassNotFoundException, IllegalAccessException, InstantiationException, NoSuchFieldException {
        Class<?> clazz = Class.forName(bean.attributeValue(CommonConstant.CLASS));
        Object targetObject = clazz.newInstance();

        for (Element property : properties) {
            String fieldName = property.attributeValue(CommonConstant.NAME);
            Field field = clazz.getDeclaredField(fieldName);
            field.setAccessible(true);
            // The property has a value attribute
            if (property.attributeValue(CommonConstant.VALUE) != null) {
                SimpleValueSetUtils.setSimpleValue(field, targetObject, property.attributeValue(CommonConstant.VALUE));
            } else if (property.attributeValue(CommonConstant.REF) != null) {
                String refId = property.attributeValue(CommonConstant.REF);
                Object object = POOL.getObject(refId);
                field.set(targetObject, object);
            } else {
                throw new RuntimeException("neither value nor ref");
            }
        }

        POOL.putObject(bean.attributeValue(CommonConstant.ID), targetObject);
    }
}
           
Note: the code above is only the XML-parsing part of the object pool project; for the complete project see [email protected]:feiqing/commons-frame.git
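
Because ObjectPool, CommonConstant, and SimpleValueSetUtils live in that project, the listing above cannot be run on its own. The core idea (read the class attribute, instantiate it reflectively, then set each field named by a property) can be sketched with the JDK alone. Everything here is an assumption for illustration: the Car class, the BeanFromXmlSketch name, the use of the JDK DOM API instead of Dom4j, and the restriction to String fields with no ref handling or pooling:

```java
import java.io.StringReader;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BeanFromXmlSketch {

    // Hypothetical bean class used only for this illustration.
    static class Car {
        private String name;
        String getName() { return name; }
    }

    // Instantiates the first <bean/> and fills its String fields from <property/> tags.
    static Object createFirstBean(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element bean = (Element) document.getElementsByTagName("bean").item(0);

        Class<?> clazz = Class.forName(bean.getAttribute("class"));
        Constructor<?> constructor = clazz.getDeclaredConstructor();
        constructor.setAccessible(true);
        Object target = constructor.newInstance();

        NodeList properties = bean.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); ++i) {
            Element property = (Element) properties.item(i);
            Field field = clazz.getDeclaredField(property.getAttribute("name"));
            field.setAccessible(true);
            field.set(target, property.getAttribute("value")); // String fields only
        }
        return target;
    }

    // Builds a config whose class attribute points at the nested Car class.
    static String sampleXml() {
        return "<beans><bean id=\"id1\" class=\"" + Car.class.getName() + "\">"
                + "<property name=\"name\" value=\"benz\"/></bean></beans>";
    }

    public static void main(String[] args) throws Exception {
        Car car = (Car) createFirstBean(sampleXml());
        System.out.println(car.getName()); // benz
    }
}
```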

XPath

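XPath is a language for finding information in an XML document; it can be used to navigate through the elements and attributes of the document.

Expression | Description
--- | ---
/ | Selects starting from the root node (/beans matches the <beans/> under the root; /beans/bean matches the <bean/> tags under <beans/>)
// | Searches the whole current document regardless of position (//property matches every <property/> in the document)
* | Matches any element node (/* matches the root element whatever its name; //* matches all tags)
@ | Matches attributes (//@name matches all name attributes)
[position] | Positional predicate (//property[1] matches every <property/> that is the first <property/> child of its parent; //property[last()] matches every <property/> that is the last one under its parent)
[@attr] | Attribute predicate (//bean[@id] matches all <bean/> tags that have an id attribute; //bean[@id='id1'] matches all <bean/> tags whose id attribute is 'id1')

Predicate: a predicate is used to find a specific node or a node that contains a specified value.

For the full XPath syntax, see the W3School XPath tutorial.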
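
The expressions in the table can be tried out with the XPath engine that ships with the JDK (javax.xml.xpath), before any Dom4j setup. The sketch below counts the nodes each expression selects in an inline copy of the first, namespace-free config.xml; the class name XPathSketch and the count helper are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {

    // Inline copy of the namespace-free sample, without the DOCTYPE line.
    static final String XML =
            "<beans>"
            + "<bean id=\"id1\" class=\"com.fq.domain.Bean\">"
            + "<property name=\"isUsed\" value=\"true\"/></bean>"
            + "<bean id=\"id2\" class=\"com.fq.domain.ComplexBean\">"
            + "<property name=\"refBean\" ref=\"id1\"/></bean>"
            + "</beans>";

    // Returns how many nodes the expression selects in XML.
    static int count(String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("/beans/bean"));       // 2
        System.out.println(count("//property"));        // 2
        System.out.println(count("/*"));                // 1 (the root element)
        System.out.println(count("//*"));               // 5 (beans, 2 bean, 2 property)
        System.out.println(count("//@name"));           // 2
        System.out.println(count("//property[1]"));     // 2 (first property of each bean)
        System.out.println(count("//bean[@id='id1']")); // 1
    }
}
```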

Dom4j's XPath support

By default Dom4j does not support XPath; add the following dependency to the pom:

<dependency>
    <groupId>jaxen</groupId>
    <artifactId>jaxen</artifactId>
    <version>1.1.6</version>
</dependency>
           

Dom4j's Node interface provides the following methods for XPath support:

  • List selectNodes(String xpathExpression)
  • List selectNodes(String xpathExpression, String comparisonXPathExpression)
  • List selectNodes(String xpathExpression, String comparisonXPathExpression, boolean removeDuplicates)
  • Object selectObject(String xpathExpression)
  • Node selectSingleNode(String xpathExpression)

Querying with XPath

  • Read the attribute values of every bean tag
/**
 * @author jifang
 * @since 16/1/20 9:28 AM.
 */
public class XPathRead {

    private Document document;

    @Before
    public void setUp() throws DocumentException {
        document = XmlUtils.getXmlDocument("config.xml");
    }

    @Test
    @SuppressWarnings("unchecked")
    public void client() {
        List<Element> beans = document.selectNodes("//bean");
        for (Element bean : beans) {
            System.out.println("id: " + bean.attributeValue("id") +
                    ", class: " + bean.attributeValue("class"));
        }
    }
}
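
One caveat applies when the config.xml being queried declares a default namespace, as the Dom4j sample does. In XPath 1.0 an unprefixed name such as bean matches only elements that are in no namespace, so //bean can come back empty against such a document. The sketch below demonstrates the rule with the JDK's javax.xml.xpath rather than Dom4j, to stay dependency-free. The prefix c, the class name NamespaceXPathSketch, and the inline document are made up for illustration:

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceXPathSketch {

    static final String NS = "http://www.fq.me/context";
    static final String XML =
            "<beans xmlns=\"" + NS + "\"><bean id=\"id1\"/><bean id=\"id2\"/></beans>";

    // Returns how many nodes the expression selects; the prefix "c" is bound to NS.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "c".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return NS.equals(uri) ? "c" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                        ? Collections.singletonList("c").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//bean"));                   // 0: unprefixed = no namespace
        System.out.println(count("//c:bean"));                 // 2: prefix bound to NS
        System.out.println(count("//*[local-name()='bean']")); // 2: ignores the namespace
    }
}
```

Dom4j evaluates XPath through jaxen under the same XPath 1.0 rule, so against the namespaced sample either the local-name() form or a prefix mapping is likely needed; Dom4j's XPath interface has a setNamespaceURIs method for registering such a mapping.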
           

Updating with XPath

  • Delete the <bean/> whose id is "id2"

@Test
public void client() {
    Node bean = document.selectSingleNode("//bean[@id=\"id2\"]");
    bean.getParent().remove(bean);
}
           