
Python: Parsing an XML File with DOM (Reading XML)

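This article continues from the previous one on working with XML data through the DOM API. The focus here is on parsing an XML file. The sample file is the one generated in the previous article, so let's see whether it can be read back in full. Its contents are: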

XML/HTML code

<?xml version="1.0" encoding="utf-8"?>
<book_store name="new hua" website="http://www.ourunix.org">
    <book>
        <name>Hamlet</name>
        <author>William Shakespeare</author>
        <price>$20</price>
        <grade>good</grade>
    </book>
    <book>
        <name>shuihu</name>
        <author>naian shi</author>
        <price>$200</price>
        <grade>good</grade>
    </book>
</book_store>

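Main methods

        1. Load and parse the XML file

Python code

minidom.parse(filename)

        2. Get the root element of the XML document

doc.documentElement

        3. Get the value of a node attribute

node.getAttribute(AttributeName)

        4. Get the collection of descendant elements with a given tag name

node.getElementsByTagName(TagName)

        5. Get the text value of a node

node.childNodes[index].nodeValue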
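The five calls above can be tried together on a small in-memory document. This is a minimal sketch: the SAMPLE string is a shortened, illustrative version of the book store file, and minidom.parseString is used in place of minidom.parse(filename) only so that no file is needed.

```python
import xml.dom.minidom as Dom

# Illustrative sample, shortened from the book store file.
SAMPLE = (
    '<book_store name="new hua" website="http://www.ourunix.org">'
    '<book><name>Hamlet</name><price>$20</price></book>'
    '</book_store>'
)

doc = Dom.parseString(SAMPLE)                # 1. load (parse(filename) does the same for a file)
root = doc.documentElement                   # 2. root element, here <book_store>
store_name = root.getAttribute("name")       # 3. attribute value
books = root.getElementsByTagName("book")    # 4. all descendant <book> elements
first_name = books[0].getElementsByTagName("name")[0]
name_text = first_name.childNodes[0].nodeValue   # 5. text of the first child node

# getAttribute returns an empty string when the attribute does not exist.
missing = root.getAttribute("no_such_attr")

print(root.tagName, store_name, len(books), name_text)
```

Note that getElementsByTagName searches all descendants, not only direct children, which is why the examples in this article call it on each book node rather than on the root when reading a book's name.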

Code demonstration

        As before, we start with a simple version that shows how to parse an XML file with DOM. The code is as follows:

'''
Created on 2012-8-28
@author:  walfred
@module: domxml.parseXMLSimple
@description:
'''
import sys
import xml.dom.minidom as Dom

if __name__ == "__main__":
    try:
        xml_file = Dom.parse("./book_store.xml")
    except Exception as e:
        # Covers both a missing file and malformed XML.
        print(e)
        sys.exit(1)

    # Attributes of the root <book_store> element.
    node_root = xml_file.documentElement
    name = node_root.getAttribute("name")
    website = node_root.getAttribute("website")
    print("name of book store: %s\nwebsite of book store: %s" % (name, website))

    # Each <book> is expected to contain name, author, price and grade.
    node_book_list = node_root.getElementsByTagName("book")
    for book_node in node_book_list:
        book_name_node = book_node.getElementsByTagName("name")[0]
        book_name_value = book_name_node.childNodes[0].data
        book_author_node = book_node.getElementsByTagName("author")[0]
        book_author_value = book_author_node.childNodes[0].data
        book_price_node = book_node.getElementsByTagName("price")[0]
        book_price_value = book_price_node.childNodes[0].data
        book_grade_node = book_node.getElementsByTagName("grade")[0]
        book_grade_value = book_grade_node.childNodes[0].data
        print("book: %s\t author: %s\t price: %s\t grade: %s\t" % (
            book_name_value, book_author_value, book_price_value, book_grade_value))

        The output is as follows:

name of book store: new hua
website of book store: http://www.ourunix.org
book: Hamlet	 author: William Shakespeare	 price: $20	 grade: good
book: shuihu	 author: naian shi	 price: $200	 grade: good
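The simple version indexes getElementsByTagName(...)[0] and childNodes[0] directly, so it raises IndexError as soon as a book lacks one of the four elements or an element is empty. One possible way to guard against that is a small helper like the one below. This is a sketch only: get_child_text, its default parameter, and the sample string are hypothetical names introduced for illustration, not part of the original script.

```python
import xml.dom.minidom as Dom


def get_child_text(node, tag_name, default=""):
    """Return the text of the first <tag_name> descendant, or default.

    Hypothetical helper: falls back to default when the element is
    missing, empty, or does not start with a text node.
    """
    matches = node.getElementsByTagName(tag_name)
    if not matches:
        return default
    first = matches[0].firstChild
    if first is None or first.nodeType != first.TEXT_NODE:
        return default
    return first.data


# Illustrative input: <author> is missing and <grade> is empty.
SAMPLE = (
    "<book_store><book>"
    "<name>shuihu</name><price>$200</price><grade></grade>"
    "</book></book_store>"
)

book = Dom.parseString(SAMPLE).documentElement.getElementsByTagName("book")[0]
name = get_child_text(book, "name")
author = get_child_text(book, "author", default="n/a")
grade = get_child_text(book, "grade", default="n/a")
print(name, author, grade)
```

Whether a silent default or an explicit error is preferable depends on how strictly the input file is expected to follow the structure shown earlier.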

        Next comes a so-called advanced version:

'''
Created on 2012-8-28
@author:  walfred
@module: domxml.XMLParser
@description:
'''
import sys
import xml.dom.minidom as Dom


class XMLParser:
    def __init__(self, xml_file_path):
        try:
            self.xml = Dom.parse(xml_file_path)
        except Exception as e:
            # Missing file or malformed XML: report it and exit.
            print(e)
            sys.exit(1)
        self.book_list = list()

    def getNodeName(self, prev_node, node_name):
        # Every descendant element of prev_node whose tag is node_name.
        return prev_node.getElementsByTagName(node_name)

    def getNodeAttr(self, node, att_name):
        return node.getAttribute(att_name)

    def getNodeValue(self, node):
        # Text of the node's first child; assumes the element is not empty.
        return node.childNodes[0].data

    def parse(self):
        node_root = self.xml.documentElement
        print("store: %s, website: %s" % (self.getNodeAttr(node_root, "name"),
                                          self.getNodeAttr(node_root, "website")))
        node_book_list = self.getNodeName(node_root, "book")
        for node_book in node_book_list:
            # One dict per <book>, keyed by the child element name.
            book_info = dict()
            for field in ("name", "author", "price", "grade"):
                node_field = self.getNodeName(node_book, field)[0]
                book_info[field] = self.getNodeValue(node_field)
            self.book_list.append(book_info)

    def getBookList(self):
        return self.book_list


if __name__ == "__main__":
    myXMLParser = XMLParser("book_store.xml")
    myXMLParser.parse()
    print(myXMLParser.getBookList())
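To try the same list-of-dictionaries logic without creating book_store.xml on disk, minidom.parseString accepts the XML text directly. The following is a minimal sketch under that assumption; the function name parse_books, the FIELDS tuple, and the inline SAMPLE are illustrative and not part of the original module.

```python
import xml.dom.minidom as Dom

# Child elements expected inside every <book> (assumed from the sample file).
FIELDS = ("name", "author", "price", "grade")


def parse_books(xml_text):
    """Return one dict per <book> element found in xml_text."""
    root = Dom.parseString(xml_text).documentElement
    books = []
    for node_book in root.getElementsByTagName("book"):
        info = {}
        for field in FIELDS:
            # Assumes each field exists and holds non-empty text.
            info[field] = node_book.getElementsByTagName(field)[0].childNodes[0].data
        books.append(info)
    return books


SAMPLE = """<book_store name="new hua" website="http://www.ourunix.org">
  <book>
    <name>Hamlet</name>
    <author>William Shakespeare</author>
    <price>$20</price>
    <grade>good</grade>
  </book>
</book_store>"""

book_list = parse_books(SAMPLE)
print(book_list)
```

The whitespace between elements becomes extra text nodes under each book, but getElementsByTagName skips them because it matches elements only.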

        The end.