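This article continues from the previous one on working with XML data through the DOM approach. It focuses on how to parse an XML file. The sample XML file is the one generated in the previous article, so let's see whether we can read it back completely. The file's content is as follows:
XML/HTML code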
<?xml version="1.0" encoding="utf-8"?>
<book_store name="new hua" website="http://www.ourunix.org">
<book>
<name>Hamlet</name>
<author>William Shakespeare</author>
<price>$20</price>
<grade>good</grade>
</book>
<book>
<name>shuihu</name>
<author>naian shi</author>
<price>$200</price>
<grade>good</grade>
</book>
</book_store>
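Main methods
1. Load and read the XML file
Python code
minidom.parse(filename)
2. Get the root element of the XML document
doc.documentElement
3. Get the value of an attribute on an XML node
node.getAttribute(AttributeName)
4. Get the collection of XML node objects with a given tag name
node.getElementsByTagName(TagName)
5. Get the value of an XML node (for a text node, nodeValue and data return the same string)
node.childNodes[index].nodeValue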
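The five calls above can be sketched together in one short script. This is only an illustration: it assumes an inline XML string parsed with minidom.parseString (instead of a file passed to minidom.parse) so that it runs on its own, and the SAMPLE constant is a hypothetical cut-down copy of the book store file.

```python
import xml.dom.minidom as minidom

# Hypothetical inline sample; the article reads the same structure from a file.
SAMPLE = (
    '<book_store name="new hua" website="http://www.ourunix.org">'
    "<book><name>Hamlet</name><price>$20</price></book>"
    "</book_store>"
)

doc = minidom.parseString(SAMPLE)            # 1. load (parse() takes a filename instead)
root = doc.documentElement                   # 2. root element <book_store>
store_name = root.getAttribute("name")       # 3. attribute value
books = root.getElementsByTagName("book")    # 4. list of matching elements
name_node = books[0].getElementsByTagName("name")[0]
first_title = name_node.childNodes[0].nodeValue  # 5. text of the first child node

print(root.tagName, store_name, len(books), first_title)
```

Note that getElementsByTagName searches all descendants of the node it is called on, not only its direct children, which is why the example calls it on a single book element to find that book's name.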
Code demonstration
As before, we start with a simple version that shows how to parse an XML file with DOM. The code is as follows:
'''
Created on 2012-8-28
@author: walfred
@module: domxml.parseXMLSimple
@description:
'''
import sys
import xml.dom.minidom as Dom

if __name__ == "__main__":
    try:
        xml_file = Dom.parse("./book_store.xml")
    except Exception as e:
        # Missing file or malformed XML: report the problem and stop.
        print(e)
        sys.exit(1)

    # The root element is <book_store>; its attributes describe the store.
    node_root = xml_file.documentElement
    name = node_root.getAttribute("name")
    website = node_root.getAttribute("website")
    print("name of book store: %s\nwebsite of book store: %s" % (name, website))

    # Each <book> holds one child element per field; its text is the first child node.
    node_book_list = node_root.getElementsByTagName("book")
    for book_node in node_book_list:
        book_name_node = book_node.getElementsByTagName("name")[0]
        book_name_value = book_name_node.childNodes[0].data
        book_author_node = book_node.getElementsByTagName("author")[0]
        book_author_value = book_author_node.childNodes[0].data
        book_price_node = book_node.getElementsByTagName("price")[0]
        book_price_value = book_price_node.childNodes[0].data
        book_grade_node = book_node.getElementsByTagName("grade")[0]
        book_grade_value = book_grade_node.childNodes[0].data
        print("book: %s\t author: %s\t price: %s\t grade: %s\t" % (
            book_name_value, book_author_value, book_price_value, book_grade_value))
The output is as follows:
name of book store: new hua
website of book store: http://www.ourunix.org
book: Hamlet author: William Shakespeare price: $20 grade: good
book: shuihu author: naian shi price: $200 grade: good
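The simple version assumes that every book has all four child elements and that each one contains text. If an element is missing, indexing the result of getElementsByTagName with [0] raises IndexError, and the same happens with childNodes[0] when an element is empty. A minimal sketch of a more tolerant lookup follows; the helper name get_child_text, its default parameter, and the inline sample data are all hypothetical and not part of the original code.

```python
import xml.dom.minidom as minidom


def get_child_text(parent, tag_name, default=""):
    """Return the text of the first <tag_name> descendant, or default."""
    matches = parent.getElementsByTagName(tag_name)
    if not matches:
        return default
    # Join every text child so that split text nodes are returned whole.
    parts = [child.data for child in matches[0].childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts) or default


# Hypothetical sample: the first book has an empty <price>, the second no <grade>.
doc = minidom.parseString(
    "<book_store>"
    "<book><name>Hamlet</name><price></price><grade>good</grade></book>"
    "<book><name>shuihu</name><price>$200</price></book>"
    "</book_store>"
)
rows = [
    (get_child_text(b, "name"),
     get_child_text(b, "price", "n/a"),
     get_child_text(b, "grade", "n/a"))
    for b in doc.getElementsByTagName("book")
]
print(rows)
```

Whether a missing field should fall back to a default or be treated as an error depends on how strictly the file format is defined, so this is one option rather than a recommendation.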
Next, as before, here is a so-called advanced version:
'''
Created on 2012-8-28
@author: walfred
@module: domxml.XMLParser
@description:
'''
import sys
import xml.dom.minidom as Dom


class XMLParser:
    def __init__(self, xml_file_path):
        try:
            self.xml = Dom.parse(xml_file_path)
        except Exception as e:
            # Report why parsing failed instead of exiting silently.
            print(e)
            sys.exit(1)
        self.book_list = list()

    def getNodeName(self, prev_node, node_name):
        # Return every descendant element of prev_node named node_name.
        return prev_node.getElementsByTagName(node_name)

    def getNodeAttr(self, node, att_name):
        return node.getAttribute(att_name)

    def getNodeValue(self, node):
        # Text of the element's first child, assumed to be a text node.
        return node.childNodes[0].data

    def parse(self):
        node_root = self.xml.documentElement
        print("store: %s, website: %s" % (self.getNodeAttr(node_root, "name"),
                                          self.getNodeAttr(node_root, "website")))
        node_book_list = self.getNodeName(node_root, "book")
        for node_book in node_book_list:
            # Collect the four fields of one <book> into a dict.
            book_info = dict()
            node_book_name = self.getNodeName(node_book, "name")[0]
            book_name_value = self.getNodeValue(node_book_name)
            book_info["name"] = book_name_value
            node_book_author = self.getNodeName(node_book, "author")[0]
            book_author_value = self.getNodeValue(node_book_author)
            book_info["author"] = book_author_value
            node_book_price = self.getNodeName(node_book, "price")[0]
            book_price_value = self.getNodeValue(node_book_price)
            book_info["price"] = book_price_value
            node_book_grade = self.getNodeName(node_book, "grade")[0]
            book_grade_value = self.getNodeValue(node_book_grade)
            book_info["grade"] = book_grade_value
            self.book_list.append(book_info)

    def getBookList(self):
        return self.book_list


if __name__ == "__main__":
    myXMLParser = XMLParser("book_store.xml")
    myXMLParser.parse()
    print(myXMLParser.getBookList())
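The parse method above repeats the same three lines for each of the four fields. One possible way to shorten it is to loop over the field names. The sketch below is only an illustration under stated assumptions: BOOK_FIELDS and parse_books are hypothetical names, and the function takes an XML string rather than a file path so that the example is self-contained.

```python
import xml.dom.minidom as minidom

# Hypothetical constant: the child elements expected inside every <book>.
BOOK_FIELDS = ("name", "author", "price", "grade")


def parse_books(xml_text):
    """Return one dict per <book>, keyed by the names in BOOK_FIELDS."""
    root = minidom.parseString(xml_text).documentElement
    book_list = []
    for node_book in root.getElementsByTagName("book"):
        book_info = {}
        for field in BOOK_FIELDS:
            # Same lookup as in the class above: first matching element, first child.
            field_node = node_book.getElementsByTagName(field)[0]
            book_info[field] = field_node.childNodes[0].data
        book_list.append(book_info)
    return book_list


sample = (
    '<book_store name="new hua">'
    "<book><name>Hamlet</name><author>William Shakespeare</author>"
    "<price>$20</price><grade>good</grade></book>"
    "</book_store>"
)
books = parse_books(sample)
print(books)
```

Adding a new field then only requires extending BOOK_FIELDS, at the cost of hiding the individual lookups that the longer version spells out.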
The end.