
Python Programming/XML Tools

From Wikibooks, open books for an open world


Introduction


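Python's standard library includes several modules for processing XML. This page shows two of them: xml.sax, an event-driven parser that calls handler methods as it reads the document, and xml.dom.minidom, which loads the whole document into a tree of nodes.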

xml.sax.handler


Python Doc

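import xml.sax.handler as saxhandler
import xml.sax as saxparser


class MyReport:
    def __init__(self):
        self.Y = 1


class MyCH(saxhandler.ContentHandler):
    def __init__(self, report):
        super().__init__()  # let ContentHandler set up its own state
        self.X = 1
        self.report = report

    def startDocument(self):
        # Called once, before any element is reported
        print('startDocument')

    def startElement(self, name, attrs):
        # Called for every opening tag; attrs holds the tag's attributes
        print('Element:', name)


report = MyReport()  # for future use
ch = MyCH(report)

# Named xml_text so that it does not shadow the xml package
xml_text = """\
<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>
"""
print(xml_text)
saxparser.parseString(xml_text, ch)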
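The handler above only prints element names, and the report object is left "for future use". As a rough sketch of one way a handler could fill such a report, the following counts elements and collects each penciller's name and page range. The names Report, CollectingHandler, and DOC are hypothetical and not part of the original example; the document is passed as bytes, which xml.sax.parseString accepts.

```python
import xml.sax
import xml.sax.handler


class Report:
    """Hypothetical container for whatever the handler collects."""

    def __init__(self):
        self.element_counts = {}  # element name -> number of occurrences
        self.pencillers = []      # (name, pages) tuples


class CollectingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records data instead of printing it."""

    def __init__(self, report):
        super().__init__()
        self.report = report
        self._in_penciller = False  # True while inside a <penciller> element
        self._pages = None          # value of its pages attribute, if any
        self._text = []             # text chunks seen inside the element

    def startElement(self, name, attrs):
        counts = self.report.element_counts
        counts[name] = counts.get(name, 0) + 1
        if name == 'penciller':
            self._in_penciller = True
            self._pages = attrs.get('pages')
            self._text = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks
        if self._in_penciller:
            self._text.append(content)

    def endElement(self, name):
        if name == 'penciller':
            full_name = ''.join(self._text).strip()
            self.report.pencillers.append((full_name, self._pages))
            self._in_penciller = False


DOC = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller pages="1-9,18-24">Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""

report = Report()
xml.sax.parseString(DOC.encode('utf-8'), CollectingHandler(report))
print(report.element_counts)
print(report.pencillers)
```

Text is accumulated in a list and joined in endElement because a SAX parser is allowed to split character data across several characters() calls.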

xml.dom.minidom


An example of parsing an RSS feed with the DOM:

from xml.dom import minidom as dom
import urllib.request  # Python 3 replacement for urllib2


def fetchPage(url):
    # Download the feed and return its raw bytes
    with urllib.request.urlopen(url) as a:
        return a.read()


def extract(page):
    a = dom.parseString(page)
    item = a.getElementsByTagName('item')
    for i in item:
        if i.hasChildNodes():
            # Each lookup assumes the child element exists and contains text
            t = i.getElementsByTagName('title')[0].firstChild.wholeText
            l = i.getElementsByTagName('link')[0].firstChild.wholeText
            d = i.getElementsByTagName('description')[0].firstChild.wholeText
            print(t, l, d)


if __name__ == '__main__':
    page = fetchPage("http://rss.slashdot.org/Slashdot/slashdot")
    extract(page)
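The feed example depends on a live URL and assumes every item has a title, link, and description. As a sketch of the same DOM calls that runs offline, the following parses a small in-memory feed and tolerates missing or empty child elements. SAMPLE_RSS, child_text, and extract_items are hypothetical names introduced for illustration, and the example.com links are placeholders.

```python
from xml.dom import minidom

# Made-up feed for illustration; the second item has no description
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <description>Summary of the first story</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def child_text(node, tag, default=''):
    """Return the text of the first <tag> descendant of node, or default."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return default
    return matches[0].firstChild.wholeText


def extract_items(page):
    """Return a (title, link, description) tuple for every <item>."""
    doc = minidom.parseString(page)
    return [
        (child_text(item, 'title'),
         child_text(item, 'link'),
         child_text(item, 'description'))
        for item in doc.getElementsByTagName('item')
    ]


items = extract_items(SAMPLE_RSS)
for title, link, description in items:
    print(title, link, description)
```

Returning tuples instead of printing inside the loop keeps the parsing step separate from the output, which makes the function easier to reuse.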

XML document provided by the PyXML documentation.
