Python Programming/XML Tools
Appearance
Introduction
[edit | edit source]Python includes several modules for manipulating xml.
xml.sax.handler
[edit | edit source]importxml.sax.handlerassaxhandlerimportxml.saxassaxparserclassMyReport:def__init__(self):self.Y=1classMyCH(saxhandler.ContentHandler):def__init__(self,report):self.X=1self.report=reportdefstartDocument(self):print('startDocument')defstartElement(self,name,attrs):print('Element:',name)report=MyReport()#for future usech=MyCH(report)xml="""\<collection> <comic title=\"Sandman\" number='62'> <writer>Neil Gaiman</writer> <penciller pages='1-9,18-24'>Glyn Dillon</penciller> <penciller pages="10-17">Charles Vess</penciller> </comic></collection>"""print(xml)saxparser.parseString(xml,ch)
xml.dom.minidom
[edit | edit source]An example of doing RSS feed parsing with DOM
fromxml.domimportminidomasdomimporturllib2deffetchPage(url):a=urllib2.urlopen(url)return''.join(a.readlines())defextract(page):a=dom.parseString(page)item=a.getElementsByTagName('item')foriinitem:ifi.hasChildNodes():t=i.getElementsByTagName('title')[0].firstChild.wholeTextl=i.getElementsByTagName('link')[0].firstChild.wholeTextd=i.getElementsByTagName('description')[0].firstChild.wholeTextprint(t,l,d)if__name__=='__main__':page=fetchPage("http://rss.slashdot.org/Slashdot/slashdot")extract(page)
XML document provided by pyxml documentation.