I am very new to Python, and this is also my first time parsing XML.
I am interested in the information inside the str elements. I can identify each piece of information by the value of the element's name attribute (str@name).
    import xml.etree.ElementTree as ET

    def get_cg_resources(pref_label, count=10):
        r = request_that_has_the_xml  # placeholder for the HTTP response holding the XML
        ns = {'ns': "http://www.loc.gov/zing/srw/"}
        tree = ET.ElementTree(ET.fromstring(r.text))
        records = []
        # One result dict per SRW <record> element
        for elem in tree.iter(tag='{http://www.loc.gov/zing/srw/}record'):
            record = {
                'title': '',
                'source': '',
                'snippet': '',
                'link': '',
                'image': '',
                'adapter': 'CG'
            }
            # Copy the text of each <str> into the field selected by its name attribute
            for value in elem.iter(tag='str'):
                attr = value.attrib['name']
                if attr == 'dc.title':
                    record['title'] = value.text
                elif attr == 'authority_name':
                    record['source'] = value.text
                elif attr == 'dc.description':
                    record['snippet'] = value.text
                elif attr == 'dc.related.link':
                    record['link'] = value.text
                elif attr == 'cached_thumbnail':
                    img_part = value.text
                    record['image'] = "http://urlbase%s" % img_part
            records.append(record)
        return records
Is this approach correct and efficient for extracting the information I need? Should I be searching for the str elements differently?
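For comparison, here is a rough sketch of one alternative I have been wondering about: replacing the if/elif chain with a lookup table keyed on str@name. The names FIELD_MAP, parse_records, and SAMPLE are made up for this illustration, and the sample XML is a guess at the general shape of the response, not real data. Unlike my code above, this sketch skips str elements without a name attribute and turns missing text into an empty string, so it is not an exact equivalent.

```python
import xml.etree.ElementTree as ET

SRW_RECORD = "{http://www.loc.gov/zing/srw/}record"

# Hypothetical lookup table: str@name value -> key in the result record
FIELD_MAP = {
    "dc.title": "title",
    "authority_name": "source",
    "dc.description": "snippet",
    "dc.related.link": "link",
    "cached_thumbnail": "image",
}


def parse_records(xml_text):
    """Return one dict per SRW record found in xml_text."""
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter(SRW_RECORD):
        record = {"title": "", "source": "", "snippet": "",
                  "link": "", "image": "", "adapter": "CG"}
        for value in elem.iter("str"):
            # .get() returns None when the name attribute is absent
            key = FIELD_MAP.get(value.get("name"))
            if key is None:
                continue  # not a field of interest
            text = value.text or ""
            if key == "image":
                text = "http://urlbase%s" % text
            record[key] = text
        records.append(record)
    return records


# Made-up sample response, assumed shape only
SAMPLE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:records>
    <srw:record>
      <srw:recordData>
        <doc>
          <str name="dc.title">Example title</str>
          <str name="authority_name">Example source</str>
          <str name="cached_thumbnail">/thumb/1.png</str>
          <str name="unrelated">ignored</str>
        </doc>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""

if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        print(rec)
```

The idea is that adding a new field would only mean adding one entry to the table, but I am not sure whether this is considered better style.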
Any suggestions for improvements are welcome.