\$\begingroup\$

I have this XML from a SOAP call:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header/>
  <soapenv:Body>
    <SessionID xmlns="http://www.gggg.com/oog">5555555</SessionID>
    <QueryResult xmlns="http://www.gggg.com/oog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Code>testsk</Code>
      <Records>
        <Record>
          <dim_id>1</dim_id>
          <resource_full_name>Administrator, Sir</resource_full_name>
          <resource_first_name>Sir</resource_first_name>
          <resource_last_name>Administrator</resource_last_name>
          <resource_email>[email protected]</resource_email>
          <resource_user_name>admin</resource_user_name>
        </Record>
        <Record>
          <dim_id>2</dim_id>
          <resource_full_name>scheduler, scheduler</resource_full_name>
          <resource_first_name>scheduler</resource_first_name>
          <resource_last_name>scheduler</resource_last_name>
          <resource_email>[email protected]</resource_email>
          <resource_user_name>scheduler</resource_user_name>
        </Record>

My goal: parse each Record's sub-elements (<dim_id> through <resource_user_name>) and save each record as a row in a CSV file.

My Code:

dim_id_list = []
full_name_list = []
first_name_list = []
last_name_list = []
resource_email_list = []
resource_user_name_list = []

root = et.parse('xml_stuff.xml').getroot()

for dim_id in root.iter('{http://www.gggg.com/oog/Query}dim_id'):
    dim_id_list.append(dim_id.text)

for resource_full_name in root.iter('{http://www.gggg.com/oog/Query}resource_full_name'):
    full_name_list.append(resource_full_name.text)

for resource_first_name in root.iter('{http://www.gggg.com/oog/Query}resource_first_name'):
    first_name_list.append(resource_first_name.text)

for resource_last_name in root.iter('{http://www.gggg.com/oog/Query}resource_last_name'):
    last_name_list.append(resource_last_name.text)

for resource_email in root.iter('{http://www.gggg.com/oog/Query}resource_email'):
    resource_email_list.append(resource_email.text)

for resource_user_name in root.iter('{http://www.gggg.com/oog/Query}resource_user_name'):
    resource_user_name_list.append(resource_user_name.text)

rows = zip(dim_id_list, full_name_list, first_name_list, last_name_list,
           resource_email_list, resource_user_name_list)

with open('test.csv', "w", encoding='utf16', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

Is there a better way to loop through the Records? This code is terribly verbose. I tried this:

for record in root.findall('.//{http://www.gggg.com/oog/Query}Record'):
    dim_id = record.find('dim_id').text
    # Extract each attribute, save to list. etc.

But I am getting attribute errors trying to access each record's text property.

\$\endgroup\$

    1 Answer

    \$\begingroup\$

    It makes little sense to slice the data into "vertical" lists, then transpose them back into rows using zip(). Not only is it cumbersome to do it that way, it's also fragile. If, for example, one record is missing its resource_email child element, then every subsequent row will be misaligned, and zip() will silently drop the last row because it stops at the shortest list.
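To make the fragility concrete, here is a minimal sketch using a made-up three-record document in which the second record has no resource_email. The sample XML and the names `column_wise` and `row_wise` are illustrative assumptions, not part of the original data.

```python
from xml.etree import ElementTree as ET

NS = "{http://www.gggg.com/oog/Query}"

# Hypothetical sample: the second record has no resource_email element.
xml = """
<Records xmlns="http://www.gggg.com/oog/Query">
  <Record><dim_id>1</dim_id><resource_email>a@example.com</resource_email></Record>
  <Record><dim_id>2</dim_id></Record>
  <Record><dim_id>3</dim_id><resource_email>c@example.com</resource_email></Record>
</Records>
"""
root = ET.fromstring(xml)

# Column-wise collection, as in the question: each list is built independently,
# so a missing element shortens one list without leaving a gap.
ids = [e.text for e in root.iter(NS + "dim_id")]
emails = [e.text for e in root.iter(NS + "resource_email")]
column_wise = list(zip(ids, emails))

# Row-wise collection: each record is read as a unit, so a missing element
# only affects its own row.
row_wise = [
    (rec.findtext(NS + "dim_id"), rec.findtext(NS + "resource_email", default=""))
    for rec in root
]
```

In the column-wise result, record 2 is paired with record 3's email and record 3 disappears; the row-wise result keeps all three records, with an empty email for record 2.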

    You can use writer.writerows(rows) instead of the for row in rows: writer.writerow(row) loop. Furthermore, you can pass a generator expression so that the CSV writer extracts records on the fly as needed.
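As a small illustration of writerows() consuming a generator expression, the sketch below writes to an in-memory buffer instead of a file. The row contents are placeholder values standing in for records pulled from the XML.

```python
import csv
import io

# In-memory stand-in for the output file, so the result can be inspected.
buf = io.StringIO(newline="")

# The generator expression is consumed lazily by writerows(); no intermediate
# list of rows is built. The (i, f"user{i}") tuples are placeholder data.
csv.writer(buf).writerows((i, f"user{i}") for i in range(3))

output = buf.getvalue()
```

The csv module terminates each row with "\r\n" by default, which is why the file in the examples here is opened with newline=''.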

    It's customary to import xml.etree.ElementTree as ET rather than as et.

    Suggested solution

    import csv
    from xml.etree import ElementTree as ET

    fieldnames = [
        'dim_id',
        'resource_full_name',
        'resource_first_name',
        'resource_last_name',
        'resource_email',
        'resource_user_name',
    ]
    ns = {'': 'http://www.gggg.com/oog/Query'}

    xml_records = ET.parse('xml_stuff.xml').find('.//Records', ns)
    with open('test2.csv', 'w', encoding='utf16', newline='') as f:
        csv.DictWriter(f, fieldnames).writerows(
            {
                prop.tag.split('}', 1)[1]: prop.text
                for prop in xr
            }
            for xr in xml_records
        )

    If you are certain that each <Record> always has its child elements in the right order, you can simplify it further by not explicitly stating the element/field names:

    import csv
    from xml.etree import ElementTree as ET

    ns = {
        '': 'http://www.gggg.com/oog/Query',
        'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
    }

    records = ET.parse('xml_stuff.xml').find('soapenv:Body/QueryResult/Records', ns)
    with open('test2.csv', 'w', encoding='utf16', newline='') as f:
        csv.writer(f).writerows(
            [prop.text for prop in r]
            for r in records
        )
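As for the AttributeError in the question's second attempt, the likely cause is that record.find('dim_id') looks for an element with no namespace, while the children of <Record> inherit the default namespace declared on <QueryResult>. find() returns None when nothing matches, and None has no .text attribute. A minimal sketch on a made-up one-record document (the prefix "q" is an arbitrary choice for this example):

```python
from xml.etree import ElementTree as ET

# Hypothetical cut-down document using the same default namespace.
xml = """<Records xmlns="http://www.gggg.com/oog/Query">
  <Record><dim_id>1</dim_id></Record>
</Records>"""

# "q" is an arbitrary prefix chosen for this sketch; only the URI matters.
ns = {"q": "http://www.gggg.com/oog/Query"}
record = ET.fromstring(xml).find("q:Record", ns)

unqualified = record.find("dim_id")      # no match: the child is namespaced
qualified = record.find("q:dim_id", ns)  # matches the namespaced child
```

Here unqualified is None, so calling .text on it would raise the AttributeError, whereas qualified.text is "1".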
    \$\endgroup\$
