I'm creating an application that reads data from a CSV file and creates an XML file using lxml. The code below works as expected. However, before I develop it further, I would like to refactor it and make it more DRY. I will need to create additional XML snippets for the other code types in the CSV file, and I believe that, in its current state, there is too much duplication.
I am trying to learn Python through a real use case. Can someone please provide guidance on how I could make the code more elegant and more DRY? Do I need to use classes and functions, or is my approach OK for the type of file (XML) I am creating?
import pandas as pd
from lxml import etree as et
import uuid

df = pd.read_csv('assets.csv', sep=',')

root = et.Element('SignallingSchemeData', xmlns='boo')

timeframe = et.SubElement(root,'TimeFrame')
timeframe.attrib["fileUID"] = str(uuid.uuid4())
timeframe.attrib["name"] = str("timeframe 1")

for index, row in df.iterrows():
    if row['CODE'] == 'S1':
        equipment = et.SubElement(root, 'Equipment')
        signalEquipment = et.SubElement(equipment, 'SignalEquipment')
        signalEquipment.attrib["fileUID"] = str(row["ID"])
        signalEquipment.attrib["name"] = str(row["DESC"])
        equipment = et.SubElement(root, 'Equipment')
        equipmentSupportEquipment = et.SubElement(equipment, 'EquipmentSupportEquipment')
        equipmentSupportEquipment.attrib["fileUID"] = str(uuid.uuid4())
        equipmentSupportReference = et.SubElement(signalEquipment, 'EquipmentSupportReference').text = equipmentSupportEquipment.attrib["fileUID"]
    else:
        equipment = et.SubElement(root, 'Equipment')

source = et.SubElement(root, 'Source')
view = et.SubElement(root, 'View')
view.attrib["fileUID"] = str(uuid.uuid4())
view.attrib["name"] = str('Coordinates')

for index, row in df.iterrows():
    viewCoordinatesList = et.SubElement(view, 'ViewCoordinatesList')
    viewCoordinates = et.SubElement(viewCoordinatesList, 'ViewCoordinates')
    itemFileUID = et.SubElement(viewCoordinates, 'ItemFileUID')
    itemFileUID.attrib['fileUID'] = str(row['ID'])
    viewCoordinatePairLon = et.SubElement(viewCoordinates, 'ViewCoordinatePair', name = 'longittude')
    viewCoordinatePairLon.attrib['Value'] = str(row['Y'])
    viewCoordinatePairLat = et.SubElement(viewCoordinates, 'ViewCoordinatePair', name = 'latitude')
    viewCoordinatePairLat.attrib['Value'] = str(row['X'])
    viewCoordinatePairH = et.SubElement(viewCoordinates, 'ViewCoordinatePair', name = 'height')
    viewCoordinatePairH.attrib['Value'] = str(row['Z'])

et.ElementTree(root).write('test.xml', pretty_print=True, xml_declaration = True, encoding='UTF-8', standalone = None)
assets.csv is as follows:
ID,CODE,ELR,TRID,DIR,MILEAGE,X,Y,Z,DESC,IMAGE
30734,S1,LEC1,1100,,008+0249 (9-1497),518169.12,185128.27,37.52,,Asset30734.jpg
31597,S10,LEC1,1100,,008+0286 (9-1460),518151.38,185157.1,36.7,IRJ at 8 miles and 0289 yards,Asset31597.jpg
31598,S10,LEC1,1100,,008+0286 (9-1460),518150.4,185156.11,36.7,IRJ at 8 miles and 0289 yards,Asset31598.jpg
31596,S10,LEC1,1100,,008+0287 (9-1458),518149.76,185157.14,36.7,IRJ at 8 miles and 0289 yards,Asset31596.jpg
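To make the question concrete, here is a minimal sketch of the kind of refactoring being asked about, not a definitive design. It assumes the repeated "create a SubElement, then set string attributes" pattern can be folded into one helper, and that each CODE value can map to its own builder function. The names `add_element`, `add_signal` and `BUILDERS` are hypothetical, the row is a plain dict standing in for a pandas row, and the import falls back to the standard library's `xml.etree.ElementTree` (whose `SubElement` call matches for this usage) when lxml is not installed.

```python
import uuid

try:
    from lxml import etree as et  # same library as the code in the question
except ImportError:
    # Assumption: the stdlib API is close enough for the calls used here.
    import xml.etree.ElementTree as et


def add_element(parent, tag, **attributes):
    """Hypothetical helper: create a child element and set attributes as strings."""
    child = et.SubElement(parent, tag)
    for key, value in attributes.items():
        child.set(key, str(value))
    return child


def add_signal(root, row):
    """Build the two Equipment blocks that the question creates for an 'S1' row."""
    signal = add_element(add_element(root, 'Equipment'), 'SignalEquipment',
                         fileUID=row['ID'], name=row['DESC'])
    support = add_element(add_element(root, 'Equipment'),
                          'EquipmentSupportEquipment', fileUID=uuid.uuid4())
    # The reference element carries the support equipment's generated UID as text.
    add_element(signal, 'EquipmentSupportReference').text = support.get('fileUID')
    return signal


# Hypothetical dispatch table: one builder per CODE value, extended as new
# code types are needed.
BUILDERS = {'S1': add_signal}

root = et.Element('SignallingSchemeData')
row = {'ID': 30734, 'CODE': 'S1', 'DESC': 'example signal'}  # stand-in for a CSV row
builder = BUILDERS.get(row['CODE'])
if builder is not None:
    builder(root, row)
```

With this shape, supporting another code type would mean writing one more builder function and adding it to `BUILDERS`, rather than growing the `if`/`else` chain inside the loop. Whether plain functions like these are enough, or classes would be a better fit, is part of what I am asking.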