
How are you doing? I have to write a script that parses an XML input file into a JSON file. I did my best, but it would be nice if you could review it and help me improve it. The catch is that I can't use objectify or other libraries that convert files directly. The script has to output at least these properties:

  1. Seat/element type (seat, kitchen, bathroom, etc.)
  2. Seat ID (17A, 18A)
  3. Seat price
  4. Cabin class
  5. Availability

By the way, I couldn't find where the seat/element type is given for each seat.

    import json
    import xml.dom.minidom
    from collections import OrderedDict

    xmlFile = xml.dom.minidom.parse("seatmap1.xml")


    def set_amount(element_to_analyze, element_to_change):
        if element_to_analyze.getAttribute('AvailableInd') == 'true':
            element_to_change['seat_price'] = seat.getElementsByTagName('ns:Service')[0].getElementsByTagName(
                'ns:Fee')[0].getAttribute('Amount')


    def str_to_bool(s):
        if s == 'true':
            return True
        else:
            return False


    flight_data = OrderedDict()
    if xmlFile.getElementsByTagName('Document').length == 0:
        plane_data = xmlFile.getElementsByTagName('ns:FlightSegmentInfo')[0]
        flight_data['FlightNumber'] = plane_data.getAttribute('FlightNumber')
        flight_data['DepartureDateTime'] = plane_data.getAttribute('DepartureDateTime')
        flight_data['DepartureAirport'] = plane_data.getElementsByTagName('ns:DepartureAirport')[0].getAttribute(
            'LocationCode')
        flight_data['ArrivalAirport'] = plane_data.getElementsByTagName('ns:ArrivalAirport')[0].getAttribute('LocationCode')
        plane = xmlFile.getElementsByTagName('ns:CabinClass')
        cabin_object = OrderedDict()  # NS CABIN CLASS
        for cabin_class in plane:
            cabin = cabin_class.getElementsByTagName('ns:RowInfo')
            cabin_type = cabin[0].getAttribute('CabinType')
            for row_group in cabin:
                row_object = OrderedDict()  # NS ROW INFO
                seat_group = row_group.getElementsByTagName('ns:SeatInfo')
                for seat in seat_group:
                    seat_details = OrderedDict()
                    details = seat.getElementsByTagName('ns:Summary')[0]
                    seat_details['seat'] = seat.getElementsByTagName('ns:')
                    seat_details['seat_id'] = details.getAttribute('SeatNumber')
                    seat_details['cabin_class'] = cabin_type
                    seat_details['availability'] = str_to_bool(details.getAttribute('AvailableInd'))
                    set_amount(details, seat_details)
                    row_object[details.getAttribute('SeatNumber')[-1]] = seat_details
                cabin_object[row_group.getAttribute('RowNumber')] = row_object
        flight_data['Rows'] = cabin_object

    with open('_parsed.json', 'w') as outfile:
        outfile.write(json.dumps(flight_data))

This is my XML file:

<?xml version="1.0" encoding="UTF-8"?> <soapenv:Envelope xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <soapenv:Body> <ns:OTA_AirSeatMapRS Version="1" xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/"> <ns:Success/> <ns:SeatMapResponses> <ns:SeatMapResponse> <ns:FlightSegmentInfo DepartureDateTime="2020-11-22T15:30:00" FlightNumber="1179"> <ns:DepartureAirport LocationCode="LAS"/> <ns:ArrivalAirport LocationCode="IAH"/> <ns:Equipment AirEquipType="739"/> </ns:FlightSegmentInfo> <ns:SeatMapDetails> <ns:CabinClass Layout="AB EF" UpperDeckInd="false"> <ns:RowInfo CabinType="First" OperableInd="true" RowNumber="1"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1B"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="4" ExitRowInd="false" GalleyInd="false" GridNumber="4" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1E"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="First" OperableInd="true" 
RowNumber="2"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2B"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="4" ExitRowInd="false" GalleyInd="false" GridNumber="4" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2E"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> </ns:CabinClass> <ns:CabinClass Layout="ABC DEF" UpperDeckInd="false"> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="7"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7A"/> <ns:Features extension="Lavatory">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7B"/> <ns:Features extension="Lavatory">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3"> 
<ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7C"/> <ns:Features extension="Lavatory">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7D"/> <ns:Features>BlockedSeat_Permanent</ns:Features> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7E"/> <ns:Features>BlockedSeat_Permanent</ns:Features> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="8"> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8A"/> <ns:Status>Held</ns:Status> <ns:Features extension="Limited Recline">Other_</ns:Features> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8B"/> <ns:Status>Held</ns:Status> <ns:Features extension="Limited Recline">Other_</ns:Features> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" 
ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8C"/> <ns:Status>Held</ns:Status> <ns:Features extension="Limited Recline">Other_</ns:Features> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8D"/> <ns:Status>Held</ns:Status> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="8E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="8F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="9"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="9A"/> <ns:Status>Held</ns:Status> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9B"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" 
GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="10"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10B"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo 
BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="11"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11B"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" 
OccupiedInd="false" SeatNumber="11D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="12"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12A"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12B"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="12C"/> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service 
CodeContext="Preferred"> <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="12D"/> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Preferred"> <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12E"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12F"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="38"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38A"/> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> 
<ns:Service CodeContext="Economy"> <ns:Fee Amount="1300" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38B"/> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38C"/> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1800" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38D"/> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1800" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38E"/> <ns:Features>Center</ns:Features> <ns:Features 
extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38F"/> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1300" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="39"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="39A"/> <ns:Status>Held</ns:Status> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="39B"/> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" 
BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> </ns:CabinClass> </ns:SeatMapDetails> </ns:SeatMapResponse> </ns:SeatMapResponses> <ns:Warnings> <ns:Warning Type="11" Code="59">ENSURE PASSENGER MEETS GOVERNMENT DESIGNATED EXIT ROW CRITERIA</ns:Warning> <ns:Warning Type="11" Code="450">Valid Credit Card Payment Types: ,VI,UP,MPVI,MC,AX,DS,DC,TP,JC</ns:Warning> </ns:Warnings> </ns:OTA_AirSeatMapRS> </soapenv:Body> </soapenv:Envelope> 
  • Can you show an excerpt of the XML you're parsing? (May 13, 2021 at 3:16)
  • It's a bit long, but I could try. (May 13, 2021 at 3:31)
  • Why are you writing to a JSON file? What's going to consume it? (May 13, 2021 at 16:40)
  • And why is this in an XML file on disk? It looks like a SOAP response. Can't it just be parsed in memory? (May 13, 2021 at 16:43)
  • It's a test I have to pass for a job. They give me this XML and I have to parse it into JSON without libraries such as objectify, xmltodict or similar. I also have to transform the data so the output is formatted as I described. (May 13, 2021 at 17:31)

1 Answer


There's a lot going on here, and the problem is very ill-specified. I understand that you've been asked to do this (where's the original problem description?) for a job application, so maybe they've left a bunch open to interpretation, but anyway:

A common claim on the internet is that etree offers a more Pythonic XML parsing interface than minidom; see https://stackoverflow.com/a/8022507/313768 for example. It's unclear whether you'll have performance constraints that push you toward lxml. I find the etree interface more natural, so that's what my example code uses, but minidom is also fine. It's not ideal that you frequently ask for all matching tags only to use the first one. I've used a fairly strict XPath-style navigation scheme (ElementTree supports a limited subset of XPath) that doesn't search the entire tree, and asks for only one element when only one is needed.
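To illustrate the difference, here is a small comparison on a trimmed-down fragment modeled on the seat map (the fragment and variable names are just for illustration). minidom's `getElementsByTagName` collects every matching descendant by its literal prefixed tag name, whereas ElementTree's `find` follows a path relative to the current element, resolves the `ns:` prefix through a namespaces mapping, and returns only the first match.

```python
import xml.dom.minidom
from xml.etree import ElementTree

# Hypothetical fragment modeled on one <ns:SeatInfo> from the seat map above
XML = """<ns:SeatInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">
  <ns:Summary AvailableInd="true" SeatNumber="12C"/>
  <ns:Service CodeContext="Preferred">
    <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2"/>
  </ns:Service>
</ns:SeatInfo>"""

NAMESPACES = {'ns': 'http://www.opentravel.org/OTA/2003/05/common/'}

# minidom: gather every ns:Fee descendant, then keep only the first one
dom = xml.dom.minidom.parseString(XML)
minidom_amount = dom.getElementsByTagName('ns:Fee')[0].getAttribute('Amount')

# ElementTree: walk an explicit path from the current element, first match only
root = ElementTree.fromstring(XML)
etree_amount = root.find('./ns:Service/ns:Fee', NAMESPACES).attrib['Amount']

print(minidom_amount, etree_amount)  # 4200 4200
```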

Your set_amount is a somewhat strange block of code to extract into a function: it returns nothing and mutates element_to_change in place. Functions are generally a better fit when they return values and do not mutate their arguments. Python is entirely lax about this, but if you ever switch to a functional language it matters more.

You define str_to_bool but then don't use it in set_amount. It's such a simple operation that it's probably not worth capturing in a function; it can be done inline with an == 'true' comparison and no if statement.
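A minimal sketch of both suggestions, staying with minidom as in the question: a hypothetical `seat_price` function that returns a value (or `None`) instead of mutating a dict, and an inline `== 'true'` comparison in place of `str_to_bool`. It takes the `SeatInfo` element itself; the original set_amount only finds the fee because it reads the loop variable `seat` from the enclosing module scope rather than its own parameter. The wrapper `<ns:Row>` element in the sample is invented for the demo.

```python
import xml.dom.minidom
from typing import Optional

# Two seats modeled on the question's XML; <ns:Row> is a made-up wrapper
XML = """<ns:Row xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">
  <ns:SeatInfo>
    <ns:Summary AvailableInd="true" SeatNumber="12C"/>
    <ns:Service><ns:Fee Amount="4200"/></ns:Service>
  </ns:SeatInfo>
  <ns:SeatInfo>
    <ns:Summary AvailableInd="false" SeatNumber="12E"/>
  </ns:SeatInfo>
</ns:Row>"""


def seat_price(seat) -> Optional[str]:
    """Return the fee Amount of an available seat, or None. Mutates nothing."""
    summary = seat.getElementsByTagName('ns:Summary')[0]
    if summary.getAttribute('AvailableInd') != 'true':
        return None
    return seat.getElementsByTagName('ns:Fee')[0].getAttribute('Amount')


dom = xml.dom.minidom.parseString(XML)
seats = []
for seat in dom.getElementsByTagName('ns:SeatInfo'):
    summary = seat.getElementsByTagName('ns:Summary')[0]
    details = {
        'seat_id': summary.getAttribute('SeatNumber'),
        # inline comparison replaces str_to_bool
        'availability': summary.getAttribute('AvailableInd') == 'true',
    }
    price = seat_price(seat)
    if price is not None:
        details['seat_price'] = price
    seats.append(details)

print(seats)
# [{'seat_id': '12C', 'availability': True, 'seat_price': '4200'}, {'seat_id': '12E', 'availability': False}]
```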

Your use of OrderedDict isn't necessary: since Python 3.7, plain dicts are guaranteed to preserve insertion order.
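A quick check of that guarantee, using a few of the flight fields from the question: a plain dict keeps the order in which keys were inserted, and `json.dumps` writes them in that order unless `sort_keys=True` is passed.

```python
import json

# Plain dict: insertion order is preserved (guaranteed since Python 3.7)
flight_data = {}
flight_data['FlightNumber'] = '1179'
flight_data['DepartureAirport'] = 'LAS'
flight_data['ArrivalAirport'] = 'IAH'

print(json.dumps(flight_data))
# {"FlightNumber": "1179", "DepartureAirport": "LAS", "ArrivalAirport": "IAH"}
```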

Your Document check is inside-out and backwards. Rather than checking for the presence of a totally unrelated element, you should check for the absence of an element that the document type you're currently parsing depends on. This can be represented, for example, as an exception raised from a constructor, as in my example. Fancier patterns could use a factory that probes the document on parse and instantiates the correct loading class, but your question has insufficient context to justify this.
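For completeness, here is a rough sketch of the probe-then-dispatch idea, assuming each loader checks for the element it depends on and raises `ValueError` from its constructor when it's missing. `SeatMapLoader`, `DocumentLoader`, `LOADERS` and `load_any` are all hypothetical names; `DocumentLoader` stands in for a second format and assumes only that it contains a `<Document>` element, as the question's `if` suggests.

```python
from xml.etree import ElementTree
from xml.etree.ElementTree import Element

NAMESPACES = {
    'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
    'ns': 'http://www.opentravel.org/OTA/2003/05/common/',
}


class SeatMapLoader:
    # The path this loader relies on; its absence means "wrong document type"
    PROBE = './soapenv:Body/ns:OTA_AirSeatMapRS/ns:SeatMapResponses/ns:SeatMapResponse'

    def __init__(self, root: Element):
        self.response = root.find(self.PROBE, NAMESPACES)
        if self.response is None:
            raise ValueError(f'SeatMapLoader: missing {self.PROBE}')


class DocumentLoader:
    # Hypothetical second format: only assumed to contain a <Document> element
    def __init__(self, root: Element):
        # iter() includes the root itself, so a <Document> root also matches
        self.document = next(root.iter('Document'), None)
        if self.document is None:
            raise ValueError('DocumentLoader: no <Document> element')


LOADERS = (SeatMapLoader, DocumentLoader)


def load_any(xml_text: str):
    """Try each loader in turn and return the first one that accepts the document."""
    root = ElementTree.fromstring(xml_text)
    errors = []
    for loader in LOADERS:
        try:
            return loader(root)
        except ValueError as e:
            errors.append(str(e))
    raise ValueError('Unrecognized document: ' + '; '.join(errors))


# Minimal envelope with just the elements SeatMapLoader probes for
SOAP_SAMPLE = (
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soapenv:Body>'
    '<ns:OTA_AirSeatMapRS xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">'
    '<ns:SeatMapResponses><ns:SeatMapResponse/></ns:SeatMapResponses>'
    '</ns:OTA_AirSeatMapRS>'
    '</soapenv:Body>'
    '</soapenv:Envelope>'
)

print(type(load_any(SOAP_SAMPLE)).__name__)    # SeatMapLoader
print(type(load_any('<Document/>')).__name__)  # DocumentLoader
```

The order of `LOADERS` matters only if two formats could both satisfy a probe; keeping each probe specific to the elements its loader actually reads avoids that.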

You've conflated two operations in one: loading from XML into a well-defined in-memory representation, and serialization to JSON-compatible dictionaries. I have shown how these can be separated.

Don't call outfile.write(json.dumps(...)); simply call json.dump, which accepts a file-like object.
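A minimal illustration, with `io.StringIO` standing in for the file returned by `open('_parsed.json', 'w')`: `json.dump` writes the encoded output directly to anything with a `.write()` method, so there's no need to build the whole string first.

```python
import io
import json

flight_data = {'FlightNumber': '1179', 'Rows': {}}

outfile = io.StringIO()  # stands in for open('_parsed.json', 'w')
json.dump(flight_data, outfile)

print(outfile.getvalue())  # {"FlightNumber": "1179", "Rows": {}}
```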

Your

    seat_details['seat'] = seat.getElementsByTagName('ns:')

is mysterious: no element has the tag name ns:, so it always returns an empty NodeList and contributes nothing useful to the output. It can probably just be deleted.

This:

    row_object[details.getAttribute('SeatNumber')[-1]] = seat_details

is more fragile than it needs to be. You've already been given a row number (RowNumber). Assuming the row number always precedes the column in the seat ID, you shouldn't simply take the last character as the column. Instead, take the prefix of the ID that is as long as the row number you already have, validate that it matches the row, and treat the rest as the column ID. This also supports multi-character columns.
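That approach can be sketched as a standalone function. `split_seat_id` is a hypothetical name, and the extra check that the column doesn't start with a digit is my own assumption (seat columns in this data are letters). It catches a row such as `'1'` being a mere prefix of seat `'12C'`, which the prefix comparison alone would accept. The `'7AB'` seat is invented to show a multi-character column.

```python
from typing import Tuple


def split_seat_id(seat_id: str, row: str) -> Tuple[str, str]:
    """Split a seat ID such as '12C' into (row, column), validating the row prefix."""
    row_from_id = seat_id[:len(row)]
    if row_from_id != row:
        raise ValueError(f'Row {row} conflicts with seat ID {seat_id}')
    col = seat_id[len(row):]
    # Assumption: columns are letters, so a leading digit means the row
    # number was only a prefix of a longer row (e.g. row '1', seat '12C')
    if not col or col[0].isdigit():
        raise ValueError(f'Seat ID {seat_id} has no valid column after row {row}')
    return row, col


print(split_seat_id('12C', '12'))  # ('12', 'C')
print(split_seat_id('7AB', '7'))   # ('7', 'AB')
```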

Example Code

This generates output equivalent to yours.

    import json
    from datetime import datetime
    from decimal import Decimal
    from functools import partial
    from typing import Iterable, Tuple, Optional, Dict, Any
    from xml.etree import ElementTree
    from xml.etree.ElementTree import Element

    NAMESPACES = {
        'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
        'ns': 'http://www.opentravel.org/OTA/2003/05/common/',
    }

    ns_find = partial(Element.find, namespaces=NAMESPACES)
    ns_findall = partial(Element.findall, namespaces=NAMESPACES)


    class Seat:
        __slots__ = (
            'available', 'cabin_type', 'seat_id', 'row', 'col', 'seat_price',
        )

        def __init__(self, seat: Element, cabin_type: str, row: str):
            summary = ns_find(seat, './ns:Summary')
            self.available = summary.attrib['AvailableInd'] == 'true'
            self.cabin_type = cabin_type

            seat_id = summary.attrib['SeatNumber']
            row_from_id = seat_id[:len(row)]
            if row != row_from_id:
                raise ValueError(f'Row {row} conflicts with seat ID {seat_id}')
            self.seat_id = seat_id
            self.row = row
            self.col = seat_id[len(row):]

            if self.available:
                self.seat_price: Optional[Decimal] = Decimal(
                    ns_find(seat, './ns:Service/ns:Fee').attrib['Amount']
                )
            else:
                self.seat_price = None

        def __str__(self):
            return self.seat_id

        def as_dict(self) -> Dict[str, Any]:
            d = {
                'seat_id': self.seat_id,
                'cabin_class': self.cabin_type,
                'availability': self.available,
            }
            if self.seat_price is not None:
                d['seat_price'] = str(self.seat_price)
            return d

        @classmethod
        def get_row(cls, row: Element, cabin_type: str, row_no: str) -> Iterable[Tuple[str, 'Seat']]:
            for seat_elm in ns_findall(row, './ns:SeatInfo'):
                seat = cls(seat_elm, cabin_type, row_no)
                yield seat.col, seat


    class AirSeatMap:
        __slots__ = ('flight', 'seat_map')

        def __init__(self, filename: str):
            root = ElementTree.parse(filename).getroot()
            response = ns_find(
                root,
                './soapenv:Body/ns:OTA_AirSeatMapRS'
                '/ns:SeatMapResponses/ns:SeatMapResponse'
            )
            if response is None:
                raise ValueError('This is probably not an AirSeatMap')
            self.flight = ns_find(response, './ns:FlightSegmentInfo')
            self.seat_map = ns_find(response, './ns:SeatMapDetails')

        @property
        def flight_number(self) -> str:
            return self.flight.attrib['FlightNumber']

        @property
        def departure_time(self) -> datetime:
            return datetime.fromisoformat(self.flight.attrib['DepartureDateTime'])

        @property
        def departure_airport(self) -> str:
            return ns_find(self.flight, './ns:DepartureAirport').attrib['LocationCode']

        @property
        def arrival_airport(self) -> str:
            return ns_find(self.flight, './ns:ArrivalAirport').attrib['LocationCode']

        @property
        def seats(self) -> Iterable[Tuple[str, Iterable[Tuple[str, Seat]]]]:
            for cabin_class in ns_findall(self.seat_map, './ns:CabinClass'):
                for row in ns_findall(cabin_class, './ns:RowInfo'):
                    cabin_type = row.attrib['CabinType']
                    row_no = row.attrib['RowNumber']
                    yield row_no, Seat.get_row(row, cabin_type, row_no)

        def as_dict(self) -> Dict[str, Any]:
            return {
                'FlightNumber': self.flight_number,
                'DepartureDateTime': self.departure_time.isoformat(),
                'DepartureAirport': self.departure_airport,
                'ArrivalAirport': self.arrival_airport,
                'Rows': {
                    row_no: {
                        col_no: seat.as_dict()
                        for col_no, seat in row
                    }
                    for row_no, row in self.seats
                },
            }


    def main():
        air_seat_map = AirSeatMap("seatmap1.xml")
        with open('_parsed.json', 'w') as outfile:
            json.dump(air_seat_map.as_dict(), outfile)


    if __name__ == '__main__':
        main()
  • This is awesome, Reinderien, thank you so much. Your solution is very clean. My main problem is that the two XML files have different structures, with different attributes and elements. That's why my first idea was to put two scripts in one, separated by that if. I know it's an awful solution. They don't share any properties, so how do I write one script that parses two XML files with different structures and properties? This is my second XML file: GitHubRepo (May 14, 2021 at 5:26)
  • These are the repositories with the instructions and both XML files, so you can check what I mean: Instructions, XML 1, XML 2 (May 14, 2021 at 5:40)
  • You are my only hope. (May 14, 2021 at 17:48)
  • Haha, well I appreciate that, but if you want more feedback about a general solution that encompasses both file formats, you're going to need to write a new question that includes, copied verbatim, the problem statement, all of your own code, and example sections from both XML formats. (May 14, 2021 at 17:50)
  • meta.stackexchange.com/questions/364452/python-xml-json-parsing (May 14, 2021 at 21:56)
