4
\$\begingroup\$

I have recently been working on a program which takes as input, log files generated from a running/biking app. These files are in an xml format which is incredibly regular. In my program, I first strip all of the geographic and time information, and output as a text file. Then, reading back this text file, instances of Entry class are compiled into a list. Using these entries, calculations (and eventually graphing and long term trends) can be achieved.

This python script requires the library geopy to use, and has been tested on Python3.

My main concern is how little of the code is pythonic in nature, with multiple if statements begging the question of optimization. I have uploaded a copy of an example input file for the script on my github, which can be found here. Typing '4-19-17a.bin' without the quotations at the prompt will begin parsing, and will print to standard output the string representation of each Entry instance that was created, along with a couple calculations.


dataparse.py

import xml.etree.ElementTree as ET def processtimestring(time): year = int(time[:4]) month = int(time[5:7]) day = int(time[8:10]) h = int(time[11:13]) m = int(time[14:16]) s = float(time[17:-6]) return [year, month, day, h, m, s] def parsethedata(): time = [] lat = [] lng = [] alt = [] dist = [] garbage = '{http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2}' filestring = input('The file to parse: ') outputstring = filestring[:-4] + '.txt' f = open(outputstring, 'w') tree = ET.parse(filestring) root = tree.getroot() for child in root[0][0][1][4]: for info in child: if (not info.text): for position in info: f.write(position.tag.replace(garbage, '') + '\n') f.write(position.text + '\n') else: f.write(info.tag.replace(garbage, '') + '\n') f.write(info.text + '\n') f.close() with open(outputstring) as f: for i, line in enumerate(f): if ((i+9) % 10 == 0): time.append(processtimestring(line[:-1])) elif ((i+7) % 10 == 0): lat.append(float(line[:-1])) elif ((i+5) % 10 == 0): lng.append(float(line[:-1])) elif ((i+3) % 10 == 0): alt.append(float(line[:-1])) elif ((i+1) % 10 == 0): dist.append(float(line[:-1])) return [time, lat, lng, alt, dist] 

entry.py

import geopy.distance import datetime as dt class Entry(object): @staticmethod def distancecalc(x1, x2): coord_1 = (x1.pos[0], x1.pos[1]) coord_2 = (x2.pos[0], x2.pos[1]) return geopy.distance.vincenty(coord_1, coord_2).m @staticmethod def timecalc(x1, x2): a, b = x1.time, x2.time t1 = dt.datetime(a[0], a[1], a[2], a[3], a[4], int(a[5])) t2 = dt.datetime(b[0], b[1], b[2], b[3], b[4], int(b[5])) t1_decimal = float(a[5]-int(a[5])) t2_decimal = float(b[5]-int(b[5])) return (t2 - t1).total_seconds() + (t2_decimal - t1_decimal) def __init__(self, time, lat, lng, alt, dist): self.time = time self.pos = [lat, lng] self.alt = alt self.dist = dist def __str__(self): ymd = ('ymd: ' + str(self.time[0]) + ', ' + str(self.time[1]) + ', ' + str(self.time[2])) hms = (' hms: ' + str(self.time[3]) + ', ' + str(self.time[4]) + ', ' + str(self.time[5])) lat = ' lat: ' + str(self.pos[0]) lng = ' lng: ' + str(self.pos[1]) alt = ' alt: ' + str(self.alt) dst = ' dst: ' + str(self.dist) stringrepresentation = ymd + hms + lat + lng + alt + dst return stringrepresentation 

main.py

import dataparse as dp from entry import Entry exitflag = False while not exitflag: [time, lat, lng, alt, dist] = dp.parsethedata() entries = [Entry(x, lat[i], lng[i], alt[i], dist[i]) for i, x in enumerate(time)] prev = entries[0] total = 0.0 for entry in entries: dx = Entry.distancecalc(prev, entry) dt = Entry.timecalc(prev, entry) total += dx print('Total: ' + str(total) + ', Logged Dist: ' + str(entry.dist) + ', dx: ' + str(dx) + ', dt: ' + str(dt), end="") if dt: print(', Speed: ' + str(dx/dt)) else: print(', Speed: 0.0') prev = entry userinput = input('Parse another file? y/n: ') if userinput == 'n': exitflag = True 
\$\endgroup\$
0

    1 Answer 1

    2
    \$\begingroup\$

    I have recast your code, and the complete listing is below. I will discuss various elements of the changes here. If I missed explaining a change, or some concept, please leave a comment and I will see what I can do to better cover that.

    Organization

    First thing I did was move the parsing code to be a method of your Entry class. This parsing code seems to be very specific to this class, so let's make the relationship explicit. I also broke the parsing into parsing a single Entry Node, and the part which manages parsing the entire XML file.

    Parsing

    File Parse Code:

    @classmethod def from_xml(cls, xml_source): tree = etree.parse(xml_source) root = tree.getroot() return [cls.from_xml_node(child) for child in root[0][0][1][4]] 

    Some notes:

    1. Don't prompt for user input in the parsing code.
    2. Convert the data directly to the desired representation (a list of Entries).

    Node Parse Code:

    converters = dict( AltitudeMeters=(float, 'alt'), DistanceMeters=(float, 'dist'), LatitudeDegrees=(float, 'lat'), LongitudeDegrees=(float, 'lng'), Time=(dateutil.parser.parse, 'time'), ) @classmethod def from_xml_node(cls, node): data_point = {} for info in node.getiterator(): tag = info.tag.split('}')[-1] if tag in cls.converters: converter, tag_text = cls.converters[tag] data_point[tag_text] = converter(info.text.strip()) return cls(**data_point) 

    A few highlights:

    1. Flattened nodes using getiterator

      getiterator finds all of the nodes under a node, so allows the removal of the special if for the nested position data.

    2. Removed need for stacked if/else by using convertersdict

      By using the node tag as a key into a dict, you can store the pieces needed to parse that tag's value. In this case I stored a conversion function, and the desired name for the value.

    3. Removed garbage string by more simply splitting on } and using text after the }

    4. Convert the data directly to the desired representation.

      The writing of the data into a secondary file during the XML tree traversal, added quite a bit of complexity, for no apparent gain. Instead this code stores each element of the current node into a dict. The dict is then used to init an Entry object.

    5. Dict keys are mapped to same names used in Entry.__init__

      The parameter names for __init__ match the keys used in the data_pointdict. This mapping was done via the second element in the tuple in the converters dict. This makes it easy to map things cleanly. This allows return cls(**data_point) to directly create a class instance. Note the fact that you can specify non-keyword parameters with keywords, which is how the **data_point can expand into the 5 parameters to init the Entry class

    Class Structure

    static methods probably should not be passed a class instance

    If you are passing a class instance to a static method of that class, you are probably doing it wrong. Static methods are intended for code which is closely aligned with a class but which it not directly associated with an instance.

    So I converted:

    @staticmethod def distancecalc(x1, x2): 

    To:

    def distance_from(self, other): 

    properties allow calculated fields to be easily treated

    In distance_from I converted:

    coord_1 = (x1.pos[0], x1.pos[1]) coord_2 = (x2.pos[0], x2.pos[1]) return geopy.distance.vincenty(coord_1, coord_2).m 

    To:

    return geopy.distance.vincenty(self.pos, other.pos).m 

    By providing:

    @property def pos(self): return self.lat, self.lng 

    Since lat, lng as a pair is a position, making that relationship explicit by using a property, better documents the relationship.

    Python features

    Using python libraries

    By storing the timestamp in the class as a datetime value, I was able to convert:

    @staticmethod def timecalc(x1, x2): a, b = x1.time, x2.time t1 = dt.datetime(a[0], a[1], a[2], a[3], a[4], int(a[5])) t2 = dt.datetime(b[0], b[1], b[2], b[3], b[4], int(b[5])) t1_decimal = float(a[5]-int(a[5])) t2_decimal = float(b[5]-int(b[5])) return (t2 - t1).total_seconds() + (t2_decimal - t1_decimal) 

    To:

    def time_since(self, other): return (self.time - other.time).total_seconds() 

    This removed all of the date conversion and other math.

    Use string formatting

    By using python string formatting functionality I was able to reduce the __str__ method from 11 lines to 2

    def __str__(self): return 'time: %s, lat: %.6f, lng: %.6f, alt: %.2f, dst:%.3f' % ( self.time, self.lat, self.lng, self.alt, self.dist) 

    zip allows processing multiple lists at once

    Since I have completed removed the lists of 'lat', 'lng' etc, this is less relevant, but I thought I would show zip this way:

    for prev, entry in zip(entries[:-1], entries[1:]): 

    this line recasts your print loop from:

    prev = entries[0] for entry in entries: .... prev = entry 

    This has the side benefit of not printing the first line of all zeros, but is mostly here to show how to use zip. zip grabs the first element of each of the lists passed to it, and presents them on the first iteration. On the second iteration it presents the second element of each list, etc. In this case I used all but the last element of the entries combined with all but the first element as two lists to get adjacent pairs for each iteration.

    Recast Code

    import geopy.distance import dateutil.parser import xml.etree.ElementTree as etree class Entry(object): def __init__(self, time, lat, lng, alt, dist): self.time = time self.lat = lat self.lng = lng self.alt = alt self.dist = dist def __str__(self): return 'time: %s, lat: %.6f, lng: %.6f, alt: %.2f, dst:%.3f' % ( self.time, self.lat, self.lng, self.alt, self.dist) def distance_from(self, other): return geopy.distance.vincenty(self.pos, other.pos).m def time_since(self, other): return (self.time - other.time).total_seconds() @property def pos(self): return self.lat, self.lng converters = dict( AltitudeMeters=(float, 'alt'), DistanceMeters=(float, 'dist'), LatitudeDegrees=(float, 'lat'), LongitudeDegrees=(float, 'lng'), Time=(dateutil.parser.parse, 'time'), ) @classmethod def from_xml_node(cls, node): data_point = {} for info in node.getiterator(): tag = info.tag.split('}')[-1] if tag in cls.converters: converter, tag_text = cls.converters[tag] data_point[tag_text] = converter(info.text.strip()) return cls(**data_point) @classmethod def from_xml(cls, xml_source): tree = etree.parse(xml_source) root = tree.getroot() return [cls.from_xml_node(child) for child in root[0][0][1][4]] while True: filename = input('The file to parse: ') entries = Entry.from_xml(filename) total = 0.0 for prev, entry in zip(entries[:-1], entries[1:]): dx = entry.distance_from(prev) dt = entry.time_since(prev) total += dx print('Total: %.3f, Logged Dist: %.3f, dx: %.3f, dt: %.3f, ' 'Speed: %.1f' % (total, entry.dist, dx, dt, dx/dt if dt else 0)) if 'n' == input('Parse another file? y/n: '): break 
    \$\endgroup\$
    4
    • \$\begingroup\$When you have a moment, can you explain the syntax cls(**data_point) ? I understand that cls and self are basically equivalent with cls marking a class function vs an instance function, while ** means keyword argument packing, but I'm not sure what that combination does exactly. Instantiate a new instance of the class with the data_point packed as a Dictionary?\$\endgroup\$CommentedApr 24, 2017 at 17:41
    • 1
      \$\begingroup\$@logical123, You are exactly correct. The thing to note is that the parameter names for __init__ match the keys used in the data_point dict. This mapping was done via the second element in the tuple in the converters dict. This makes it easy to map things cleanly. You maybe unfamiliar with the fact that you can specify non-keyword parameters with keywords, which is how the **dict can expand into the 5 parameters to init the Entry class\$\endgroup\$CommentedApr 24, 2017 at 17:50
    • 1
      \$\begingroup\$I noticed a small bug, second to last line should be ==, not !=, I believe. Unfortunately, that is too small an edit for the system to allow me to make. lol\$\endgroup\$CommentedApr 24, 2017 at 18:41
    • \$\begingroup\$@logical123, Fixed.\$\endgroup\$CommentedApr 24, 2017 at 18:55

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.