Parsing a CSV file with Python

Question

My goal is to parse this CSV data:

Time,Tank,Product,Volume,TC Vol,Ullage,90 Ul,Height,Water,Temp 2017-10-19T18:52:41.118408,1,UNLEADED,4406,4393,7221,6058.3,37.49,0,64.15 2017-10-19T18:52:41.118408,3,SUPER,8317,8278,3310,2147.3,61.4,0,66.74 2017-10-19T18:52:41.118408,4,ADSL2,6807,6774,4820,3657.3,51.98,0,70.46 2017-10-19T18:53:13.894066,1,UNLEADED,4406,4393,7221,6058.3,37.49,0,64.15 2017-10-19T18:53:13.894066,3,SUPER,8313,8273,3314,2151.3,61.37,0,66.74 2017-10-19T18:53:13.894066,4,ADSL2,6805,6772,4822,3659.3,51.97,0,70.46

Given the a list of Tank numbers, a primary key, and a data column:

>>>tank_numbers = [1, 3] >>>primary_key = 'Time' >>>data_column = 'Volume' >>> >>>parse_csv('csv_file.csv', tank_numbers, primary_key, data_column) [ {'Time': '2017-10-19T18:52:41.118408', 'UNLEADED': '4406', 'SUPER': '8317'} {'Time': '2017-10-19T18:53:13.894066', 'UNLEADED': '4406', 'SUPER': '8317'} ]

I have a few questions about the following code;

Using only the standard library, is there a more simple way? What I have seems like too much just to get the needed information.
Should I be breaking up the parse_csv function into smaller pieces similar to _parse_csv_to_dicts and _get_tank_names.

import csv def _parse_csv_to_dicts(file): with open(file, 'r') as f: return list(csv.DictReader(f)) def _get_tank_names(tanks=None, data=None): names = list() for n in tanks: for tank_dict in data: if tank_dict['Tank'] == str(n): names.append(tank_dict['Product']) break return names def parse_csv(file, tanks, primary_key, data_key): """ :param file: The raw csv data file :param tanks: A list of tank numbers, as labeled in the raw csv :param key: a list of the keys needed from the raw csv :return: a list of dictionaries """ d1 = _parse_csv_to_dicts(file) # Remove unneeded tanks from data d2 = [row for row in d1 if int(row['Tank']) in tanks] # Remove unneeded keys from rows keys = [primary_key, data_key, 'Product'] d3 = [{key:row[key] for key in keys} for row in d2] # Create new row template tank_names = _get_tank_names(tanks=tanks, data=d1) row_template = {key:None for key in (tank_names + [primary_key])} # New rows from row template d4 = [] for row in d3: # update new row with available keys new_row = {key:row.get(key) for key in row_template} # update new row with matching values for key in new_row: if key in row.values(): new_row[key] = row[data_key] # remove keys with None value new_row = {k:v for k,v in new_row.items() if v is not None} d4.append(new_row) # Merge all rows based on Time key merged = {} for row in d4: if row[primary_key] in merged: merged[row[primary_key]].update(row) else: merged[row[primary_key]] = row return [value for value in merged.values()]

Stephen Rauch · Accepted Answer · 2017-10-22 21:06:31Z

In general I think this code looks excellent. You have made good use of comprehensions. I have two suggestions:

More Comprehensions:

This:

def _get_tank_names(tanks=None, data=None): names = list() for n in tanks: for tank_dict in data: if tank_dict['Tank'] == str(n): names.append(tank_dict['Product']) break return names

Can be expressed as:

def _get_tank_names(tanks=None, data=None): tanks = {str(n) for n in tanks} return {t['Product'] for t in data if t['Tank'] in tanks}

This converts tanks to a set while performing the str() conversion, to clean things up a bit. Then the result can be generated in a single statement.

Gathering like elements

So this entire section:

# Create new row template tank_names = _get_tank_names(tanks=tanks, data=d1) row_template = {key: None for key in (tank_names + [primary_key])} # New rows from row template d4 = [] for row in d3: # update new row with available keys new_row = {key: row.get(key) for key in row_template} # update new row with matching values for key in new_row: if key in row.values(): new_row[key] = row[data_key] # remove keys with None value new_row = {k: v for k, v in new_row.items() if v is not None} d4.append(new_row) # Merge all rows based on Time key merged = {} for row in d4: if row[primary_key] in merged: merged[row[primary_key]].update(row) else: merged[row[primary_key]] = row return [value for value in merged.values()]

Can be reduced down to:

merged = {} for row in d3: merged.setdefault(row['Time'], {'Time': row['Time']}) \ [row['Product']] = row['Volume'] return [v for k, v in sorted(merged.items())]

The cornerstone here is using dict.setdefault to accumulate the records that match the timestamp.

So how does the setdefault work?

We create an empty dict, the keys to which will be the timestamps we will be accumulating data for.

merged = {}

Loop through the data, a row at a time:

for row in d3:

Look up the element in our dict corresponding to the timestamp for this row:

merged.setdefault(row['Time'], {})

This is most of the magic, it will populate merged[this rows timestamp] with an empty dict if one is not already there, and then return the dict, so that we can put something in it. An empty dict would often be sufficient, but in this case you also need the timestamp in the final dict, so we can extend the statement to be:

merged.setdefault(row['Time'], {'Time': row['Time']})

This will return a dict with the timestamp already populated, the first time we see that timestamp. In subsequent loops when we see the timestamp, it will return the dict that was built the previous time we saw the timestamp.

To finish the dict creation code we set the Product and Volume data into the dict with:

merged.setdefault(row['Time'], {'Time': row['Time']}) \ [row['Product']] = row['Volume']

Lastly, return the list of dicts we built, sorted by the timestamp with:

return [v for k, v in sorted(merged.items())]

Ain't Python fun? :-)

Exactly what I was looking for. Thank you for your edit explaining what you were doing with the setdefault method. Definitely a useful tool. Thank you for your time! Yes, it sure is fun! — JohnBoy, CommentedOct 23, 2017 at 13:56

Stack Exchange Network

Parsing a CSV file with Python

1 Answer 1

More Comprehensions:

Gathering like elements

So how does the setdefault work?

Hot Network Questions

Parsing a CSV file with Python

1 Answer 1

More Comprehensions:

Gathering like elements

So how does the setdefault work?

Related

Hot Network Questions