My goal is to parse this CSV data:

    Time,Tank,Product,Volume,TC Vol,Ullage,90 Ul,Height,Water,Temp
    2017-10-19T18:52:41.118408,1,UNLEADED,4406,4393,7221,6058.3,37.49,0,64.15
    2017-10-19T18:52:41.118408,3,SUPER,8317,8278,3310,2147.3,61.4,0,66.74
    2017-10-19T18:52:41.118408,4,ADSL2,6807,6774,4820,3657.3,51.98,0,70.46
    2017-10-19T18:53:13.894066,1,UNLEADED,4406,4393,7221,6058.3,37.49,0,64.15
    2017-10-19T18:53:13.894066,3,SUPER,8313,8273,3314,2151.3,61.37,0,66.74
    2017-10-19T18:53:13.894066,4,ADSL2,6805,6772,4822,3659.3,51.97,0,70.46

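Given a list of tank numbers, a primary key, and a data column, I want one dictionary per primary-key value, with each selected tank's product name mapped to its value in the data column: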

    >>> tank_numbers = [1, 3]
    >>> primary_key = 'Time'
    >>> data_column = 'Volume'
    >>>
    >>> parse_csv('csv_file.csv', tank_numbers, primary_key, data_column)
    [
        {'Time': '2017-10-19T18:52:41.118408', 'UNLEADED': '4406', 'SUPER': '8317'},
        {'Time': '2017-10-19T18:53:13.894066', 'UNLEADED': '4406', 'SUPER': '8313'}
    ]

I have a few questions about the following code:
- Using only the standard library, is there a simpler way? What I have seems like too much code just to get the needed information.
- Should I be breaking up the `parse_csv` function into smaller pieces, similar to `_parse_csv_to_dicts` and `_get_tank_names`?

    import csv


    def _parse_csv_to_dicts(file):
        with open(file, 'r') as f:
            return list(csv.DictReader(f))


    def _get_tank_names(tanks=None, data=None):
        names = list()
        for n in tanks:
            for tank_dict in data:
                if tank_dict['Tank'] == str(n):
                    names.append(tank_dict['Product'])
                    break
        return names


    def parse_csv(file, tanks, primary_key, data_key):
        """
        :param file: The raw csv data file
        :param tanks: A list of tank numbers, as labeled in the raw csv
        :param key: a list of the keys needed from the raw csv
        :return: a list of dictionaries
        """
        d1 = _parse_csv_to_dicts(file)

        # Remove unneeded tanks from data
        d2 = [row for row in d1 if int(row['Tank']) in tanks]

        # Remove unneeded keys from rows
        keys = [primary_key, data_key, 'Product']
        d3 = [{key:row[key] for key in keys} for row in d2]

        # Create new row template
        tank_names = _get_tank_names(tanks=tanks, data=d1)
        row_template = {key:None for key in (tank_names + [primary_key])}

        # New rows from row template
        d4 = []
        for row in d3:
            # update new row with available keys
            new_row = {key:row.get(key) for key in row_template}
            # update new row with matching values
            for key in new_row:
                if key in row.values():
                    new_row[key] = row[data_key]
            # remove keys with None value
            new_row = {k:v for k,v in new_row.items() if v is not None}
            d4.append(new_row)

        # Merge all rows based on Time key
        merged = {}
        for row in d4:
            if row[primary_key] in merged:
                merged[row[primary_key]].update(row)
            else:
                merged[row[primary_key]] = row

        return [value for value in merged.values()]
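
For comparison, the filter, pivot, and merge steps in `parse_csv` can probably be collapsed into a single pass over `csv.DictReader`. What follows is only a minimal sketch, not a drop-in replacement. The name `pivot_rows` is hypothetical, it takes an iterable of lines (an open file or an `io.StringIO`) rather than a filename so the example is self-contained, and it assumes that each tank number maps to exactly one product name and that plain dict insertion order (Python 3.7+) is good enough to keep rows in file order.

```python
import csv
import io


def pivot_rows(lines, tanks, primary_key, data_key):
    """Hypothetical helper: one dict per primary_key value, keyed by product name."""
    # csv yields every field as a string, so compare tank numbers as strings
    wanted = {str(t) for t in tanks}
    merged = {}
    for row in csv.DictReader(lines):
        if row['Tank'] not in wanted:
            continue
        # Create the output dict the first time a primary-key value is seen
        out = merged.setdefault(row[primary_key],
                                {primary_key: row[primary_key]})
        # Use the product name as the column and the data column as its value
        out[row['Product']] = row[data_key]
    return list(merged.values())


# Sample input copied from the CSV data in the question
sample = """\
Time,Tank,Product,Volume,TC Vol,Ullage,90 Ul,Height,Water,Temp
2017-10-19T18:52:41.118408,1,UNLEADED,4406,4393,7221,6058.3,37.49,0,64.15
2017-10-19T18:52:41.118408,3,SUPER,8317,8278,3310,2147.3,61.4,0,66.74
2017-10-19T18:52:41.118408,4,ADSL2,6807,6774,4820,3657.3,51.98,0,70.46
2017-10-19T18:53:13.894066,1,UNLEADED,4406,4393,7221,6058.3,37.49,0,64.15
2017-10-19T18:53:13.894066,3,SUPER,8313,8273,3314,2151.3,61.37,0,66.74
2017-10-19T18:53:13.894066,4,ADSL2,6805,6772,4822,3659.3,51.97,0,70.46
"""

result = pivot_rows(io.StringIO(sample), [1, 3], 'Time', 'Volume')
for row in result:
    print(row)
```

With a real file, the same function could be called as `pivot_rows(f, ...)` inside `with open('csv_file.csv', newline='') as f:`. The design choice here is to use `dict.setdefault` so that creating and updating the merged row happen in one place, which removes the need for the intermediate `d1` to `d4` lists.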