Mapping column names and values of a csv using another csv

Question

I have two csv files, pricat.csv which contains objects I need to populate my DB with, and mapping.csv which specifies how the value in pricat.csv must be displayed in my DB, for ex: 'NW 17-18' in pricat.csv has to be 'Winter Collection 2017/2018' in my DB. Here the csvs, first row in both are the headers:

ean;supplier;brand;catalog_code;collection;season;article_structure_code;article_number;article_number_2;article_number_3;color_code;size_group_code;size_code;size_name;currency;price_buy_gross;price_buy_net;discount_rate;price_sell;material;target_area 8719245200978;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;38;38;EUR;;58.5;;139.95;Aviation;Woman Shoes 8719245200985;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;39;39;EUR;;58.5;;139.95;Aviation;Woman Shoes

source;destination;source_type;destination_type winter;Winter;season;season summer;Summer;season;season NW 17-18;Winter Collection 2017/2018;collection;collection EU;European sizes;size_group_code;size_group EU|36;European size 36;size_group_code|size_code;size EU|37;European size 37;size_group_code|size_code;size EU|38;European size 38;size_group_code|size_code;size EU|39;European size 39;size_group_code|size_code;size EU|40;European size 40;size_group_code|size_code;size EU|41;European size 41;size_group_code|size_code;size EU|42;European size 42;size_group_code|size_code;size 4;Boot;article_structure_code;article_structure 5;Sneaker;article_structure_code;article_structure 6;Slipper;article_structure_code;article_structure 7;Loafer;article_structure_code;article_structure 8;Mocassin;article_structure_code;article_structure 9;Sandal;article_structure_code;article_structure 10;Pump;article_structure_code;article_structure 1;Nero;color_code;color 2;Marrone;color_code;color 3;Brandy Nero;color_code;color 4;Indaco Nero;color_code;color 5;Fucile;color_code;color 6;Bosco Nero;color_code;color

In my models.py in Django I have three models: Catalog --> Article --> Variation the attributes of my models are manually named as mapping.csv specifies, for ex: Variation will not have a color_code attribute but color. To populate the DB I've created a custom Django command which reads the rows in pricat.csv and create istances like this:

x = Catalog.objects.get_or_create(brand=info[2], supplier=info[1], catalog_code=info[3], collection=map_dict[info[4]], season=map_dict[info[5]], size_group=map_dict[info[11]], currency=info[14], target_area=info[20]) y = Article.objects.get_or_create(article_structure=map_dict[info[6]], article_number=info[7], catalog=x[0]) z = Variation.objects.get_or_create(ean=info[0], article=y[0], size_code=info[12], color=map_col[info[10]], material=info[19], price_buy_gross=info[15], price_buy_net=info[16], discount_rate=info[17], price_sell=info[18], size=f'{map_dict[info[11]]} {info[12]}')

info is a list of all the value in a pricat.csv row and map_dict and map_col are two dictionaries I create with two func() from the mapping.csv:

def mapping(map_file): with open(map_file, 'r') as f: f = [l.strip('\n') for l in f] map_dict = {} for l in f[1:19]: info = l.strip().split(';') source = info[0] destination = info[1] source_type = info[2] destination_type = info[3] map_dict[source] = destination map_dict[source_type] = destination_type return map_dict def mapping_color(map_file): with open(map_file, 'r') as f: f = [l.strip('\n') for l in f] map_dict = {} for l in f[19:]: info = l.strip().split(';') source = info[0] destination = info[1] source_type = info[2] destination_type = info[3] map_dict[source] = destination map_dict[source_type] = destination_type return map_dict map_dict = mapping('mapping.csv') map_col = mapping_color('mapping.csv')

I had to create two dict because a single one would have duplicate keys.

The code works fine and the DB is populated as intended, but I feel the way I did the mapping is bad practice, also both my command and funcs relies on indeces so the values in my csvs have to be in that specific order to work. I would greatly appreciate any suggestion on how to improve my code or accomplish this task, I hope my explanation is clear.

EDIT:

class Catalog(models.Model): brand = models.CharField(max_length=255) supplier = models.CharField(max_length=255) catalog_code = models.CharField(max_length=255, default=1, blank=True) collection = models.CharField(max_length=255) season = models.CharField(max_length=255) size_group = models.CharField(max_length=2) currency = models.CharField(max_length=3) target_area = models.CharField(max_length=255) def __str__(self): return self.brand def get_articles(self): return Article.objects.filter(catalog=self.pk) class Article(models.Model): article_structure = models.CharField(max_length=255) article_number = models.CharField(max_length=255) catalog = models.ForeignKey(Catalog, on_delete=models.CASCADE) def __str__(self): return f'{self.article_number} | {self.article_structure}' class Variation(models.Model): ean = models.CharField(max_length=255) article = models.ForeignKey(Article, on_delete=models.CASCADE) size_code = models.IntegerField() size = models.CharField(max_length=255, default=0) color = models.CharField(max_length=255) material = models.CharField(max_length=255) price_buy_gross = models.CharField(max_length=255) price_buy_net = models.FloatField() discount_rate = models.CharField(max_length=255, default=0) price_sell = models.FloatField() def __str__(self): return f'Ean: {self.ean}, article: {self.article}'

I've created a new mapping()

def mapping(map_file): with open(map_file, 'r') as f: f = [l.strip('\n') for l in f] map_dict = {} for l in f[1:]: info = l.strip().split(';') source = info[0] destination = info[1] source_type = info[2] child_dict = {source: destination} map_dict[source_type] = map_dict.get(source_type, {source: destination}) map_dict[source_type].update(child_dict) return map_dict

It returns a nested dict, I'm trying to finda solution using this single nested dict instead of 2 dicts like before.

Please show more of your code, particularly the model classes for Catalog --> Article --> Variation. — Reinderien, CommentedNov 21, 2020 at 17:13

PythonSherpa · Accepted Answer · 2020-11-22 20:40:00Z

You can use the built-in csv.DictReader to easily create dictionaries from CSV files. How about this?

import csv def create_mapping(map_file): with open(map_file) as csvfile: reader = csv.DictReader(csvfile, delimiter=';') mapping = {row['source']: row['destination'] for row in reader if row['source_type'] != 'color_code'} return mapping map_dict = create_mapping('mapping.csv')

We are using dictionary comprehension to create the dictionary. You can do something similar for colors, then you want to have all the rows where source_type equals color_code (so == instead of !=). But perhaps it is a better idea put the color mappings into a different file. Furthermore, if you process the pricat.csv in a similar fashion:

with open('pricat.csv') as csvfile: reader = csv.DictReader(csvfile, delimiter=';') for row in reader: # process row

You'll be able to use the rows as dictionaries:

{'ean': '8719245200985', 'supplier': 'Rupesco BV', 'brand': 'Via Vai', 'catalog_code': '', 'collection': 'NW 17-18', 'season': 'winter', 'article_structure_code': '10', 'article_number': '15189-02', 'article_number_2': '15189-02 Aviation Nero', 'article_number_3': 'Aviation', 'color_code': '1', 'size_group_code': 'EU', 'size_code': '39', 'size_name': '39', 'currency': 'EUR', 'price_buy_gross': '', 'price_buy_net': '58.5', 'discount_rate': '', 'price_sell': '139.95', 'material': 'Aviation', 'target_area': 'Woman Shoes'}

So you can do something like:

y = Article.objects.get_or_create(article_structure=map_dict[row['article_structure_code']], article_number=row['article_number'], catalog=x[0])

This can still be refactored a bit, but now you are no longer dependent on the column numbers.

Hi! I changed my mapping() (let me know what you think about it) but your use of csv.DictReader gave me a great insight and I think I'm going to come up with a very nice solution soon, if it works I'll accept your answer. — Adamantoisetortoise, CommentedNov 22, 2020 at 16:30

Vogel612 · Accepted Answer · 2020-11-22 23:26:04Z

class Command(BaseCommand): help = 'Create a catalog, accept csv as argument' def add_arguments(self, parser): parser.add_argument('file', nargs='+', type=str) parser.add_argument('map', nargs='+', type=str) def handle(self, *args, **options): map_dict = mapping(options['map'][0]) with open(options['file'][0], 'r') as f: reader = csv.DictReader(f, delimiter=';') for row in reader: x = Catalog.objects.get_or_create(brand=row['brand'], supplier=row['supplier'], catalog_code=row['catalog_code'], collection=map_dict['collection'][row['collection']], season=map_dict['season'][row['season']], size_group=map_dict['size_group_code'][row['size_group_code']], currency=row['currency'], target_area=row['target_area']) if x[1]: logger_catalog.info(f'Created Catalog instance {x[0]}') y = Article.objects.get_or_create(article_structure=map_dict['article_structure_code'][row['article_structure_code']], article_number=row['article_number'], catalog=x[0]) if y[1]: logger_catalog.info(f'Created Article instance {y[0]}') z = Variation.objects.get_or_create(ean=row['ean'], article=y[0], size_code=row['size_code'], color=map_dict['color_code'][row['color_code']], material=row['material'], price_buy_gross=row['price_buy_gross'], price_buy_net=row['price_buy_net'], discount_rate=row['discount_rate'], price_sell=row['price_sell'], size=map_dict['size_group_code|size_code'][f"{row['size_group_code']}|{row['size_code']}"]) if z[1]: logger_catalog.info(f'Created Variation instance {z[0]}')

Finally I remake my Command, now it's indipendent from indeces so it will correctly populate the database even if the colums in the csv are in a different order.

My mapping() func (see question) returns a nested dict, the keys of the parent dict are the columns names that need to be mapped and the values are dicts with this structure:
{value_presented_in_csv: how_value_should_be_presented_in_DB}.

In my Command I iterate through each row of pricat.csv turning rows in dicts {colum_name: value_presented_in_csv}, if the data don't need to be mapped I get the value from my row dict like brand=row['brand'], if the data need to be mapped I get the value from my nested dict map_dict like this map_dict[column_name][value_presented_in_csv] (this gives me the value of the child dict that is how_value_should_be_presented_in_DB).

It is better because doesn't relies on indeces no more, my first implementation works correctly only if the columns in pricat.csv are in that precise order; with this new implementation the columns can be in any order and the DB would still be populate correctly.

Stack Exchange Network

Mapping column names and values of a csv using another csv

2 Answers 2

Hot Network Questions

Mapping column names and values of a csv using another csv

2 Answers 2

Related

Hot Network Questions