Python - Create many dummy variables from one text variable?

Question

I'm trying to create dummy variables for a variable that has text data in rows.

Data in 1st row is:
{"Wireless Internet","Air conditioning",Kitchen,Heating,"Family/kid friendly",Essentials,"Hair dryer",Iron,"translation missing: en.hosting_amenity_50"}

and the data in the 2nd row is:
{TV,"Cable TV",Internet,"Wireless Internet",Kitchen,"Indoor fireplace","Buzzer/wireless intercom",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Fire extinguisher",Essentials}
and many more.

What I want to do now is create dummy variables from that variable. For example, from the data above:
one variable named Wireless Internet with 0 and 1 in its rows,
another variable named Cable TV with 0 and 1 in its rows,
another variable named Kitchen with 0 and 1 in its rows, and so on.

sklearn for Python has a OneHotEncoder class, but it treats the entire text of a row as a single category and creates one dummy variable per unique row value. That is not what I want here. I first have to split the text in every row and then create a dummy variable for each distinct item. How do I do that?

The expected result is multiple columns like:
Wireless Internet   Cable TV   Kitchen
1                   0          1
0                   1          1
1                   0          1

Link to the data (column named amenities): https://www.kaggle.com/stevezhenghp/airbnb-price-prediction

georg.dev · Accepted Answer · 2019-04-05 07:38:26Z

On the Kaggle page there is a kernel that focuses on exactly your problem. The code below is from user JAGADEESHWARA VARA PRASAD, with two small adjustments: the pandas import is added, and the distinct names are collected into a sorted list, since newer pandas versions may not accept a set for columns.

    import pandas as pd

    # load your dataset into df
    df = pd.read_csv("../input/train.csv")

    # trim the { } symbols, split at ',' and trim quotes and spaces from each word
    l = [[word.strip('[" ]') for word in row[1:-1].split(',')] for row in list(df['amenities'])]

    # form a sorted list of the distinct amenity names, dropping the empty string
    cols = sorted(set(word for row in l for word in row) - {''})

    # create and fill the new data frame
    new_df = pd.DataFrame(columns=cols)
    for row_idx in range(len(l)):
        for col in cols:
            new_df.loc[row_idx, col] = int(col in l[row_idx])

    print(new_df)
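The cell-by-cell loop can be slow on a large frame. As an alternative sketch, not taken from the kernel, pandas can build the same 0/1 columns in one pass with Series.str.get_dummies. The two sample rows are abbreviated from the question, and the names amenities and dummies are only illustrative. Like the code above, this assumes no amenity name itself contains a comma.

```python
import pandas as pd

# Two sample rows, abbreviated from the question (illustrative only).
df = pd.DataFrame({
    "amenities": [
        '{"Wireless Internet","Air conditioning",Kitchen,Heating}',
        '{TV,"Cable TV",Internet,"Wireless Internet",Kitchen}',
    ]
})

# Drop the surrounding braces and the double quotes, leaving "a,b,c".
amenities = (
    df["amenities"]
    .str.strip("{}")
    .str.replace('"', "", regex=False)
)

# One 0/1 column per distinct item, split on the comma.
dummies = amenities.str.get_dummies(sep=",")

print(dummies)
```

The resulting columns can be attached to the original frame with df.join(dummies) if the other variables are needed alongside them. Matching is on whole items, so a row containing only Wireless Internet gets 0 in the Internet column.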
