I want to convert string data to numeric data, because the decision tree only accepts numeric input. When I had a binary string attribute such as Ever_Married [Yes/No], I converted it to numeric values with the .replace method. But now I have an attribute with five different options [Private, Self-employed, Children, Govt_job, Never_worked]. Is it okay to use .replace to map these categories to five different numeric values? Will it affect my model, and is this good practice?
- How is “ever married” continuous? Likewise, how is your five-category employment variable continuous? – Dave, Nov 24, 2022 at 17:01
- Oh, sorry for the mistake. The ever_married and employment attributes were strings, and I wanted to convert them to numeric variables because the decision tree raised an error saying it cannot take string variables. I will edit the question. – Anantashayana Hegde, Nov 24, 2022 at 17:50
1 Answer
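Since you tagged scikit-learn, you can use its preprocessing.LabelEncoder() class to convert categories to numerical values. And yes, this is good practice. (Note that the scikit-learn documentation describes LabelEncoder as meant for target labels; preprocessing.OrdinalEncoder does the same job for feature columns.)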
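    from sklearn import preprocessing

    # Fit the encoder and replace the string column with its integer codes
    label_encoder = preprocessing.LabelEncoder()
    my_dataframe["status"] = label_encoder.fit_transform(my_dataframe["status"])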
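To make the effect of such an encoding concrete, here is a minimal pure-Python sketch of the kind of mapping a label encoder produces for the five employment categories in the question. The sample list `work_type` and the helper names `label_encode` and `one_hot_encode` are made up for illustration and are not part of scikit-learn. The one-hot variant is an alternative that the answer does not mention; it is shown only for comparison, since integer codes impose an arbitrary order on the categories while one-hot columns do not.

```python
# Hypothetical sample of the five-category column from the question.
work_type = ["Private", "Self-employed", "Children",
             "Govt_job", "Never_worked", "Private"]


def label_encode(values):
    """Map each distinct category to an integer, in sorted category order."""
    classes = sorted(set(values))
    to_code = {category: code for code, category in enumerate(classes)}
    return classes, [to_code[v] for v in values]


def one_hot_encode(values):
    """Map each value to a 0/1 row with one column per sorted category."""
    classes = sorted(set(values))
    rows = [[1 if v == category else 0 for category in classes]
            for v in values]
    return classes, rows


classes, codes = label_encode(work_type)
_, one_hot = one_hot_encode(work_type)

print(classes)     # ['Children', 'Govt_job', 'Never_worked', 'Private', 'Self-employed']
print(codes)       # [3, 4, 0, 1, 2, 3]
print(one_hot[0])  # [0, 0, 0, 1, 0]  (the row for 'Private')
```

The integer codes say that 'Children' < 'Govt_job' < ... < 'Self-employed', which is only an artifact of alphabetical sorting. A decision tree can still split on such codes, but if that implied order is a concern, the one-hot layout avoids it at the cost of more columns.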