One-hot Encoding In Scikit-learn For Only Part Of The Dataframe
I am trying to use a decision tree classifier on my data, which looks very similar to the data in this tutorial: https://www.ritchieng.com/machinelearning-one-hot-encoding/ The tutor
Solution 1:
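Actually, there is a really simple solution: use pd.get_dummies(). By default it one-hot encodes only the string (object) and categorical columns and leaves the numeric columns untouched.
If you have a DataFrame like the following: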
import pandas as pd

# Small Titanic-style example: numeric columns plus two string columns
so_data = {
    'passenger_id': [1, 2, 3, 4, 5],
    'survived': [1, 0, 0, 1, 0],
    'age': [24, 25, 68, 39, 5],
    'sex': ['female', 'male', 'male', 'female', 'female'],
    'first_name': ['Joanne', 'Mark', 'Josh', 'Petka', 'Ariel']
}
so_df = pd.DataFrame(so_data)
which looks like:
   passenger_id  survived  age     sex first_name
0             1         1   24  female     Joanne
1             2         0   25    male       Mark
2             3         0   68    male       Josh
3             4         1   39  female      Petka
4             5         0    5  female      Ariel
You can just do:
pd.get_dummies(so_df)
which will give you:
(sorry for the screenshot, but it's really difficult to clean the df on SO)
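Since the screenshot is not reproduced here, the following is a minimal sketch that rebuilds the same example frame and prints the column names get_dummies produces. It also shows one way to encode only part of the DataFrame, assuming you only want the 'sex' column expanded: pass the columns argument. The names encoded_all and encoded_sex are introduced here for illustration and are not from the original answer.

```python
import pandas as pd

# Same example data as in the answer above
so_data = {
    'passenger_id': [1, 2, 3, 4, 5],
    'survived': [1, 0, 0, 1, 0],
    'age': [24, 25, 68, 39, 5],
    'sex': ['female', 'male', 'male', 'female', 'female'],
    'first_name': ['Joanne', 'Mark', 'Josh', 'Petka', 'Ariel'],
}
so_df = pd.DataFrame(so_data)

# Default behaviour: every string column ('sex' and 'first_name') is
# expanded into indicator columns; numeric columns are kept as they are.
encoded_all = pd.get_dummies(so_df)
print(encoded_all.columns.tolist())

# Encode only part of the frame: restrict the encoding to 'sex',
# so 'first_name' stays as an ordinary string column.
encoded_sex = pd.get_dummies(so_df, columns=['sex'])
print(encoded_sex.columns.tolist())
```

Note that in the second call 'first_name' is left as text, so you would still need to drop or otherwise handle it before fitting a scikit-learn estimator such as a decision tree.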