
One-hot Encoding In Scikit-learn For Only Part Of The Dataframe

I am trying to use a decision tree classifier on my data, which looks very similar to the data in this tutorial: https://www.ritchieng.com/machinelearning-one-hot-encoding/ The tutor…

Solution 1:

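Actually, there is a really simple solution: use pd.get_dummies(). By default it one-hot encodes only the columns with object, string, or category dtype and leaves numeric columns such as age untouched, so only part of the DataFrame gets encoded.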

If you have a DataFrame like the following:

import pandas as pd

so_data = {
    'passenger_id': [1, 2, 3, 4, 5],
    'survived': [1, 0, 0, 1, 0],
    'age': [24, 25, 68, 39, 5],
    'sex': ['female', 'male', 'male', 'female', 'female'],
    'first_name': ['Joanne', 'Mark', 'Josh', 'Petka', 'Ariel']
}
so_df = pd.DataFrame(so_data)

which looks like:

   passenger_id  survived  age     sex first_name
0             1         1   24  female     Joanne
1             2         0   25    male       Mark
2             3         0   68    male       Josh
3             4         1   39  female      Petka
4             5         0    5  female      Ariel

You can just do:

pd.get_dummies(so_df)

which will give you:

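[Screenshot of the output: passenger_id, survived and age are unchanged, while sex and first_name are replaced by indicator columns (sex_female, sex_male, first_name_Ariel, first_name_Joanne, first_name_Josh, first_name_Mark, first_name_Petka).]

(Sorry for the screenshot, but it's really difficult to format a DataFrame cleanly on SO.)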
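If you want to pick exactly which columns get encoded, rather than relying on the dtype-based default, pd.get_dummies also accepts a columns argument. The sketch below, which reuses the example data and assumes you only want sex encoded, leaves first_name as plain strings. dtype=int is passed so the indicator columns are 0/1 integers on every pandas version (recent versions otherwise return booleans).

```python
import pandas as pd

so_df = pd.DataFrame({
    'passenger_id': [1, 2, 3, 4, 5],
    'survived': [1, 0, 0, 1, 0],
    'age': [24, 25, 68, 39, 5],
    'sex': ['female', 'male', 'male', 'female', 'female'],
    'first_name': ['Joanne', 'Mark', 'Josh', 'Petka', 'Ariel'],
})

# Encode only 'sex'; every other column (including the string column
# 'first_name') is passed through unchanged, ahead of the new dummy columns.
encoded = pd.get_dummies(so_df, columns=['sex'], dtype=int)

print(encoded)
```

Note that first_name is still a string column here, so you would drop it (or encode it) before fitting a decision tree, since scikit-learn estimators expect numeric input.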
