
Is There A Faster Way To Update Dataframe Column Values Based On Conditions?

I am trying to process a dataframe. This includes creating new columns and updating their values based on the values in other columns. More concretely, I have a predefined 'source'

Solution 1:

Another way to do this is to use pd.get_dummies on the dataframe. First move '_id' into the index so that it is left out of the encoding.

source = source.set_index('_id')
df_out = pd.get_dummies(source).reset_index()

print(df_out)

Output:

                    _id  source_Cash 1  source_DTOT  source_DTP
0  AV4MdG6Ihowv-SKBN_nB              0            0           1
1  AV4Mc2vNhowv-SKBN_Rn              1            0           0
2  AV4MeisikOpWpLdepWy6              0            0           1
3  AV4MeRh6howv-SKBOBOn              1            0           0
4  AV4Mezwchowv-SKBOB_S              0            1           0
5  AV4MeB7yhowv-SKBOA5b              0            0           1
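
For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is an assumption: the original 'source' dataframe is not shown in full, so the values below are reconstructed from the printed output. The dtype=int argument is also an addition, used because recent pandas versions return boolean columns by default.

```python
import pandas as pd

# Sample data reconstructed from the printed output above (an assumption,
# not the asker's actual dataframe).
source = pd.DataFrame({
    "_id": [
        "AV4MdG6Ihowv-SKBN_nB",
        "AV4Mc2vNhowv-SKBN_Rn",
        "AV4MeisikOpWpLdepWy6",
        "AV4MeRh6howv-SKBOBOn",
        "AV4Mezwchowv-SKBOB_S",
        "AV4MeB7yhowv-SKBOA5b",
    ],
    "source": ["DTP", "Cash 1", "DTP", "Cash 1", "DTOT", "DTP"],
})

# Move '_id' into the index so only the 'source' column gets encoded.
indexed = source.set_index("_id")

# dtype=int requests 0/1 columns instead of booleans.
df_out = pd.get_dummies(indexed, dtype=int).reset_index()

print(df_out)
```

The column names are built from the original column name plus each distinct value, which is why the 'Cash 1' label keeps its space and trailing digit here.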

Solution 2:

You can use str.get_dummies to get your one-hot encodings.

c = df.source.str.get_dummies().add_prefix('source_').iloc[:, ::-1]
c.columns = c.columns.str.lower().str.split().str[0]
print(c)
   source_dtp  source_dtot  source_cash
0           1            0            0
1           0            0            1
2           1            0            0
3           0            0            1
4           0            1            0
5           1            0            0
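
The column clean-up chain is the least obvious step, so here is a small sketch of what it does in isolation. The three labels are assumed to be what add_prefix produces for this data; they are taken from the output shown in the thread, not from the original dataframe.

```python
import pandas as pd

# Column labels as they would look after add_prefix('source_')
# (assumed from the output shown in the thread).
cols = pd.Index(["source_Cash 1", "source_DTOT", "source_DTP"])

# lower()  -> 'source_cash 1', 'source_dtot', 'source_dtp'
# split()  -> ['source_cash', '1'], ['source_dtot'], ['source_dtp']
# str[0]   -> keep only the first whitespace-separated token
cleaned = cols.str.lower().str.split().str[0]

print(list(cleaned))
```

Note that keeping only the first token would produce duplicate column names if two labels shared the same first word, so this clean-up is only safe when the first words are distinct.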

Next, concatenate c with _id using pd.concat.

df = pd.concat([df._id, c], axis=1)
print(df)
                    _id  source_dtp  source_dtot  source_cash
0  AV4MdG6Ihowv-SKBN_nB           1            0            0
1  AV4Mc2vNhowv-SKBN_Rn           0            0            1
2  AV4MeisikOpWpLdepWy6           1            0            0
3  AV4MeRh6howv-SKBOBOn           0            0            1
4  AV4Mezwchowv-SKBOB_S           0            1            0
5  AV4MeB7yhowv-SKBOA5b           1            0            0

Improvement! Now slightly smoother, thanks to Scott Boston's set_index - reset_index paradigm:

df = df.set_index('_id')\
      .source.str.get_dummies().iloc[:, ::-1]
df.columns = df.columns.str.lower().str.split().str[0]
df = df.add_prefix('source_').reset_index()

print(df)
                    _id  source_dtp  source_dtot  source_cash
0  AV4MdG6Ihowv-SKBN_nB           1            0            0
1  AV4Mc2vNhowv-SKBN_Rn           0            0            1
2  AV4MeisikOpWpLdepWy6           1            0            0
3  AV4MeRh6howv-SKBOBOn           0            0            1
4  AV4Mezwchowv-SKBOB_S           0            1            0
5  AV4MeB7yhowv-SKBOA5b           1            0            0
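
If this needs to be reused, the same steps could be wrapped in a small helper. This is only a sketch: the function name one_hot_source and its parameters are hypothetical, the sample frame is reconstructed from the output above, and the column reversal (iloc[:, ::-1]) is left out since it only affects display order.

```python
import pandas as pd


def one_hot_source(df, id_col="_id", value_col="source"):
    """Hypothetical helper: one-hot encode value_col, keeping id_col."""
    # Encode the string column; id_col rides along in the index.
    encoded = df.set_index(id_col)[value_col].str.get_dummies()
    # Normalise labels: lowercase, keep the first whitespace-separated token.
    encoded.columns = encoded.columns.str.lower().str.split().str[0]
    # Prefix with the source column name and restore id_col as a column.
    return encoded.add_prefix(value_col + "_").reset_index()


# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    "_id": [
        "AV4MdG6Ihowv-SKBN_nB",
        "AV4Mc2vNhowv-SKBN_Rn",
        "AV4MeisikOpWpLdepWy6",
        "AV4MeRh6howv-SKBOBOn",
        "AV4Mezwchowv-SKBOB_S",
        "AV4MeB7yhowv-SKBOA5b",
    ],
    "source": ["DTP", "Cash 1", "DTP", "Cash 1", "DTOT", "DTP"],
})

result = one_hot_source(df)
print(result)

# Sanity check: every row should have exactly one indicator set.
row_totals = result.drop(columns="_id").sum(axis=1)
```

Without the reversal the indicator columns come out in alphabetical order (source_cash, source_dtot, source_dtp) rather than the order printed above.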
