Use Pandas To Group By Column And Then Create A New Column Based On A Condition

I need to reproduce with pandas what SQL does so easily: select del_month , sum(case when off0_on1 = 1 then 1 else 0 end) as on1 , sum(case when off0_on1 = 0 then 1 els

Solution 1:

Sum the True values of each conditional expression within its group (True counts as 1, False as 0):

import pandas as pd

a1 = pd.DataFrame({'del_month':[1,1,1,1,2,2,2,2], 
                   'off0_on1':[0,0,1,1,0,1,1,1]})

a1['on1'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: sum(x==1))    
a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: sum(x==0))

print(a1)
#    del_month  off0_on1  on1  off0
# 0          1         0    2     2
# 1          1         0    2     2
# 2          1         1    2     2
# 3          1         1    2     2
# 4          2         0    3     1
# 5          2         1    3     1
# 6          2         1    3     1
# 7          2         1    3     1
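The lambda above calls Python's built-in sum once per group. As a possible alternative (a sketch, not part of the original answer), the indicator columns can be built once and handed to groupby, so the per-group work stays inside pandas. The astype(int) cast is an assumption made here to keep the result dtype an integer regardless of pandas version.

```python
import pandas as pd

a1 = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1': [0, 0, 1, 1, 0, 1, 1, 1]})

# Indicator Series: 1 where the condition holds, 0 otherwise.
is_on = (a1['off0_on1'] == 1).astype(int)
is_off = (a1['off0_on1'] == 0).astype(int)

# transform('sum') broadcasts each month's total back to every row of that month.
a1['on1'] = is_on.groupby(a1['del_month']).transform('sum')
a1['off0'] = is_off.groupby(a1['del_month']).transform('sum')

print(a1)
```

This should produce the same on1 and off0 columns as the lambda version.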

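Similarly, you can do the same in SQL if your dialect lets you sum a boolean expression directly. MySQL and SQLite do; others, such as PostgreSQL and SQL Server, need the CASE form from the question: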

select
    del_month
    , sum(off0_on1 =1) as on1
    , sum(off0_on1 =0) as off0
from a1
group by del_month
order by del_month
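To check the query without a database server, here is a minimal sketch that runs it against an in-memory SQLite database using Python's standard sqlite3 module. The table is loaded with the same eight rows as a1; SQLite is chosen here only because it evaluates a comparison to 0 or 1, so summing it works.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table a1 (del_month integer, off0_on1 integer)')
conn.executemany(
    'insert into a1 values (?, ?)',
    [(1, 0), (1, 0), (1, 1), (1, 1), (2, 0), (2, 1), (2, 1), (2, 1)],
)

# In SQLite a comparison yields 1 or 0, so sum() counts the matching rows.
rows = conn.execute('''
    select del_month
         , sum(off0_on1 = 1) as on1
         , sum(off0_on1 = 0) as off0
    from a1
    group by del_month
    order by del_month
''').fetchall()
conn.close()

print(rows)  # [(1, 2, 2), (2, 3, 1)]
```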

To replicate the SQL above in pandas, which returns one row per group, don't use transform; instead return several aggregates from a single groupby().apply() call:

def aggfunc(x):
    data = {'on1': sum(x['off0_on1'] == 1),
            'off0': sum(x['off0_on1'] == 0)}

    return pd.Series(data)

g = a1.groupby('del_month').apply(aggfunc)

print(g)
#            on1  off0
# del_month
# 1            2     2
# 2            3     1
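groupby().apply() calls the Python function once per group. One possible way to get the same table with built-in aggregation only (a sketch, not the original answer's method) is to add the indicator columns with assign and then take a plain grouped sum:

```python
import pandas as pd

a1 = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1': [0, 0, 1, 1, 0, 1, 1, 1]})

# assign() returns a copy with the two indicator columns; a1 itself is unchanged.
g = (a1.assign(on1=(a1['off0_on1'] == 1).astype(int),
               off0=(a1['off0_on1'] == 0).astype(int))
       .groupby('del_month')[['on1', 'off0']]
       .sum())

print(g)
```

The result is indexed by del_month with one row per month, like the output of the SQL query.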

Solution 2:

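With get_dummies, a single groupby call is enough, which is simpler. Here df holds the same two columns (del_month and off0_on1) as the original a1; note that df.pop removes off0_on1 from df.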

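# Pass the string 'sum' rather than the built-in sum; recent pandas versions warn about the latter.
v = pd.get_dummies(df.pop('off0_on1')).groupby(df.del_month).transform('sum')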
df = pd.concat([df, v.rename({0: 'off0', 1: 'on1'}, axis=1)], axis=1)

df
   del_month  off0  on1
0          1     2    2
1          1     2    2
2          1     2    2
3          1     2    2
4          2     1    3
5          2     1    3
6          2     1    3
7          2     1    3

Additionally, for the aggregated result (one row per month), call sum directly instead of using apply. Because pop removes the column, run this on a df that still contains off0_on1:

(pd.get_dummies(df.pop('off0_on1'))
   .groupby(df.del_month)
   .sum()
   .rename({0: 'off0', 1: 'on1'}, axis=1))

           off0  on1
del_month           
1             2    2
2             1    3
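Both snippets above use df.pop, which removes off0_on1 from df, so they cannot be run one after the other on the same frame. A minimal non-destructive sketch follows; the names dummies, per_row and per_month are made up for illustration, and the astype(int) cast is an assumption added because newer pandas versions return boolean dummy columns.

```python
import pandas as pd

df = pd.DataFrame({'del_month': [1, 1, 1, 1, 2, 2, 2, 2],
                   'off0_on1': [0, 0, 1, 1, 0, 1, 1, 1]})

# One indicator column per distinct value (0 and 1), renamed to readable labels.
dummies = (pd.get_dummies(df['off0_on1'])
             .astype(int)
             .rename(columns={0: 'off0', 1: 'on1'}))

# Row-level counts (same length as df) and month-level counts from the same dummies.
per_row = dummies.groupby(df['del_month']).transform('sum')
per_month = dummies.groupby(df['del_month']).sum()

print(pd.concat([df, per_row], axis=1))
print(per_month)
```

Since df is only read, never popped, both results come from the same frame and df keeps its original columns.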
