Use Pandas To Group By Column And Then Create A New Column Based On A Condition
I need to reproduce with pandas what SQL does so easily: select del_month , sum(case when off0_on1 = 1 then 1 else 0 end) as on1 , sum(case when off0_on1 = 0 then 1 els
Solution 1:
Simply sum the Trues in your conditional logic expressions:
import pandas as pd
a1 = pd.DataFrame({'del_month':[1,1,1,1,2,2,2,2],
'off0_on1':[0,0,1,1,0,1,1,1]})
a1['on1'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: sum(x==1))
a1['off0'] = a1.groupby('del_month')['off0_on1'].transform(lambda x: sum(x==0))
print(a1)
# del_month off0_on1 on1 off0# 0 1 0 2 2# 1 1 0 2 2# 2 1 1 2 2# 3 1 1 2 2# 4 2 0 3 1# 5 2 1 3 1# 6 2 1 3 1# 7 2 1 3 1
Similarly, you can do the same in SQL if dialect supports it which most should:
select
del_month
, sum(off0_on1 =1) as on1
, sum(off0_on1 =0) as off0
from a1
groupby del_month
orderby del_month
And to replicate above SQL in pandas, don't use transform
but send multiple aggregates in a groupby().apply()
call:
defaggfunc(x):
data = {'on1': sum(x['off0_on1'] == 1),
'off0': sum(x['off0_on1'] == 0)}
return pd.Series(data)
g = a1.groupby('del_month').apply(aggfunc)
print(g)
# on1 off0# del_month # 1 2 2# 2 3 1
Solution 2:
Using get_dummies
would only need a single groupby
call, which is simpler.
v = pd.get_dummies(df.pop('off0_on1')).groupby(df.del_month).transform(sum)
df = pd.concat([df, v.rename({0: 'off0', 1: 'on1'}, axis=1)], axis=1)
df
del_month off0 on1
0 1 2 2
1 1 2 2
2 1 2 2
3 1 2 2
4 2 1 3
5 2 1 3
6 2 1 3
7 2 1 3
Additionally, for the case of aggregation, call sum
directly instead of using apply
:
(pd.get_dummies(df.pop('off0_on1'))
.groupby(df.del_month)
.sum()
.rename({0: 'off0', 1: 'on1'}, axis=1))
off0 on1
del_month
1 2 2
2 1 3
Post a Comment for "Use Pandas To Group By Column And Then Create A New Column Based On A Condition"