
Include Missing Group Keys As NaN In Pandas Groupby Output

I have a dataframe in pandas. test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'], 'transact

Solution 1:

This is easy if you convert "transaction" to a categorical column before grouping:

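# Convert the grouping column to categorical so every category is kept
df.transaction = pd.Categorical(df.transaction)
# observed=False keeps categories that never occur within a given group
df.groupby(['date', 'transaction', 'ccy'], observed=False).sum().unstack(2)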

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.404488
           bb           0.459295       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.439354
           bb                NaN       NaN
           cc           0.429269       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  1.542451
           cc                NaN       NaN

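Because "transaction" is categorical, groupby emits a row for every category under each date, including combinations that never occur in the data. In the output above those cells appear as NaN. This generally works for numeric aggregations such as sum, but the fill value can depend on the pandas version: recent releases may report 0 instead of NaN for unobserved groups when summing, so check the result on your installation.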
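As a self-contained illustration of the categorical approach, here is a minimal sketch. The question's frame is truncated, so the transaction, ccy and amt values below are made up for illustration. The example passes observed=False explicitly and selects the amt column, and it only relies on the shape of the result, not on whether empty cells show as NaN or 0.

```python
import pandas as pd

# Hypothetical sample data: only the dates come from the question;
# the transaction, ccy and amt values are invented for this sketch.
df = pd.DataFrame({
    "date": ["2018-12-28", "2018-12-28", "2018-12-29",
             "2018-12-29", "2018-12-30", "2018-12-30"],
    "transaction": ["aa", "bb", "aa", "cc", "bb", "bb"],
    "ccy": ["USD", "EUR", "USD", "EUR", "USD", "USD"],
    "amt": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Categorical copy of the grouping column; df itself is left unchanged.
tx = pd.Series(pd.Categorical(df["transaction"]),
               index=df.index, name="transaction")

# observed=False asks pandas to keep every category under each date,
# whether or not that combination occurs in the data.
result = (
    df.groupby(["date", tx, "ccy"], observed=False)["amt"]
      .sum()
      .unstack("ccy")
)
print(result)
```

With three dates and three transaction categories, the result should have nine rows, one per (date, transaction) pair, including pairs such as ("2018-12-28", "cc") that have no rows in the input.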


If you don't want to modify df, this will do:

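# Build a categorical copy of the column instead of overwriting it
u = pd.Series(pd.Categorical(df.transaction), name='transaction')
# Select amt explicitly so the original string column is not aggregated
df.groupby(['date', u, 'ccy'], observed=False)[['amt']].sum().unstack(2)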

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.429134
           bb           0.852355       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.541576
           bb                NaN       NaN
           cc           0.994095       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.744587
           cc                NaN       NaN
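An alternative that does not rely on categorical grouping is to aggregate first and then reindex against every (date, transaction) pair. This is a different technique from the one in the answer, shown here only as a sketch. The sample frame is hypothetical (the question's data is truncated), and the names wide and full_index are chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the dates come from the question;
# the transaction, ccy and amt values are invented for this sketch.
df = pd.DataFrame({
    "date": ["2018-12-28", "2018-12-28", "2018-12-29",
             "2018-12-29", "2018-12-30", "2018-12-30"],
    "transaction": ["aa", "bb", "aa", "cc", "bb", "bb"],
    "ccy": ["USD", "EUR", "USD", "EUR", "USD", "USD"],
    "amt": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Aggregate only the combinations that actually occur in the data.
wide = df.groupby(["date", "transaction", "ccy"])["amt"].sum().unstack("ccy")

# Build every (date, transaction) pair from the values present in df.
full_index = pd.MultiIndex.from_product(
    [sorted(df["date"].unique()), sorted(df["transaction"].unique())],
    names=["date", "transaction"],
)

# Reindexing adds the missing pairs as rows filled with NaN.
result = wide.reindex(full_index)
print(result)
```

Because reindex fills absent labels with NaN by default, the missing pairs show up as NaN rows here. The trade-off is that the full set of keys has to be built by hand.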
