Filtering Pandas Dataframe Aggregate
I have a pandas DataFrame that I group, and then perform an aggregate calculation to get the mean:
grouped = df.groupby(['year_month', 'company'])
means = grouped.agg({'size'
Solution 1:
The issue is that you are grouping on 'year_month' and 'company'. As a result, in the means DataFrame, year_month and company are part of the index (a MultiIndex), so you cannot access them the way you access ordinary columns.
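To make this concrete, here is a minimal sketch that reproduces the situation. The sample rows and the company names are hypothetical, and year_month is assumed to hold strings; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "year_month": ["201411", "201412", "201412", "201412"],
    "company": ["acme", "acme", "acme", "globex"],
    "size": [10, 20, 40, 5],
})

means = df.groupby(["year_month", "company"]).agg({"size": "mean"})

# The grouping keys end up as index levels, not as columns.
index_names = list(means.index.names)
column_names = list(means.columns)
print(index_names)   # ['year_month', 'company']
print(column_names)  # ['size']
```

Because 'year_month' is not among the columns, an expression such as means['year_month'] raises a KeyError.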
One way to do this is to get the values of the 'year_month' level of the index. Example -
means.loc[means.index.get_level_values('year_month') == '201412']
Demo -
In [38]: df
Out[38]:
   A  B   C
0  1  2  10
1  3  4  11
2  5  6  12
3  1  7  13
4  2  8  14
5  1  9  15
In [39]: means = df.groupby(['A','B']).mean()
In [40]: means
Out[40]:
      C
A B
1 2  10
  7  13
  9  15
2 8  14
3 4  11
5 6  12
In [41]: means.loc[means.index.get_level_values('A') == 1]
Out[41]:
      C
A B
1 2  10
  7  13
  9  15
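The same filter can be sketched with the column names from the question. The sample rows are hypothetical and year_month is assumed to be stored as strings. The reset_index variant at the end is not part of this answer; it is a standard pandas alternative that turns the index levels back into ordinary columns.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "year_month": ["201411", "201412", "201412", "201412"],
    "company": ["acme", "acme", "acme", "globex"],
    "size": [10, 20, 40, 5],
})

means = df.groupby(["year_month", "company"]).agg({"size": "mean"})

# Build a boolean mask from one level of the MultiIndex and filter with it.
mask = means.index.get_level_values("year_month") == "201412"
dec = means.loc[mask]
print(dec)

# Alternative (not from the answer): move the index levels back into
# columns, then filter like any other DataFrame.
flat = means.reset_index()
dec_flat = flat[flat["year_month"] == "201412"]
print(dec_flat)
```

If year_month holds integers rather than strings, the comparison value would need to be 201412 instead of '201412'.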
Solution 2:
As already pointed out, you will end up with a two-level index. You could try unstacking the aggregated DataFrame:
means = df.groupby(['year_month', 'company']).agg({'size':['mean']}).unstack(level=1)
This should give you a single 'year_month' index, with 'company' as columns and the aggregated size as values. You can then slice by the index:
means.loc['201412']
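A minimal sketch of the unstack approach follows, again with hypothetical sample rows and year_month assumed to be strings. Note that because the aggregation is given as {'size': ['mean']}, the resulting columns form a three-level MultiIndex of ('size', 'mean', company).

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "year_month": ["201411", "201412", "201412", "201412"],
    "company": ["acme", "acme", "acme", "globex"],
    "size": [10, 20, 40, 5],
})

# Aggregate, then pivot the 'company' index level into the columns.
means = (
    df.groupby(["year_month", "company"])
      .agg({"size": ["mean"]})
      .unstack(level=1)
)

# The index is now a single level, so a plain label lookup works.
row = means.loc["201412"]
print(row)
```

Combinations that do not occur in the data, such as 'globex' in '201411' here, show up as NaN after unstacking.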