Filtering Pandas Dataframe Aggregate

May 26, 2024

I have a pandas DataFrame that I group, and then perform an aggregate calculation to get the mean for:

grouped = df.groupby(['year_month', 'company'])
means = grouped.agg({'size'

Solution 1:

The issue is that you are grouping by 'year_month' and 'company'. Hence, in the means DataFrame, year_month and company are part of the index (a MultiIndex), so you cannot access them the way you access ordinary columns.
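To see this for yourself, here is a minimal sketch. The sample data is made up, and because the aggregation in the question is cut off, the sketch assumes a plain mean of the 'size' column.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "year_month": ["201411", "201412", "201412"],
    "company": ["a", "a", "b"],
    "size": [10, 20, 30],
})

# Assumed aggregation: the mean of 'size' per (year_month, company) group.
means = df.groupby(["year_month", "company"]).agg({"size": "mean"})

index_names = list(means.index.names)  # the grouping keys end up here
column_names = list(means.columns)     # only the aggregated column remains

print(index_names)
print(column_names)
print("year_month" in means.columns)
```

Since 'year_month' is not among the columns, an expression such as means['year_month'] raises a KeyError.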

One way to filter on them is to get the values of the 'year_month' level of the index and compare them against the value you want. Example -

means.loc[means.index.get_level_values('year_month') == '201412']

Demo -

In [38]: df
Out[38]:
   A  B   C
0  1  2  10
1  3  4  11
2  5  6  12
3  1  7  13
4  2  8  14
5  1  9  15

In [39]: means = df.groupby(['A','B']).mean()

In [40]: means
Out[40]:
      C
A B
1 2  10
  7  13
  9  15
2 8  14
3 4  11
5 6  12

In [41]: means.loc[means.index.get_level_values('A') == 1]
Out[41]:
      C
A B
1 2  10
  7  13
  9  15
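The same approach applied to the column names from the question might look like the sketch below. The data is hypothetical, and the aggregation is assumed to be a plain mean of 'size', since the original call is cut off.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "year_month": ["201411", "201412", "201412", "201501"],
    "company": ["a", "a", "b", "a"],
    "size": [10, 20, 30, 40],
})

# Assumed aggregation: the mean of 'size' per (year_month, company) group.
means = df.groupby(["year_month", "company"]).agg({"size": "mean"})

# Boolean mask built from one level of the MultiIndex.
mask = means.index.get_level_values("year_month") == "201412"

# Keep only the rows whose 'year_month' level matches.
december = means.loc[mask]
print(december)
```

The comparison value must match the dtype of the index level: here year_month holds strings, so it is compared with '201412'. If the column held integers, the comparison would be against 201412 instead.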

Solution 2:

As already pointed out, you will end up with a two-level index. You could try unstacking the aggregated DataFrame:

means = df.groupby(['year_month', 'company']).agg({'size':['mean']}).unstack(level=1)

This should give you a single 'year_month' index, with 'company' as columns and your aggregated size as values. You can then select rows by index label:

means.loc['201412']
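A minimal, self-contained sketch of the unstack approach, using hypothetical data with the question's column names:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "year_month": ["201411", "201412", "201412", "201501"],
    "company": ["a", "a", "b", "a"],
    "size": [10, 20, 30, 40],
})

# Aggregate, then move the 'company' index level (level=1) into the columns.
means = (
    df.groupby(["year_month", "company"])
      .agg({"size": ["mean"]})
      .unstack(level=1)
)

# 'year_month' is now the only index level, so a plain label lookup works.
row = means.loc["201412"]

print(means)
print(row)
```

Because the aggregation is passed as a list (['mean']), the columns have three levels: the source column, the aggregate name, and the company. Combinations of year_month and company that do not occur in the data show up as NaN.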
