
Calculating Rolling Average Per Group In Pandas Df

I have a df like this:

date car model mpg
1 ford focus 10
1 ford fiesta 15
1 ford mustang 20
2 ford

Solution 1:

This works for me: first reshape the values with DataFrame.set_index and Series.unstack, filling any missing date/model combination with 0. Then apply rolling with mean, reshape back with DataFrame.stack, and add the result as a new column with DataFrame.join:

s = (df.set_index(['date','car','model'])['mpg']
        .unstack(fill_value=0)
        .rolling(window=2)
        .mean()
        .stack()
        .rename('rolling_avg')
        )

df = df.join(s, on=['date','car','model'])
print (df)
   date   car    model  mpg  rolling_avg
0     1  ford    focus   10          NaN
1     1  ford   fiesta   15          NaN
2     1  ford  mustang   20          NaN
3     2  ford    focus   13         11.5
4     2  ford   fiesta   16         15.5
5     2  ford  mustang   27         23.5
6     3  ford    focus   13         13.0
7     3  ford  mustang   27         27.0
8     4  ford    focus   12         12.5
9     4  ford   fiesta   17          8.5
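Since the sample in the question is cut off, here is a self-contained sketch of the same steps. The input is rebuilt from the printed result above, so treat it as an assumption about the original data; the names sample, wide, rolled and result are only illustrative.

```python
import pandas as pd

# Input rebuilt from the printed result above (assumption: the question's
# truncated sample matches these ten rows).
sample = pd.DataFrame({
    'date':  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car':   ['ford'] * 10,
    'model': ['focus', 'fiesta', 'mustang', 'focus', 'fiesta', 'mustang',
              'focus', 'mustang', 'focus', 'fiesta'],
    'mpg':   [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

# Wide table: one row per (date, car), one column per model, 0 where missing.
wide = sample.set_index(['date', 'car', 'model'])['mpg'].unstack(fill_value=0)

# Two-row rolling mean down each model column, then back to long form.
rolled = wide.rolling(window=2).mean().stack().rename('rolling_avg')

# Attach the averages to the original rows by matching the three key columns.
result = sample.join(rolled, on=['date', 'car', 'model'])
print(result)
```

Note that rolling runs over the rows of the wide table in index order. The sample contains only ford, so every row is a new date; with several cars in the data, a window of 2 would span rows belonging to different cars, and the reshape would probably need to be done per car.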

EDIT: If set_index with unstack fails (ValueError: Index contains duplicate entries, cannot reshape), the data contains duplicate date/car/model combinations, like:

import pandas as pd

df = pd.DataFrame({'date':[1,1,1,2,2,2,3,3,4,4],
                   'car':['ford','ford','ford','ford','ford','ford','ford','ford','ford','ford'],
                   'model':['focus','focus','mustang','focus','focus','mustang','focus','mustang','focus','fiesta'],
                   'mpg':[10,15,20,13,16,27,13,27,12,17]})

print (df)
   date   car    model  mpg
0     1  ford    focus   10 <- dupe 1  ford    focus
1     1  ford    focus   15 <- dupe 1  ford    focus
2     1  ford  mustang   20
3     2  ford    focus   13 <- dupe 2  ford    focus
4     2  ford    focus   16 <- dupe 2  ford    focus
5     2  ford  mustang   27
6     3  ford    focus   13
7     3  ford  mustang   27
8     4  ford    focus   12
9     4  ford   fiesta   17

Then the date/car/model combinations first have to be made unique, here by aggregating with sum (or mean, as needed):

df1 = df.pivot_table(index=['date','car'], 
                     columns='model', 
                     values='mpg', 
                     aggfunc='sum', 
                     fill_value=0)
print (df1)
model      fiesta  focus  mustang
date car                         
1    ford       0     25       20
2    ford       0     29       27
3    ford       0     13       27
4    ford      17     12        0
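If averaging the duplicates fits the data better than summing them, only aggfunc changes. A minimal sketch on the same duplicated sample; the names dupes and by_mean are just for illustration:

```python
import pandas as pd

# Same duplicated sample as above.
dupes = pd.DataFrame({
    'date':  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car':   ['ford'] * 10,
    'model': ['focus', 'focus', 'mustang', 'focus', 'focus', 'mustang',
              'focus', 'mustang', 'focus', 'fiesta'],
    'mpg':   [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

# Duplicated (date, car, model) rows are averaged instead of summed;
# combinations that never occur are still filled with 0.
by_mean = dupes.pivot_table(index=['date', 'car'],
                            columns='model',
                            values='mpg',
                            aggfunc='mean',
                            fill_value=0)
print(by_mean)
```

With mean, focus on date 1 becomes 12.5 (the average of 10 and 15) rather than 25.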

After that it is possible to use rolling. The output has a different shape from the input data, because it contains one row per unique 'date', 'car', 'model' combination:

df1 = (df1.rolling(window=2)
        .mean()
        .stack(dropna=False)
        .rename('rolling_avg')
        .reset_index()
        )

print (df1)
  
    date   car    model  rolling_avg
0      1  ford   fiesta          NaN
1      1  ford    focus          NaN
2      1  ford  mustang          NaN
3      2  ford   fiesta          0.0
4      2  ford    focus         27.0
5      2  ford  mustang         23.5
6      3  ford   fiesta          0.0
7      3  ford    focus         21.0
8      3  ford  mustang         27.0
9      4  ford   fiesta          8.5
10     4  ford    focus         12.5
11     4  ford  mustang         13.5
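If the averages are needed next to the original (duplicated) rows rather than as a separate table, one option is a left merge on the three key columns. This is only a sketch under that assumption, not part of the answer above; dupes, wide, long_avg and merged are illustrative names.

```python
import pandas as pd

# Same duplicated sample as above.
dupes = pd.DataFrame({
    'date':  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car':   ['ford'] * 10,
    'model': ['focus', 'focus', 'mustang', 'focus', 'focus', 'mustang',
              'focus', 'mustang', 'focus', 'fiesta'],
    'mpg':   [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

# Aggregate duplicates, then take the two-row rolling mean per model column.
wide = dupes.pivot_table(index=['date', 'car'], columns='model',
                         values='mpg', aggfunc='sum', fill_value=0)
long_avg = (wide.rolling(window=2)
                .mean()
                .stack()
                .rename('rolling_avg')
                .reset_index())

# Left merge keeps every original row, in its original order; duplicated
# rows share the same average, and unmatched keys get NaN.
merged = dupes.merge(long_avg, on=['date', 'car', 'model'], how='left')
print(merged)
```

Both focus rows of date 2 then carry the same value, 27.0, because the average is computed on the summed mpg per date.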
