Calculating Rolling Average Per Group In Pandas Df
I have a df like this: date car model mpg 1 ford focus 10 1 ford fiesta 15 1 ford mustang 20 2 ford
Solution 1:
For me it works to first reshape the values with DataFrame.set_index and Series.unstack (combinations with no match are filled with 0), then use rolling, and finally reshape back with DataFrame.stack and add the new column with DataFrame.join:
s = (df.set_index(['date','car','model'])['mpg']
       .unstack(fill_value=0)
       .rolling(window=2)
       .mean()
       .stack()
       .rename('rolling_avg')
    )
df = df.join(s, on=['date','car','model'])
print (df)
   date   car    model  mpg  rolling_avg
0     1  ford    focus   10          NaN
1     1  ford   fiesta   15          NaN
2     1  ford  mustang   20          NaN
3     2  ford    focus   13         11.5
4     2  ford   fiesta   16         15.5
5     2  ford  mustang   27         23.5
6     3  ford    focus   13         13.0
7     3  ford  mustang   27         27.0
8     4  ford    focus   12         12.5
9     4  ford   fiesta   17          8.5
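For reference, here is a minimal self-contained sketch of the same steps. The sample DataFrame is an assumption: it is rebuilt from the printed output above, because the data in the question is cut off. The intermediate names `wide`, `rolled` and `result` are only for illustration.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output above.
df = pd.DataFrame({
    'date':  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car':   ['ford'] * 10,
    'model': ['focus', 'fiesta', 'mustang', 'focus', 'fiesta', 'mustang',
              'focus', 'mustang', 'focus', 'fiesta'],
    'mpg':   [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

# Wide table: one row per (date, car), one column per model, gaps filled with 0.
wide = df.set_index(['date', 'car', 'model'])['mpg'].unstack(fill_value=0)

# Rolling mean over two consecutive rows; the first row has no predecessor.
rolled = wide.rolling(window=2).mean()

# Back to long form, then attach the values to the original rows by key.
s = rolled.stack().rename('rolling_avg')
result = df.join(s, on=['date', 'car', 'model'])
print(result)
```

Note that the window runs over consecutive rows of the wide table, so this sketch relies on the sample containing a single car. It also explains the 8.5 for fiesta on date 4: the missing date 3 value was filled with 0, so the mean is (0 + 17) / 2.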
EDIT: If set_index with unstack failed, the data contains duplicates, like:
df = pd.DataFrame({'date':[1,1,1,2,2,2,3,3,4,4],
                   'car':['ford','ford','ford','ford','ford','ford','ford','ford','ford','ford'],
                   'model':['focus','focus','mustang','focus','focus','mustang','focus','mustang','focus','fiesta'],
                   'mpg':[10,15,20,13,16,27,13,27,12,17]})
print (df)
   date   car    model  mpg
0     1  ford    focus   10  <- dupe 1 ford focus
1     1  ford    focus   15  <- dupe 1 ford focus
2     1  ford  mustang   20
3     2  ford    focus   13  <- dupe 2 ford focus
4     2  ford    focus   16  <- dupe 2 ford focus
5     2  ford  mustang   27
6     3  ford    focus   13
7     3  ford  mustang   27
8     4  ford    focus   12
9     4  ford   fiesta   17
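To check whether your own data has this problem, something like the following sketch may help. It uses DataFrame.duplicated to list the rows that share a date, car, model key, and then shows that the reshape raises on them. The names `keys`, `dupes` and `unstack_failed` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'date':  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car':   ['ford'] * 10,
    'model': ['focus', 'focus', 'mustang', 'focus', 'focus', 'mustang',
              'focus', 'mustang', 'focus', 'fiesta'],
    'mpg':   [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

keys = ['date', 'car', 'model']

# keep=False marks every row of a duplicated key, not only the repeats.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# With duplicated keys, unstack cannot build a unique wide table.
try:
    df.set_index(keys)['mpg'].unstack(fill_value=0)
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print('unstack failed:', err)
```

If `dupes` is empty, the set_index and unstack approach should be usable as it is.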
Then you first need unique date, car, model combinations, here obtained by aggregating with sum (or mean, as needed):
df1 = df.pivot_table(index=['date','car'],
                     columns='model',
                     values='mpg',
                     aggfunc='sum',
                     fill_value=0)
print (df1)
model      fiesta  focus  mustang
date car
1    ford       0     25       20
2    ford       0     29       27
3    ford       0     13       27
4    ford      17     12        0
Then it is possible to use rolling. The output has a different shape than the input data, because each 'date','car','model' combination is now unique:
df1 = (df1.rolling(window=2)
          .mean()
          .stack(dropna=False)
          .rename('rolling_avg')
          .reset_index()
      )
print (df1)
    date   car    model  rolling_avg
0      1  ford   fiesta          NaN
1      1  ford    focus          NaN
2      1  ford  mustang          NaN
3      2  ford   fiesta          0.0
4      2  ford    focus         27.0
5      2  ford  mustang         23.5
6      3  ford   fiesta          0.0
7      3  ford    focus         21.0
8      3  ford  mustang         27.0
9      4  ford   fiesta          8.5
10     4  ford    focus         12.5
11     4  ford  mustang         13.5
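The handling of the dropna argument of stack can differ between pandas versions, so here is a hedged end-to-end sketch of the duplicate case that reshapes back with DataFrame.melt instead. This is a swapped-in technique, not the stack call used above, and the names `wide`, `rolled` and `long_df` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date':  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'car':   ['ford'] * 10,
    'model': ['focus', 'focus', 'mustang', 'focus', 'focus', 'mustang',
              'focus', 'mustang', 'focus', 'fiesta'],
    'mpg':   [10, 15, 20, 13, 16, 27, 13, 27, 12, 17],
})

# Aggregate duplicates so each (date, car, model) appears once; gaps become 0.
wide = df.pivot_table(index=['date', 'car'], columns='model',
                      values='mpg', aggfunc='sum', fill_value=0)

# Rolling mean over two consecutive rows of the wide table.
rolled = wide.rolling(window=2).mean()

# Back to long form with melt, keeping the NaN rows of the first date.
long_df = (rolled.reset_index()
                 .melt(id_vars=['date', 'car'], var_name='model',
                       value_name='rolling_avg')
                 .sort_values(['date', 'car', 'model'])
                 .reset_index(drop=True))
print(long_df)
```

The result should have the same twelve rows as the output above, one per date and model, including the NaN rows for the first date.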