Pandas Expanding Mean With Group By And Before Current Row Date
I have a Pandas dataframe as follows df = pd.DataFrame([['John', '1/1/2017','10'], ['John', '2/2/2017','15'], ['John', '2/2/2017','20'],
Solution 1:
Instead of grouping and expanding the mean, filter the dataframe on the following conditions and calculate the mean of DPD:
    Customer == current row's Customer
    Deposit_Date < current row's Deposit_Date
Use df.apply to perform this operation for every row in the dataframe:
df['PreviousMean'] = df.apply(
    lambda x: df[(df.Customer == x.Customer) & (df.Deposit_Date < x.Deposit_Date)].DPD.mean(),
    axis=1)
outputs:
  Customer Deposit_Date  DPD  PreviousMean
0     John   2017-01-01   10           NaN
1     John   2017-02-02   15          10.0
2     John   2017-02-02   20          10.0
3     John   2017-03-03   30          15.0
4      Sue   2017-01-01   10           NaN
5      Sue   2017-02-02   15          10.0
6      Sue   2017-03-02   20          12.5
7      Sue   2017-03-03    7          15.0
8      Sue   2017-04-04   20          13.0
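To try the filter-and-average idea end to end, here is a minimal sketch that rebuilds the frame from the rows in the output table above. The helper name previous_mean is only illustrative, and the sketch assumes DPD is numeric and Deposit_Date has been parsed to datetime so that "<" compares chronologically.

```python
import pandas as pd

# Sample rows copied from the output table above.
df = pd.DataFrame(
    [
        ["John", "2017-01-01", 10],
        ["John", "2017-02-02", 15],
        ["John", "2017-02-02", 20],
        ["John", "2017-03-03", 30],
        ["Sue", "2017-01-01", 10],
        ["Sue", "2017-02-02", 15],
        ["Sue", "2017-03-02", 20],
        ["Sue", "2017-03-03", 7],
        ["Sue", "2017-04-04", 20],
    ],
    columns=["Customer", "Deposit_Date", "DPD"],
)
# Parse the dates so that "<" compares chronologically, not as strings.
df["Deposit_Date"] = pd.to_datetime(df["Deposit_Date"])


def previous_mean(row):
    # Hypothetical helper: mean DPD over the same customer's strictly earlier deposits.
    earlier = df[(df.Customer == row.Customer) & (df.Deposit_Date < row.Deposit_Date)]
    return earlier.DPD.mean()  # NaN when there are no earlier deposits


df["PreviousMean"] = df.apply(previous_mean, axis=1)
print(df)
```

Because every row filters the whole frame again, the work grows roughly with the square of the row count, so this is probably best kept for small dataframes.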
Solution 2:
Here's one way to exclude repeated days from mean calculation:
import numpy as np
# create helper series which is NaN for repeated days, DPD otherwise
# (cumcount() > 0 marks every repeat of a day, not only the second row)
s = df.groupby(['Customer Name', 'Deposit_Date']).cumcount() > 0
df['DPD2'] = np.where(s, np.nan, df['DPD'])
# apply an expanding mean per customer (pd.expanding_mean has been removed from pandas)
df['CumMean'] = df.groupby('Customer Name')['DPD2'].transform(lambda x: x.expanding().mean())
# drop helper series
df = df.drop(columns='DPD2')
print(df)
  Customer Name Deposit_Date  DPD  CumMean
0          John   01/01/2017   10     10.0
1          John   01/01/2017   10     10.0
2          John   02/02/2017   20     15.0
3          John   03/03/2017   30     20.0
4           Sue   01/01/2017   10     10.0
5           Sue   01/01/2017   10     10.0
6           Sue   02/02/2017   20     15.0
7           Sue   03/03/2017   30     20.0
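For anyone on a recent pandas, a self-contained sketch of the same masking idea follows. It rebuilds the frame from the printed table and swaps in Series.where and Series.expanding().mean(), on the assumption that pd.expanding_mean is not available in the installed version. The names repeated and masked are illustrative.

```python
import pandas as pd

# Rows copied from the printed table above.
df = pd.DataFrame(
    {
        "Customer Name": ["John"] * 4 + ["Sue"] * 4,
        "Deposit_Date": ["01/01/2017", "01/01/2017", "02/02/2017", "03/03/2017"] * 2,
        "DPD": [10, 10, 20, 30] * 2,
    }
)

# True for every row after the first within the same customer and date.
repeated = df.groupby(["Customer Name", "Deposit_Date"]).cumcount() > 0
masked = df["DPD"].where(~repeated)  # NaN on repeated days, DPD otherwise

# expanding().mean() skips NaN values, so repeated days do not affect the mean.
df["CumMean"] = masked.groupby(df["Customer Name"]).transform(
    lambda x: x.expanding().mean()
)
print(df)
```

Note that this mean includes the current row's own day, so it answers a slightly different question from Solution 1, which only looks at strictly earlier dates.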
Solution 3:
Ok here is the best solution I've come up with thus far.
The trick is to first create an aggregated table at the customer & deposit date level containing a shifted mean. To calculate this mean you have to calculate the sum and the count first.
# assumes DPD is numeric and Deposit_Date is datetime, so the grouped dates sort chronologically
# one row per customer and deposit date, with the count and sum of DPD
s = df.groupby(['Customer Name', 'Deposit_Date'])[['DPD']].agg(['count', 'sum'])
s.columns = [' '.join(col) for col in s.columns]
s.reset_index(inplace=True)
# running sum, count and mean per customer
s['DPD_CumSum'] = s.groupby('Customer Name')['DPD sum'].cumsum()
s['DPD_CumCount'] = s.groupby('Customer Name')['DPD count'].cumsum()
s['DPD_CumMean'] = s['DPD_CumSum'] / s['DPD_CumCount']
# shift by one date so each date only sees earlier deposits
s['DPD_PrevMean'] = s.groupby('Customer Name')['DPD_CumMean'].shift(1)
df = df.merge(s[['Customer Name', 'Deposit_Date', 'DPD_PrevMean']], how='left', on=['Customer Name', 'Deposit_Date'])
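As a rough check of the aggregate-then-shift idea, the sketch below runs it on the rows from the Solution 1 output table, with the customer column renamed to "Customer Name" to match this solution. The variable names running and result are illustrative, and the sketch assumes Deposit_Date is datetime so that the grouped dates sort chronologically.

```python
import pandas as pd

# Rows taken from the Solution 1 output table; the column is named
# "Customer Name" here to match the code in this solution.
df = pd.DataFrame(
    {
        "Customer Name": ["John"] * 4 + ["Sue"] * 5,
        "Deposit_Date": pd.to_datetime(
            ["2017-01-01", "2017-02-02", "2017-02-02", "2017-03-03",
             "2017-01-01", "2017-02-02", "2017-03-02", "2017-03-03", "2017-04-04"]
        ),
        "DPD": [10, 15, 20, 30, 10, 15, 20, 7, 20],
    }
)

# One row per customer and date, holding that date's count and sum of DPD.
s = (
    df.groupby(["Customer Name", "Deposit_Date"])["DPD"]
    .agg(["count", "sum"])
    .reset_index()
)

# Running totals per customer; groupby sorts the dates in ascending order.
running = s.groupby("Customer Name")[["count", "sum"]].cumsum()
s["DPD_CumMean"] = running["sum"] / running["count"]

# Shift by one date so each date only sees strictly earlier deposits.
s["DPD_PrevMean"] = s.groupby("Customer Name")["DPD_CumMean"].shift(1)

result = df.merge(
    s[["Customer Name", "Deposit_Date", "DPD_PrevMean"]],
    how="left",
    on=["Customer Name", "Deposit_Date"],
)
print(result)
```

On this data DPD_PrevMean comes out the same as the PreviousMean column in Solution 1, including the two John rows that share 2017-02-02, since both rows pick up the same per-date value in the merge.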