How Can I Resample Pandas Dataframe By Day On Period Time?
I have a DataFrame like this:
df.head()
Out[2]:
         price   sale_date
0  477,000,000  1396/10/30
1  608,700,000  1396/10/30
2  580,000,000  1396/10/03
3  350,000,000  139
Solution 1:
It seems that resample and Grouper do not work with Periods here, at least for me in pandas 1.1.3 (I guess it is a bug):
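import pandas as pd

# Strip the separators so both columns can be handled as integers
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)
df['price'] = df['price'].str.replace(',', '').astype(int)

def conv(x):
    # e.g. 13961030 -> year 1396, month 10, day 30
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)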
# df = df.set_index('sale_date').resample('D')['price'].sum()
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
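The error text is consistent with pandas trying to turn the periods into nanosecond timestamps, which cannot represent the year 1396. That resample and Grouper do this conversion internally in 1.1.3 is an assumption based on the message, not something checked against the pandas source. A minimal sketch of the limit itself (the name p is only for illustration):

```python
import pandas as pd

# Nanosecond-resolution timestamps only cover roughly the years 1677 to 2262.
earliest = pd.Timestamp.min
latest = pd.Timestamp.max
print(earliest.year, latest.year)

# A daily Period is stored as a day count, so the year 1396 is representable.
p = pd.Period(year=1396, month=3, day=18, freq='D')
print(p.year, p.month, p.day)

# The period lies well before the earliest nanosecond timestamp, so any code
# path that needs a nanosecond Timestamp for it cannot succeed.
print(p.year < earliest.year)
```

This is why the workarounds that follow stay entirely in Period space and never go through timestamps.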
A possible solution is to aggregate with sum, so if a sale_date is duplicated, its price values are summed:
df = df.groupby('sale_date')['price'].sum().reset_index()
print (df)
sale_date price
0 1396-03-18 328000000
1 1396-10-03 580000000
2 1396-10-30 477000000
3 1396-11-25 608700000
4 1396-12-05 350000000
EDIT: Filling in the missing days is possible with Series.reindex and period_range:
s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print(df)
      sale_date      price
0    1396-03-18  328000000
1    1396-03-19          0
2    1396-03-20          0
3    1396-03-21          0
4    1396-03-22          0
..          ...        ...
258  1396-12-01          0
259  1396-12-02          0
260  1396-12-03          0
261  1396-12-04          0
262  1396-12-05  350000000

[263 rows x 2 columns]
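The steps above can be put together in one self-contained sketch. The three sample rows are made up in the shape of the question's frame, and the names to_period and daily are illustrative. It also builds the period column with pd.PeriodIndex and splits the date string instead of using integer arithmetic. Those are choices made here, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the question's DataFrame.
df = pd.DataFrame({
    'price': ['477,000,000', '608,700,000', '580,000,000'],
    'sale_date': ['1396/10/30', '1396/10/30', '1396/10/03'],
})

# Remove thousands separators and convert prices to integers.
df['price'] = df['price'].str.replace(',', '', regex=False).astype('int64')

def to_period(text):
    # '1396/10/30' -> daily Period with year 1396, month 10, day 30
    year, month, day = (int(part) for part in text.split('/'))
    return pd.Period(year=year, month=month, day=day, freq='D')

# Build an explicit period[D] column so the grouped index is a PeriodIndex.
df['sale_date'] = pd.PeriodIndex([to_period(t) for t in df['sale_date']], freq='D')

# Sum prices per day (duplicated dates are added together) ...
s = df.groupby('sale_date')['price'].sum()

# ... then reindex over every day between the first and last sale,
# filling days without sales with 0.
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
daily = s.reindex(rng, fill_value=0).reset_index()
print(daily)
```

Like the answer, this treats the year, month and day numbers as plain calendar components of a pandas Period. If the dates come from a calendar whose month lengths differ from the Gregorian one, some inputs may not map onto valid periods and the generated range may not match that calendar's days.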