How Can I Resample Pandas Dataframe By Day On Period Time?

July 02, 2024

I have a dataframe like this:

df.head()
Out[2]:
         price   sale_date
0  477,000,000  1396/10/30
1  608,700,000  1396/10/30
2  580,000,000  1396/10/03
3  350,000,000  139

Solution 1:

It seems that resample and Grouper do not work with Periods here, at least for me in pandas 1.1.3 (I guess it is a bug):

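import pandas as pd

# Strip the separators so both columns can be stored as integers.
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)
df['price'] = df['price'].str.replace(',', '').astype(int)

def conv(x):
    # x is an integer in YYYYMMDD form, e.g. 13961030
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)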
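
To make the integer arithmetic in conv easier to follow, here is a small sketch that splits a single value into its parts. The helper name split_date is hypothetical and only used for illustration; it is not part of the original answer.

```python
import pandas as pd

def split_date(x):
    """Split an integer in YYYYMMDD form into (year, month, day)."""
    year = x // 10000        # drop the last four digits
    month = x // 100 % 100   # drop the day, then keep the last two digits
    day = x % 100            # keep the last two digits
    return year, month, day

year, month, day = split_date(13961030)
print(year, month, day)

# The same parts build a daily Period, as conv does.
period = pd.Period(year=year, month=month, day=day, freq='D')
print(period)
```

Since // and % have the same precedence and are evaluated left to right, x // 100 % 100 is read as (x // 100) % 100.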

# df = df.set_index('sale_date').resample('D')['price'].sum()
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00

# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
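
The error message suggests that resample and Grouper convert the periods to nanosecond Timestamps somewhere internally. I have not traced the exact code path, so treat that as an assumption. What can be checked directly is that year 1396 lies outside the range nanosecond Timestamps can hold, while a daily Period represents it without trouble:

```python
import pandas as pd

# Nanosecond-resolution Timestamps only cover roughly the years 1677 to 2262.
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# A daily Period is stored as an integer ordinal, so year 1396 is representable.
day = pd.Period(year=1396, month=3, day=18, freq='D')
print(day.year < pd.Timestamp.min.year)
```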

A possible solution is to aggregate by sum with groupby, so if a sale_date is duplicated, its price values are summed:

df = df.groupby('sale_date')['price'].sum().reset_index()
print (df)
    sale_date      price
0  1396-03-18  328000000
1  1396-10-03  580000000
2  1396-10-30  477000000
3  1396-11-25  608700000
4  1396-12-05  350000000

EDIT: Filling in the missing days is possible with Series.reindex and period_range:

s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print(df)
      sale_date      price
0    1396-03-18  328000000
1    1396-03-19          0
2    1396-03-20          0
3    1396-03-21          0
4    1396-03-22          0
..          ...        ...
258  1396-12-01          0
259  1396-12-02          0
260  1396-12-03          0
261  1396-12-04          0
262  1396-12-05  350000000

[263 rows x 2 columns]
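
For reference, the steps can be put together in one self-contained sketch. The sample rows are an assumption, chosen to match the output shown in this answer, and the helper to_period is a hypothetical variant of conv that parses the string directly instead of going through an integer.

```python
import pandas as pd

# Sample rows (assumed for illustration, chosen to match the output above).
df = pd.DataFrame({
    'price': ['328,000,000', '580,000,000', '477,000,000',
              '608,700,000', '350,000,000'],
    'sale_date': ['1396/03/18', '1396/10/03', '1396/10/30',
                  '1396/11/25', '1396/12/05'],
})

# Remove the thousands separators and store prices as 64-bit integers.
df['price'] = df['price'].str.replace(',', '', regex=False).astype('int64')

def to_period(text):
    # Hypothetical helper: parse 'YYYY/MM/DD' into a daily Period.
    year, month, day = (int(part) for part in text.split('/'))
    return pd.Period(year=year, month=month, day=day, freq='D')

df['sale_date'] = df['sale_date'].map(to_period)

# Sum prices per day, then reindex over every day between the first and last sale.
s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
daily = s.reindex(rng, fill_value=0).reset_index()

print(len(daily))
print(daily['price'].sum())
```

Days without any sale get a price of 0 through fill_value, so the total over all days stays equal to the total of the original rows.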
