Pandas: Add Data For Missing Months
I have a dataframe of sales information by customer and by month period, with multiple customers and varying month periods and spend. Its columns are customer_id, month_year and sales.
Solution 1:
Something like this. Note that how to fill in customer_id is not defined here (you probably have this inside a groupby or something similar).
You may need a reset_index at the end, if desired.
In [130]: df2 = df.set_index('month_year')
In [131]: df2 = df2.sort_index()
In [132]: df2
Out[132]:
            customer_id   sales
month_year
2011-07              12   33.14
2011-11              12  182.06
2012-01              12   71.24
2012-03              12  155.32
2012-05              12    2.58
In [133]: df2.reindex(pd.period_range(df2.index[0],df2.index[-1],freq='M'))
Out[133]:
         customer_id   sales
2011-07           12   33.14
2011-08          NaN     NaN
2011-09          NaN     NaN
2011-10          NaN     NaN
2011-11           12  182.06
2011-12          NaN     NaN
2012-01           12   71.24
2012-02          NaN     NaN
2012-03           12  155.32
2012-04          NaN     NaN
2012-05           12    2.58
In [134]: df2 = df2.reindex(pd.period_range(df2.index[0], df2.index[-1], freq='M'))
In [135]: df2['customer_id'] = 12
In [136]: df2.fillna(0.0)
Out[136]:
         customer_id   sales
2011-07           12   33.14
2011-08           12    0.00
2011-09           12    0.00
2011-10           12    0.00
2011-11           12  182.06
2011-12           12    0.00
2012-01           12   71.24
2012-02           12    0.00
2012-03           12  155.32
2012-04           12    0.00
2012-05           12    2.58
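The session above handles a single customer. A minimal sketch of how the same reindex step might be repeated per customer is shown here; it assumes month_year can be parsed into monthly periods, and the rows for customer 7 are made-up sample data, not from the question.

```python
import pandas as pd

# Sample data: customer 12 matches the session above; customer 7 is hypothetical.
df = pd.DataFrame({
    "customer_id": [12, 12, 12, 12, 12, 7, 7],
    "month_year": ["2011-07", "2011-11", "2012-01", "2012-03", "2012-05",
                   "2012-01", "2012-04"],
    "sales": [33.14, 182.06, 71.24, 155.32, 2.58, 10.0, 20.0],
})
# Assumption: month_year strings are valid monthly periods.
df["month_year"] = pd.PeriodIndex(df["month_year"], freq="M")

pieces = []
for customer_id, group in df.groupby("customer_id"):
    # Index this customer's sales by month, in chronological order.
    sales = group.set_index("month_year")["sales"].sort_index()
    # Every month between this customer's first and last period.
    full_range = pd.period_range(sales.index[0], sales.index[-1], freq="M")
    # Months with no row get a sales value of 0.0.
    sales = sales.reindex(full_range, fill_value=0.0)
    pieces.append(pd.DataFrame({
        "customer_id": customer_id,
        "month_year": full_range,
        "sales": sales.to_numpy(),
    }))

# One flat frame with customer_id, month_year and sales columns.
filled = pd.concat(pieces, ignore_index=True)
print(filled)
```

Each customer is filled only between their own first and last month, so customers with different date ranges do not get padded out to a common range.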
Solution 2:
I found a different way to fill in missing months (they will be filled with NaN), while also accounting for multiple possible customers.
df = df.set_index(['month_year', 'customer_id'])['sales'].unstack().unstack().reset_index()
df = df.rename(columns={0:'sales'})
While this is absolutely inelegant, it gets the job done.
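One caveat with the unstack approach is that it can only produce months that appear for at least one customer. An alternative, which is not the method from this answer, is to reindex against an explicit product of customers and months. The sketch below uses made-up sample data and assumes month_year can be parsed into monthly periods.

```python
import pandas as pd

# Hypothetical sample data with two customers; no one has a row for 2011-09.
df = pd.DataFrame({
    "customer_id": [12, 12, 7, 7],
    "month_year": ["2011-07", "2011-11", "2011-08", "2011-10"],
    "sales": [33.14, 182.06, 10.0, 20.0],
})
df["month_year"] = pd.PeriodIndex(df["month_year"], freq="M")

# Every month between the earliest and latest period in the whole frame.
all_months = pd.period_range(df["month_year"].min(),
                             df["month_year"].max(), freq="M")

# All (customer, month) combinations.
full_index = pd.MultiIndex.from_product(
    [sorted(df["customer_id"].unique()), all_months],
    names=["customer_id", "month_year"],
)

filled = (
    df.set_index(["customer_id", "month_year"])["sales"]
      .reindex(full_index)  # missing (customer, month) pairs become NaN
      .reset_index()
)
print(filled)
```

Here every customer is padded to the same overall date range, and 2011-09 appears with NaN for both customers even though no row in the input mentions it. Chaining .fillna(0.0) on the sales column would turn the gaps into zeros if that is preferred.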