
Pandas: Add Data For Missing Months

I have a dataframe of sales information by customers by month period, that looks something like this, with multiple customers and varying month periods and spend: customer_id

Solution 1:

Something like this. Note that how to fill customer_id for the new rows is not defined by the reindex (you probably have this in a groupby or something, where the id is known).

You may need a reset_index at the end (if desired)

In [130]: df2 = df.set_index('month_year')

In [131]: df2 = df2.sort_index()

In [132]: df2
Out[132]: 
            customer_id   sales
month_year                     
2011-07              12   33.14
2011-11              12  182.06
2012-01              12   71.24
2012-03              12  155.32
2012-05              12    2.58

In [133]: df2.reindex(pd.period_range(df2.index[0],df2.index[-1],freq='M'))
Out[133]: 
         customer_id   sales
2011-07           12   33.14
2011-08          NaN     NaN
2011-09          NaN     NaN
2011-10          NaN     NaN
2011-11           12  182.06
2011-12          NaN     NaN
2012-01           12   71.24
2012-02          NaN     NaN
2012-03           12  155.32
2012-04          NaN     NaN
2012-05           12    2.58

In [134]: df2 = df2.reindex(pd.period_range(df2.index[0],df2.index[-1],freq='M'))

In [135]: df2['customer_id'] = 12

In [136]: df2.fillna(0.0)
Out[136]: 
         customer_id   sales
2011-07           12   33.14
2011-08           12    0.00
2011-09           12    0.00
2011-10           12    0.00
2011-11           12  182.06
2011-12           12    0.00
2012-01           12   71.24
2012-02           12    0.00
2012-03           12  155.32
2012-04           12    0.00
2012-05           12    2.58
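
For more than one customer, the same reindex can be run once per group. This is only a minimal sketch, assuming month_year holds monthly pandas Periods; the helper name fill_missing_months and the rows for customer 15 are made up for illustration and are not from the question.

```python
import pandas as pd

# Customer 12 uses the values shown above; customer 15 is hypothetical.
df = pd.DataFrame({
    "customer_id": [12, 12, 12, 12, 12, 15, 15],
    "month_year": pd.PeriodIndex(
        ["2011-07", "2011-11", "2012-01", "2012-03", "2012-05",
         "2011-09", "2011-12"],
        freq="M",
    ),
    "sales": [33.14, 182.06, 71.24, 155.32, 2.58, 10.00, 20.00],
})


def fill_missing_months(group, customer_id):
    """Hypothetical helper: pad one customer's sales with 0.0 for absent months."""
    sales = group.set_index("month_year")["sales"].sort_index()
    # Every month between this customer's first and last recorded period.
    full_range = pd.period_range(sales.index[0], sales.index[-1], freq="M")
    out = (
        sales.reindex(full_range, fill_value=0.0)
        .rename_axis("month_year")
        .reset_index()
    )
    # The id is known per group, so it can be written back directly.
    out.insert(0, "customer_id", customer_id)
    return out


filled = pd.concat(
    [fill_missing_months(g, cid) for cid, g in df.groupby("customer_id")],
    ignore_index=True,
)
print(filled)
```

Each customer is padded only between their own first and last month, so customers with different date ranges end up with different numbers of rows.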

Solution 2:

I found a different way to fill in missing months (they will be filled with NaN), while also accounting for multiple possible customers.

df = df.set_index(['month_year', 'customer_id'])['sales'].unstack().unstack().reset_index()
df = df.rename(columns={0:'sales'})

While this is absolutely inelegant, it gets the job done.
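
To see what the double unstack does, here is a small sketch on made-up data (both customers' rows are invented for illustration, and month_year is assumed to hold monthly Periods). One thing to be aware of: the trick only produces months that occur for at least one customer, so a month missing for everyone stays missing unless the intermediate frame is reindexed first. The reindex step below is my addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: 2011-09 does not occur for any customer.
df = pd.DataFrame({
    "customer_id": [12, 12, 15],
    "month_year": pd.PeriodIndex(["2011-07", "2011-10", "2011-08"], freq="M"),
    "sales": [33.14, 182.06, 10.00],
})

# First unstack: months as rows, one column per customer, NaN for absent pairs.
wide = df.set_index(["month_year", "customer_id"])["sales"].unstack()

# Second unstack: back to a long Series indexed by (customer_id, month_year).
long = wide.unstack().reset_index().rename(columns={0: "sales"})
print(long)  # 2011-09 is still absent here

# Optional extra step: reindex the wide frame onto the full monthly range
# so that months missing for every customer are added as well.
full_range = pd.period_range(wide.index.min(), wide.index.max(), freq="M")
wide_full = wide.reindex(full_range).rename_axis("month_year")
long_full = wide_full.unstack().reset_index().rename(columns={0: "sales"})
print(long_full)
```

Unlike the per-customer reindex in Solution 1, this gives every customer the same set of months, including ones before their first or after their last purchase.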

