
Count Number Of Rows For Each ID Within 1 Year

I have a pandas dataframe something like this:

Date        ID
01/01/2016  a
05/01/2016  a
10/05/2017  a
05/05/2014  b
07/09/2014  b
12/08/2017  b

What I need to do is to add a colu

Solution 1:

I think you need between with boolean indexing to filter first, then groupby and aggregate with size.

The outputs are concatenated, and reindex adds the missing rows, filled with 0:

print (df)
         Date ID
0  01/01/2016  a
1  05/01/2016  a
2  10/05/2017  a
3  05/05/2018  b
4  07/09/2014  b
5  07/09/2014  c
6  12/08/2018  b

#convert to datetime (dayfirst=True because the first number is the day)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
#pd.datetime was removed in pandas 2.0, so use pd.Timestamp instead
now = pd.Timestamp.today()
print (now)

one_year_before_now = now - pd.offsets.DateOffset(years=1)
one_year_after_now = now + pd.offsets.DateOffset(years=1)

#filter first, then count rows per ID
a = df[df['Date'].between(one_year_before_now, now)].groupby('ID').size()
b = df[df['Date'].between(now, one_year_after_now)].groupby('ID').size()
print (a)
ID
a    1
dtype: int64

print (b)
ID
b    2
dtype: int64

df1 = pd.concat([a,b],axis=1).fillna(0).astype(int).reindex(df['ID'].unique(),fill_value=0)
print (df1)
   0  1
a  1  0
b  0  2
c  0  0
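
The counts above depend on today's date, so they change each time the code runs. A minimal self-contained sketch of the same filter, count, and reindex steps follows. It assumes a fixed reference date: the `ref` value and the `last`/`next` column labels are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2016", "05/01/2016", "10/05/2017",
             "05/05/2018", "07/09/2014", "07/09/2014", "12/08/2018"],
    "ID": ["a", "a", "a", "b", "b", "c", "b"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

# Assumption: a fixed reference date replaces "today" so the result is reproducible
ref = pd.Timestamp("2017-12-01")
one_year = pd.DateOffset(years=1)

# Rows in the year before and the year after the reference date, counted per ID
last = df[df["Date"].between(ref - one_year, ref)].groupby("ID").size()
nxt = df[df["Date"].between(ref, ref + one_year)].groupby("ID").size()

# Combine both counts and restore IDs that had no rows in either window
counts = (pd.concat([last, nxt], axis=1, keys=["last", "next"])
            .reindex(df["ID"].unique())
            .fillna(0)
            .astype(int))
print(counts)
```

With this reference date, `a` has one row in the previous year, `b` has two rows in the following year, and `c` has none in either window.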

EDIT:

If you need to compare each date against a reference date per group (here the last date of each group, x.iat[-1]) plus or minus a one-year offset, use a custom function that builds the conditions and sums the True values:

offs = pd.offsets.DateOffset(years=1)

f = lambda x: pd.Series([(x > x.iat[-1] - offs).sum(), \
                        (x < x.iat[-1] + offs).sum()], index=['last','next'])
df = df.groupby('ID')['Date'].apply(f).unstack(fill_value=0).reset_index()
print (df)
  ID  last  next
0  a     1     3
1  b     3     2
2  c     1     1
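
Note that each condition in the function above is bounded on one side only, so `x > ref - offs` also counts dates later than the reference. If the intent is to count only rows within one year before or after the reference date, a two-sided variant might look like the sketch below. It uses `transform` instead of `apply`, and it assumes the reference is the last row of each group; both are choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2016", "05/01/2016", "10/05/2017",
             "05/05/2018", "07/09/2014", "07/09/2014", "12/08/2018"],
    "ID": ["a", "a", "a", "b", "b", "c", "b"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

offs = pd.DateOffset(years=1)

# Assumption: the reference date is the last row of each ID group,
# broadcast back to every row of that group
ref = df.groupby("ID")["Date"].transform("last")

# Two-sided windows: (ref - 1 year, ref] and [ref, ref + 1 year)
in_last = (df["Date"] > ref - offs) & (df["Date"] <= ref)
in_next = (df["Date"] >= ref) & (df["Date"] < ref + offs)

# Summing booleans per group counts the rows that fall in each window
out = (pd.DataFrame({"ID": df["ID"], "last": in_last, "next": in_next})
         .groupby("ID")
         .sum()
         .reset_index())
print(out)
```

The reference row itself falls in both windows, so every ID gets a count of at least 1 in each column.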

Solution 2:

In [19]: x['date'] = pd.to_datetime( x['date']) # convert string date to datetime pd object
In [20]: x['date'] = x['date'].dt.year # get year from the date

In [21]: x
Out[21]: 
   date id
0  2016  a
1  2016  a
2  2017  a
3  2014  b
4  2014  b
5  2017  b


In [27]: x.groupby(['date','id']).size() # group by both columns
Out[27]: 
date  id
2014  b     2
2016  a     2
2017  a     1
      b     1
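
If a wide table (one row per id, one column per year) is more convenient than the grouped Series, `pd.crosstab` is one way to get it. This is a sketch using the data from the question; it assumes the dates are day-first, and the `years` and `table` names are illustrative.

```python
import pandas as pd

x = pd.DataFrame({
    "date": ["01/01/2016", "05/01/2016", "10/05/2017",
             "05/05/2014", "07/09/2014", "12/08/2017"],
    "id": ["a", "a", "a", "b", "b", "b"],
})

# Assumption: the first number in each date string is the day
years = pd.to_datetime(x["date"], dayfirst=True).dt.year.rename("year")

# Cross-tabulate id against year; combinations with no rows become 0
table = pd.crosstab(x["id"], years)
print(table)
```

Only years that occur in the data appear as columns, so 2015 is absent here.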

Solution 3:

Using resample takes care of missing in-between years; see the year 2015 in the output. (In pandas 2.2 and later, the 'Y' alias is deprecated in favor of 'YE'.)

In [550]: df.set_index('Date').groupby('ID').resample('Y').size().unstack(fill_value=0)
Out[550]:
Date  2014-12-31  2015-12-31  2016-12-31  2017-12-31
ID
a              0           0           2           1
b              2           0           0           1

Use rename if you want only the year in the columns:

In [551]: (df.set_index('Date').groupby('ID').resample('Y').size().unstack(fill_value=0)
             .rename(columns=lambda x: x.year))
Out[551]:
Date  2014  2015  2016  2017
ID
a        0     0     2     1
b        2     0     0     1
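
The same table can also be built without resample, which avoids depending on the frequency alias. The sketch below groups by ID and calendar year, then reindexes the columns over the full year range so that empty years such as 2015 appear as zeros. It is an alternative technique rather than the method shown above, and the `year`, `all_years`, and `wide` names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01/01/2016", "05/01/2016", "10/05/2017",
             "05/05/2014", "07/09/2014", "12/08/2017"],
    "ID": ["a", "a", "a", "b", "b", "b"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

# Calendar year of each row, used as a second grouping key
year = df["Date"].dt.year.rename("Year")

# Every year between the earliest and latest date, including empty ones
all_years = list(range(int(year.min()), int(year.max()) + 1))

wide = (df.groupby(["ID", year]).size()
          .unstack(fill_value=0)
          .reindex(columns=all_years, fill_value=0))
print(wide)
```

The result has columns 2014 through 2017, with 2015 filled with zeros for both IDs.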
