In-group Time-to Event Counter
Solution 1:
To get the days until the next event, we can add a column that backfills the date of the next event:
import pandas as pd
df['next_event'] = df['date'][df['is_event'] == 1]
df['next_event'] = df.groupby('id')['next_event'].bfill()
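A minimal sketch of the backfill step. The input frame is an assumption: it is rebuilt from the output table further down, since the question's original data is not shown. It uses `Series.where` to blank out non-event dates, which leaves NaT on the same rows as the boolean-index assignment above.

```python
import pandas as pd

# Assumed sample frame, reconstructed from the output table in this answer.
dates = ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"]
df = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 5,
    "date": pd.to_datetime(dates * 2),
    "is_event": [0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
})

# Keep the date only on event rows; every other row becomes NaT.
df["next_event"] = df["date"].where(df["is_event"] == 1)

# Backfill within each id so every row holds the date of its next event.
# Rows after the last event of their id (b, 2017-01-03 onward) stay NaT.
df["next_event"] = df.groupby("id")["next_event"].bfill()

print(df)
```

The NaT rows at the end of id `b` are the ones the next step has to fill.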
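Rows after an id's last event have no next event and are still NaT. We fill them with the day after the last date in the frame (this assumes the frame is sorted by date and every id ends on the same day), then subtract to get the days between the next event and each day: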
df['next_event'] = df['next_event'].fillna(df['date'].iloc[-1] + pd.Timedelta(days=1))
df['time_to_next_event'] = (df['next_event']-df['date']).dt.days
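The fill value `df['date'].iloc[-1] + pd.Timedelta(days=1)` works here because both ids end on 2017-01-05. A sketch of a per-id fallback instead (a swapped-in variant, not the answer's code), run on the same assumed sample frame, so ids that end on different dates still get "one day past their own last date":

```python
import pandas as pd

# Assumed sample frame, reconstructed from the output table in this answer.
dates = ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"]
df = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 5,
    "date": pd.to_datetime(dates * 2),
    "is_event": [0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
})

# Date of the next event within each id (NaT after the last one).
df["next_event"] = df["date"].where(df["is_event"] == 1)
df["next_event"] = df.groupby("id")["next_event"].bfill()

# Per-id fallback: one day after that id's last observed date.
fallback = df.groupby("id")["date"].transform("max") + pd.Timedelta(days=1)
df["next_event"] = df["next_event"].fillna(fallback)

# Whole days from each row to its next event (or to the fallback date).
df["time_to_next_event"] = (df["next_event"] - df["date"]).dt.days
print(df[["id", "date", "time_to_next_event"]])
```

On this data it reproduces the `time_to_next_event` column shown in the output table.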
To get the is_censored value for each day and each id, we group by id and forward-fill the 1s in the 'is_event' column within each group. By the question's definition, 'is_censored' should be 0 on the day of the event itself, so we only want the values that the forward fill introduced: compare 'is_event' to its forward-filled version and set 'is_censored' to 1 wherever they differ.
df['is_censored'] = (df.groupby('id')['is_event'].transform(lambda x: x.mask(x == 0).ffill().fillna(0)) != df['is_event']).astype(int)
df = df.drop('next_event', axis=1)
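Because 'is_event' only takes the values 0 and 1, forward-filling the 1s within an id is the same as a running maximum. A sketch using `GroupBy.cummax` in place of the mask/forward-fill trick (a substitution, not the answer's code), on the same assumed sample frame:

```python
import pandas as pd

# Assumed sample frame, reconstructed from the output table in this answer.
dates = ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"]
df = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 5,
    "date": pd.to_datetime(dates * 2),
    "is_event": [0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
})

# 1 from an id's first event onward, 0 before it.
seen_event = df.groupby("id")["is_event"].cummax()

# Censored: an event already happened in this id, but today is not an event day.
df["is_censored"] = (seen_event != df["is_event"]).astype(int)
print(df)
```

Like the forward-fill version, this flags every non-event day after an id's first event, including days that still have a later event ahead; Solution 2 addresses that case.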
In [343]: df
Out[343]:
  id       date  is_event  time_to_next_event  is_censored
0  a 2017-01-01         0                   3            0
1  a 2017-01-02         0                   2            0
2  a 2017-01-03         0                   1            0
3  a 2017-01-04         1                   0            0
4  a 2017-01-05         1                   0            0
5  b 2017-01-01         0                   1            0
6  b 2017-01-02         1                   0            0
7  b 2017-01-03         0                   3            1
8  b 2017-01-04         0                   2            1
9  b 2017-01-05         0                   1            1
Solution 2:
To generalize the method for is_censored to include cases where an event happens more than once within each id, I wrote this:
df['is_censored2'] = 1
max_dates = df[df['is_event'] == 1].groupby('id',as_index=False)['date'].max()
max_dates.columns = ['id','max_date']
df = pd.merge(df,max_dates,on=['id'],how='left')
df.loc[df['date'] <= df['max_date'], 'is_censored2'] = 0
It initializes the column to 1, then finds the latest event date within each id and sets is_censored2 to 0 for every row of that id whose date is on or before that date. Ids with no events get NaT as max_date, so all of their rows stay censored.
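A sketch of the same rule without the merge, on a hypothetical frame in which id `a` has two events and id `b` has none. It broadcasts each id's latest event date with `groupby(...).transform("max")` (a swap for the answer's groupby + merge) and, for contrast, also computes the first-event rule used in Solution 1:

```python
import pandas as pd

# Hypothetical data: id "a" has events on 01-02 and 01-04, id "b" never has one.
df = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 3,
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05",
        "2017-01-01", "2017-01-02", "2017-01-03",
    ]),
    "is_event": [0, 1, 0, 1, 0, 0, 0, 0],
})

# Latest event date per id, repeated on every row of that id (NaT if no event).
max_date = df["date"].where(df["is_event"] == 1).groupby(df["id"]).transform("max")

# Solution 2 rule: uncensored on or before the last event. Comparisons with NaT
# are False, so ids without any event stay censored (1), as in the answer.
df["is_censored2"] = (~(df["date"] <= max_date)).astype(int)

# First-event rule (running max), for comparison: it also flags 2017-01-03 for
# "a", even though another event follows on 2017-01-04.
df["is_censored_first"] = (
    df.groupby("id")["is_event"].cummax() != df["is_event"]
).astype(int)

print(df)
```

The two columns differ on id `a` (row 2026-style gaps between events) and on id `b`, where the first-event rule never flags anything because no event ever occurs.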