Find Timestamp In Data Frame Column By A Given Timedelta
I have a dataframe containing a timestamp column. My objective is to find the first timestamp for every row that is greater than the timestamp of the row by a given offset (say 0.0
Solution 1:
I think the fastest and easiest way is to use searchsorted. If there is no element greater than df.exchTstamp+delta, then searchsorted will return the length of the array, i.e. an index that is out of bounds for our dataframe. Therefore we first append a nan/NaT sentinel to the values we look up, so that this case maps to NaT:
import datetime as dt
import numpy as np
import pandas as pd

df = pd.read_csv('stack.csv', index_col=0, parse_dates=[1])
delta = dt.timedelta(seconds=0.01)
# position of the first timestamp >= exchTstamp + delta (len(df) if there is none)
res = np.searchsorted(df.exchTstamp, df.exchTstamp + delta)
# add sentinel: position len(df) now points at NaT instead of being out of bounds
lookup = np.append(df.exchTstamp.to_numpy(), np.datetime64("NaT", "ns"))
df["testTime"] = lookup[res]
df.to_csv('stack-out.csv')
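To make the out-of-bounds case concrete, here is a minimal sketch on made-up data. The four timestamps and the 10 ms offset are assumptions for illustration and are not taken from stack.csv. It also shows the `side` argument of `np.searchsorted`: the default `"left"` finds the first timestamp greater than or equal to the target, while `"right"` finds the first one strictly greater.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: sorted timestamps standing in for df.exchTstamp.
ts = pd.Series(pd.to_datetime([
    "2019-08-14 09:15:00.000",
    "2019-08-14 09:15:00.004",
    "2019-08-14 09:15:00.010",
    "2019-08-14 09:15:00.025",
]))
values = ts.to_numpy()
targets = values + np.timedelta64(10, "ms")

# side="left": first position whose timestamp is >= target.
pos_left = np.searchsorted(values, targets, side="left")
# side="right": first position whose timestamp is strictly > target.
pos_right = np.searchsorted(values, targets, side="right")

# Pad with one NaT so that position len(values) ("no match") is a valid lookup.
padded = np.append(values, np.datetime64("NaT", "ns"))
test_time = pd.Series(padded[pos_left], index=ts.index)
print(test_time)
```

The first row differs between the two variants: its target (09:15:00.010) is itself present in the data, so `"left"` returns that row and `"right"` skips past it. The last row has no later timestamp, so it ends up as NaT.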
Solution 2:
Ok, probably not the most elegant way to handle a problem like this, but this will get the job done:
import pandas as pd

df = pd.read_csv("stack.csv", index_col=0)
df["exchTstamp"] = pd.to_datetime(df["exchTstamp"])

def getTime(base_idx, offset=0.01):
    # Walk forward from base_idx until a row is at least `offset` seconds later.
    base = df["exchTstamp"].iloc[base_idx]
    i = 1
    while base_idx + i < len(df.index):
        time_delta = (df["exchTstamp"].iloc[base_idx + i] - base).total_seconds()
        if time_delta >= offset:
            return df["exchTstamp"].iloc[base_idx + i]
        i += 1
    # No later row is far enough ahead.
    return pd.NaT

df["testTime"] = [getTime(j) for j in range(len(df.index))]
That then gives you:
df.head(10)
                  exchTstamp  seqNum  prev10mSecAvg  prev1SecAvg  imbRegime                   testTime
0 2019-08-14 09:15:00.022991     199       0.000000     0.000000          0 2019-08-14 09:15:00.033136
1 2019-08-14 09:15:00.022995     200      -0.166667    -0.166667          3 2019-08-14 09:15:00.033136
2 2019-08-14 09:15:00.022999     201      -0.277778    -0.277778          2 2019-08-14 09:15:00.033136
3 2019-08-14 09:15:00.023003     202      -0.333333    -0.333333          2 2019-08-14 09:15:00.033136
4 2019-08-14 09:15:00.023007     203      -0.386667    -0.386667          2 2019-08-14 09:15:00.033136
5 2019-08-14 09:15:00.023011     204      -0.422222    -0.422222          0 2019-08-14 09:15:00.033136
6 2019-08-14 09:15:00.023015     205      -0.447619    -0.447619          0 2019-08-14 09:15:00.033136
7 2019-08-14 09:15:00.023018     206      -0.475000    -0.475000          0 2019-08-14 09:15:00.033136
8 2019-08-14 09:15:00.023023     207      -0.422222    -0.422222          1 2019-08-14 09:15:00.033136
9 2019-08-14 09:15:00.023027     208      -0.380000    -0.380000          3 2019-08-14 09:15:00.033136
Solution 3:
Using 'filter' results in an empty list at the end of the dataframe. It is also wasteful to collect all the timestamps greater than the base one, because the data is in chronological order.
import datetime as dt
import pandas as pd

df = pd.read_csv("stack.csv", parse_dates=[1], index_col=0)
delta = dt.timedelta(seconds=0.01)
l = []
# assumes the index is the default 0..n-1 integer range
for i in df.index:
    l.append(None)
    start = df.at[i, "exchTstamp"]
    # scan forward and stop at the first row that is at least delta later
    for k in range(i + 1, len(df.index)):
        if start <= df.at[k, "exchTstamp"] - delta:
            l[-1] = df.at[k, "exchTstamp"]
            break
df["testTime"] = l
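Because the timestamps are in chronological order, the forward scan does not have to restart just after each row: the match for row i+1 can never lie before the match for row i. The following is a minimal sketch of that two-pointer idea, not part of the original answers. The function name `first_at_least_delta_later` and the sample timestamps are hypothetical, and it assumes a sorted input and a non-negative delta.

```python
import pandas as pd

def first_at_least_delta_later(ts: pd.Series, delta: pd.Timedelta) -> pd.Series:
    """For each timestamp in sorted `ts`, return the first later one >= itself + delta."""
    values = list(ts)
    n = len(values)
    out = [pd.NaT] * n
    k = 0  # candidate position; it never moves backwards because ts is sorted
    for i in range(n):
        k = max(k, i + 1)  # only look at rows after the current one
        while k < n and values[k] < values[i] + delta:
            k += 1
        if k < n:
            out[i] = values[k]
    return pd.Series(out, index=ts.index)

# Hypothetical sample data for illustration.
sample = pd.Series(pd.to_datetime([
    "2019-08-14 09:15:00.000",
    "2019-08-14 09:15:00.004",
    "2019-08-14 09:15:00.010",
    "2019-08-14 09:15:00.025",
]))
result = first_at_least_delta_later(sample, pd.Timedelta(milliseconds=10))
print(result)
```

Each row is visited by `k` at most once overall, so the scan is linear in the number of rows instead of quadratic in the worst case.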