
Find Timestamp In Data Frame Column By A Given Timedelta

I have a DataFrame containing a timestamp column. My objective is to find, for every row, the first timestamp that is greater than that row's own timestamp by a given offset (say 0.0

Solution 1:

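I think the fastest and easiest way is using searchsorted, which requires the timestamp column to be sorted in ascending order. With the default side='left' it returns, for each value of df.exchTstamp+delta, the position of the first timestamp that is greater than or equal to it (use side='right' if you need strictly greater). If there is no such element, searchsorted returns the length of the array, i.e. a position that is out of bounds for our data. Therefore we need a NaT sentinel at that position to handle this case: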

import numpy as np, pandas as pd, datetime as dt

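df = pd.read_csv('stack.csv', index_col=0, parse_dates=[1])
delta = dt.timedelta(seconds=0.01)

# for each row, the position of the first timestamp >= exchTstamp + delta
# (exchTstamp must be sorted ascending)
res = np.searchsorted(df.exchTstamp, df.exchTstamp + delta)

# add sentinel: append NaT to a copy of the timestamp values rather than to df,
# so a position of len(df) ("no such timestamp") maps to NaT and nothing needs removing
ts = df.exchTstamp.to_numpy()
lookup = np.append(ts, np.array(['NaT'], dtype=ts.dtype))

df["testTime"] = lookup[res]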

df.to_csv('stack-out.csv')
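For experimenting without stack.csv, here is a minimal, self-contained sketch of the same searchsorted-plus-sentinel idea on a few synthetic timestamps. The function name `first_at_least` and the sample data are hypothetical. The sketch assumes a timezone-naive column sorted in ascending order (np.searchsorted requires sorted input) and exposes `side`, since the default 'left' yields the first timestamp >= t + delta while 'right' yields the first one strictly greater.

```python
import numpy as np
import pandas as pd


def first_at_least(ts: pd.Series, delta: pd.Timedelta, side: str = "left") -> np.ndarray:
    """For each timestamp t in the ascending series `ts`, return the first
    timestamp >= t + delta (side="left") or > t + delta (side="right"),
    or NaT if there is none. Hypothetical helper for illustration."""
    values = ts.to_numpy()  # assumes a tz-naive datetime64 column
    pos = np.searchsorted(values, values + delta.to_timedelta64(), side=side)
    # pos == len(values) means "not found"; a trailing NaT sentinel maps it to NaT
    lookup = np.append(values, np.array(["NaT"], dtype=values.dtype))
    return lookup[pos]


# hypothetical sample data, 0 / 5 / 12 / 20 ms after 09:15:00
ts = pd.Series(pd.to_datetime([
    "2019-08-14 09:15:00.000",
    "2019-08-14 09:15:00.005",
    "2019-08-14 09:15:00.012",
    "2019-08-14 09:15:00.020",
]))
print(first_at_least(ts, pd.Timedelta(milliseconds=10)))
```

With a 10 ms offset, row 0 maps to 09:15:00.012, row 1 to 09:15:00.020, and the last two rows have no timestamp far enough ahead, so they get NaT. Because the whole lookup is one vectorized searchsorted call, this stays fast on large frames.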

Solution 2:

Ok, probably not the most elegant way to handle a problem like this, but this will get the job done:

import numpy as np
import pandas as pd

df = pd.read_csv("stack.csv", index_col=0)
df["exchTstamp"] = pd.to_datetime(df["exchTstamp"])

def getTime(base_idx, offset=0.01):
    # scan forward from base_idx until a timestamp at least `offset` seconds later is found
    base = df["exchTstamp"].iloc[base_idx]
    i = base_idx
    while i < len(df.index):
        time_delta = (df["exchTstamp"].iloc[i] - base).total_seconds()
        if time_delta >= offset:
            return df["exchTstamp"].iloc[i]
        i += 1
    return np.nan  # no later timestamp is far enough away

df["testTime"] = [getTime(j) for j in range(len(df.index))]

That then gives you:

df.head(10)

                  exchTstamp  seqNum  rev10mSecAvg  prev1SecAvg  imbRegime                    testTime
0 2019-08-14 09:15:00.022991     199      0.000000     0.000000          0  2019-08-14 09:15:00.033136
1 2019-08-14 09:15:00.022995     200     -0.166667    -0.166667          3  2019-08-14 09:15:00.033136
2 2019-08-14 09:15:00.022999     201     -0.277778    -0.277778          2  2019-08-14 09:15:00.033136
3 2019-08-14 09:15:00.023003     202     -0.333333    -0.333333          2  2019-08-14 09:15:00.033136
4 2019-08-14 09:15:00.023007     203     -0.386667    -0.386667          2  2019-08-14 09:15:00.033136
5 2019-08-14 09:15:00.023011     204     -0.422222    -0.422222          0  2019-08-14 09:15:00.033136
6 2019-08-14 09:15:00.023015     205     -0.447619    -0.447619          0  2019-08-14 09:15:00.033136
7 2019-08-14 09:15:00.023018     206     -0.475000    -0.475000          0  2019-08-14 09:15:00.033136
8 2019-08-14 09:15:00.023023     207     -0.422222    -0.422222          1  2019-08-14 09:15:00.033136
9 2019-08-14 09:15:00.023027     208     -0.380000    -0.380000          3  2019-08-14 09:15:00.033136

Solution 3:

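Using 'filter' to select the qualifying timestamps results in an empty list for the rows at the end of the DataFrame. And since the data are in chronological order, collecting all the timestamps greater than the base one is wasteful: a forward scan can stop at the first match.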

import numpy as np, pandas as pd, datetime as dt

df=pd.read_csv("stack.csv",parse_dates=[1],index_col=0)

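l = []

for i in df.index:
    l.append(None)  # default when no later timestamp qualifies
    start = df.at[i, "exchTstamp"]
    # rows are in chronological order, so scan forward and stop at the first match
    for k in range(i + 1, len(df.index)):
        if start <= df.at[k, "exchTstamp"] - dt.timedelta(seconds=0.01):
            l[-1] = df.at[k, "exchTstamp"]
            break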

df["testTime"]= l 
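Because the data are in chronological order, the forward scan can be pushed one step further: the match for row i+1 can never lie before the match for row i, so a single pointer that only moves forward finds every answer in O(n) total instead of restarting the inner loop for each row. This is a minimal sketch of that two-pointer variant, assuming an ascending, timezone-naive column; the function name `first_at_least_scan` and the sample data are hypothetical.

```python
import pandas as pd


def first_at_least_scan(ts, delta):
    """For each timestamp t in the ascending sequence `ts`, return the first
    timestamp >= t + delta, or pd.NaT if none exists (two-pointer scan).
    Hypothetical helper for illustration."""
    values = list(ts)
    out = []
    j = 0  # candidate position; never moves backwards because targets only grow
    for t in values:
        target = t + delta
        while j < len(values) and values[j] < target:
            j += 1
        out.append(values[j] if j < len(values) else pd.NaT)
    return out


# hypothetical sample data, 0 / 5 / 12 / 20 ms after 09:15:00
ts = pd.Series(pd.to_datetime([
    "2019-08-14 09:15:00.000",
    "2019-08-14 09:15:00.005",
    "2019-08-14 09:15:00.012",
    "2019-08-14 09:15:00.020",
]))
print(first_at_least_scan(ts, pd.Timedelta(milliseconds=10)))
```

With the question's data this could be used as `df["testTime"] = first_at_least_scan(df["exchTstamp"], pd.Timedelta(seconds=0.01))`. It keeps the ">=" condition of the loop above while avoiding its quadratic worst case.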
