
How To Efficiently Process Time-series Data In Pandas

I have data sets representing travel times past given nodes. The data is in one CSV file per node in this format: node name, datetime, irrelevant field, mac address I'm reading the

Solution 1:

If the DataFrame is sorted by datetime within each mac address, you can probably do:

import pandas as pd

grb = df.groupby('mac address')
df['origin'] = grb['node name'].shift(1)          # previous node seen by this device
df['departure time'] = grb['datetime'].shift(1)   # time it was seen there
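A minimal runnable sketch of the groupby approach on a small made-up DataFrame (the column names follow the question; the rows are invented for illustration). It sorts first so that the "sorted by datetime within each mac address" precondition actually holds, then calls `shift` on the grouped columns:

```python
import pandas as pd

# Hypothetical sample readings: one row per (node, time, mac) sighting.
df = pd.DataFrame({
    "node name": ["B", "A", "C", "A", "B"],
    "datetime": pd.to_datetime([
        "2014-01-01 08:10", "2014-01-01 08:00",
        "2014-01-01 09:30", "2014-01-01 09:00", "2014-01-01 09:05",
    ]),
    "mac address": ["m1", "m1", "m2", "m2", "m3"],
})

# The per-group shift only means "previous sighting" if rows are in time order.
df = df.sort_values(["mac address", "datetime"]).reset_index(drop=True)

grb = df.groupby("mac address")
# Shift within each group: the first sighting of each device gets NaN / NaT.
df["origin"] = grb["node name"].shift(1)
df["departure time"] = grb["datetime"].shift(1)
print(df)
```

Because the shift happens inside each group, a device's first row never inherits values from a different mac address.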

and the travel time would be:

df['travel time'] = df['datetime'] - df['departure time']   # arrival minus departure
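Subtracting two datetime columns gives a timedelta column; subtracting the earlier timestamp from the later one keeps the durations non-negative. A small sketch with invented values, which also converts the result to plain seconds via `.dt.total_seconds()` (the `travel seconds` column name is just for illustration):

```python
import pandas as pd

# Hypothetical rows: departure from the previous node and arrival at this one.
# The second row has no previous sighting, so its departure time is NaT.
df = pd.DataFrame({
    "departure time": pd.to_datetime(["2014-01-01 08:00", None]),
    "datetime": pd.to_datetime(["2014-01-01 08:10", "2014-01-01 09:00"]),
})

# Arrival minus departure gives a timedelta64 column.
df["travel time"] = df["datetime"] - df["departure time"]
# Convert to seconds; NaT (no previous node) becomes NaN.
df["travel seconds"] = df["travel time"].dt.total_seconds()
print(df)
```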

and if node names are strings, the path would be:

df['path'] = df['origin'] + '-' + df['node name']
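If node names are not strings (for example integer node IDs, an assumption for this sketch), one option is to convert them with `astype(str)` before shifting and join with `Series.str.cat`. With `na_rep` left at its default, rows where the origin is missing (a device's first sighting) stay missing instead of becoming something like `"nan-101"`:

```python
import pandas as pd

# Hypothetical: node names stored as integers rather than strings.
df = pd.DataFrame({
    "mac address": ["m1", "m1", "m1"],
    "node name": [101, 205, 310],
})

# Convert node names to strings first so they can be concatenated.
names = df["node name"].astype(str)
# Previous node for the same device (missing for the first sighting).
origin = names.groupby(df["mac address"]).shift(1)

# str.cat yields a missing value where either side is missing.
df["path"] = origin.str.cat(names, sep="-")
print(df)
```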

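Edit: this may be faster, since it shifts each column once over the whole frame and then fixes up the rows where one device's block ends. It assumes travel times cannot be negative, i.e. that sorting by datetime within each mac address gives the order of travel: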

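import numpy as np
import pandas as pd

# DataFrame.sort was removed from pandas; sort_values is the replacement
df.sort_values(['mac address', 'datetime'], inplace=True)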

df['origin'] = df['node name'].shift(1)
df['departure time'] = df['datetime'].shift(1)

# blank out the first row of each mac address, where the global shift
# pulled in values from the previous device
idx = df['mac address'] != df['mac address'].shift(1)
df.loc[idx, 'origin'] = np.nan
df.loc[idx, 'departure time'] = pd.NaT
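A self-contained sketch of the sort, shift, and mask version wrapped in a function (the name `add_previous_sighting` is hypothetical). On invented data it should give the same `origin` and `departure time` columns as the per-group `shift`, which makes a convenient check when swapping one approach for the other:

```python
import numpy as np
import pandas as pd

def add_previous_sighting(df):
    """Sort by device and time, shift once over the whole frame, then blank
    out the rows where the shift crossed into a different mac address."""
    df = df.sort_values(["mac address", "datetime"]).reset_index(drop=True)
    df["origin"] = df["node name"].shift(1)
    df["departure time"] = df["datetime"].shift(1)
    # True on the first row of every mac address block (including row 0).
    first_of_device = df["mac address"] != df["mac address"].shift(1)
    df.loc[first_of_device, "origin"] = np.nan
    df.loc[first_of_device, "departure time"] = pd.NaT
    return df

# Hypothetical sample data, deliberately unsorted.
sample = pd.DataFrame({
    "node name": ["B", "A", "C", "A", "B"],
    "datetime": pd.to_datetime([
        "2014-01-01 08:10", "2014-01-01 08:00",
        "2014-01-01 09:30", "2014-01-01 09:00", "2014-01-01 09:05",
    ]),
    "mac address": ["m1", "m1", "m2", "m2", "m3"],
})
fast = add_previous_sighting(sample)
print(fast)
```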
