How To Efficiently Process Time-series Data In Pandas
I have data sets representing travel times past given nodes. The data is in one CSV file per node in this format: node name, datetime, irrelevant field, mac address I'm reading the
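The answer below works on a single DataFrame holding the rows from every node. A minimal sketch of building one, assuming the CSV files have no header row; the `node_files` dict and its contents are hypothetical stand-ins (via `io.StringIO`) for real file paths:

```python
import io

import pandas as pd

# Hypothetical per-node CSV contents (no header row) in the format
# node name, datetime, irrelevant field, mac address.
# In practice these would be real paths, e.g. from glob.glob("*.csv").
node_files = {
    "A": "A,2013-01-01 08:00:00,x,aa:bb\nA,2013-01-01 09:00:00,x,cc:dd\n",
    "B": "B,2013-01-01 08:10:00,x,aa:bb\nB,2013-01-01 09:30:00,x,cc:dd\n",
}

columns = ["node name", "datetime", "irrelevant field", "mac address"]

frames = [
    pd.read_csv(
        io.StringIO(text),
        header=None,
        names=columns,
        usecols=["node name", "datetime", "mac address"],  # skip the unused field
        parse_dates=["datetime"],
    )
    for text in node_files.values()
]

# One long DataFrame with every sighting from every node.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Dropping the irrelevant field with `usecols` at read time keeps the combined frame smaller.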
Solution 1:
If the DataFrame is sorted by datetime within each mac address, you can probably do:
import pandas as pd

grb = df.groupby('mac address')
# previous node and timestamp for the same device (NaN/NaT on its first sighting)
df['origin'] = grb['node name'].shift(1)
df['departure time'] = grb['datetime'].shift(1)
The travel time is then the arrival time at the current node minus the departure time from the previous one:
df['travel time'] = df['datetime'] - df['departure time']
If the node names are strings, the path is:
df['path'] = df['origin'] + '-' + df['node name']
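Put together, the group-wise version can be run end to end on a small set of sightings. The node names, timestamps and mac addresses below are made up for illustration; the sketch uses `GroupBy.shift` and subtracts departure from arrival so travel times come out positive:

```python
import pandas as pd

# Hypothetical sightings: device aa:bb passes A, B, C; device cc:dd passes B, A.
df = pd.DataFrame({
    "node name": ["A", "B", "C", "B", "A"],
    "datetime": pd.to_datetime([
        "2013-01-01 08:00", "2013-01-01 08:10", "2013-01-01 08:25",
        "2013-01-01 09:00", "2013-01-01 09:20",
    ]),
    "mac address": ["aa:bb", "aa:bb", "aa:bb", "cc:dd", "cc:dd"],
})

# Ensure each device's sightings are in time order.
df = df.sort_values(["mac address", "datetime"])

grb = df.groupby("mac address")
# Previous node and timestamp for the same device (NaN/NaT on its first sighting).
df["origin"] = grb["node name"].shift(1)
df["departure time"] = grb["datetime"].shift(1)

# Arrival minus departure gives a non-negative Timedelta.
df["travel time"] = df["datetime"] - df["departure time"]
# NaN origin propagates, so first sightings get a NaN path.
df["path"] = df["origin"] + "-" + df["node name"]

# Keep only rows that describe a hop between two nodes.
trips = df.dropna(subset=["origin"])
print(trips[["mac address", "path", "travel time"]])
```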
Edit: this may be faster, assuming travel times cannot be negative. It sorts once and shifts the whole frame instead of shifting group by group:
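import numpy as np

# sort_values replaces DataFrame.sort, which has been removed from pandas
df.sort_values(['mac address', 'datetime'], inplace=True)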
df['origin'] = df['node name'].shift(1)
df['departure time'] = df['datetime'].shift(1)
# correct for the places where the mac addresses change
idx = df['mac address'] != df['mac address'].shift(1)
df.loc[idx, 'origin'] = np.nan
df.loc[idx, 'departure time'] = np.nan
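A sketch of the sort-once variant on the same kind of made-up data (deliberately unsorted here), ending with a check that it produces the same departure times as the per-group shift. `pd.NaT` is used for the datetime column as the explicit missing-time marker:

```python
import numpy as np
import pandas as pd

# Hypothetical sightings, deliberately out of order.
df = pd.DataFrame({
    "node name": ["B", "A", "A", "C", "B"],
    "datetime": pd.to_datetime([
        "2013-01-01 09:00", "2013-01-01 08:00", "2013-01-01 09:20",
        "2013-01-01 08:25", "2013-01-01 08:10",
    ]),
    "mac address": ["cc:dd", "aa:bb", "cc:dd", "aa:bb", "aa:bb"],
})

# One sort groups each device's sightings together, in time order.
df = df.sort_values(["mac address", "datetime"]).reset_index(drop=True)

# A single shift over the whole frame, no groupby.
df["origin"] = df["node name"].shift(1)
df["departure time"] = df["datetime"].shift(1)

# Where the mac address changes, the shifted values belong to another device.
new_device = df["mac address"] != df["mac address"].shift(1)
df.loc[new_device, "origin"] = np.nan
df.loc[new_device, "departure time"] = pd.NaT

df["travel time"] = df["datetime"] - df["departure time"]

# Cross-check against the per-group shift (equals treats aligned NaT as equal).
same = df["departure time"].equals(df.groupby("mac address")["datetime"].shift(1))
print(same)
print(df.dropna(subset=["origin"])[["mac address", "origin", "node name", "travel time"]])
```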