Add Extra Column As The Cumulative Time Difference
How to add an extra column that is the cumulative value of the time differences for each course? For example, the initial table is: id_A course weight ts_
Solution 1:
You can chain the diff method with cumsum:
import pandas as pd

# convert ts_A to datetime type
df['ts_A'] = pd.to_datetime(df['ts_A'])
# convert ts_A to seconds since the epoch (assumes nanosecond resolution), group by id_A,
# then use transform to calculate the cumulative difference within each group
df['cum_delta_sec'] = (df['ts_A'].astype('int64').div(10**9)
                       .groupby(df['id_A'])
                       .transform(lambda x: x.diff().fillna(0).cumsum()))
df
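The same diff-then-cumsum idea can probably be written without the integer conversion by working on timedeltas directly. The following is a minimal sketch, not the answerer's exact code: it rebuilds only the id_A and ts_A columns from the output table shown in this post, and the intermediate name `delta` is an assumption made for illustration.

```python
import pandas as pd

# Sample rows taken from the output table in this post (id_A and ts_A only).
df = pd.DataFrame({
    "id_A": ["id1"] * 4 + ["id2"] * 4,
    "ts_A": [
        "2017-04-27 01:35:30", "2017-04-27 01:36:00",
        "2017-04-27 01:36:30", "2017-04-27 01:37:00",
        "2017-04-27 02:35:30", "2017-04-27 02:36:00",
        "2017-04-27 02:36:30", "2017-04-27 02:37:00",
    ],
})
df["ts_A"] = pd.to_datetime(df["ts_A"])

# Per-group difference between consecutive timestamps, in seconds.
# The first row of each group has no predecessor (NaT), so it becomes 0.
delta = df.groupby("id_A")["ts_A"].diff().dt.total_seconds().fillna(0)

# Running total of those differences, restarted for every id_A.
df["cum_delta_sec"] = delta.groupby(df["id_A"]).cumsum()
print(df)
```

Because the seconds come from `.dt.total_seconds()`, this variant does not depend on the timestamps being stored with nanosecond resolution.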
Solution 2:
Use groupby, transform, and .iloc:
df['ts_A'] = pd.to_datetime(df.ts_A)
df['cum_delta_sec'] = (df.groupby('id_A')['ts_A']
                       .transform(lambda x: (x - x.iloc[0]).dt.total_seconds()))
Output:
  id_A      course  weight                 ts_A       value  cum_delta_sec
0  id1      cotton     3.5  2017-04-27 01:35:30  150.000000              0
1  id1      cotton     3.5  2017-04-27 01:36:00  416.666667             30
2  id1      cotton     3.5  2017-04-27 01:36:30  700.000000             60
3  id1      cotton     3.5  2017-04-27 01:37:00  950.000000             90
4  id2  cottonblue     5.0  2017-04-27 02:35:30  150.000000              0
5  id2  cottonblue     5.0  2017-04-27 02:36:00  450.000000             30
6  id2  cottonblue     5.0  2017-04-27 02:36:30  520.666667             60
7  id2  cottonblue     5.0  2017-04-27 02:37:00  610.000000             90
Within each group, subtract the first timestamp from the current one, then use the .dt accessor to convert the resulting timedelta to seconds.
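Solutions 1 and 2 give the same numbers because a running sum of consecutive differences telescopes to "current minus first". A small standard-library sketch of that identity, using the id1 timestamps from the output table (the names `from_first`, `steps`, and `running` are chosen here for illustration):

```python
from datetime import datetime
from itertools import accumulate

FMT = "%Y-%m-%d %H:%M:%S"
stamps = [
    datetime.strptime(s, FMT)
    for s in (
        "2017-04-27 01:35:30",
        "2017-04-27 01:36:00",
        "2017-04-27 01:36:30",
        "2017-04-27 01:37:00",
    )
]

# Solution 2 style: seconds elapsed since the first timestamp.
from_first = [(t - stamps[0]).total_seconds() for t in stamps]

# Solution 1 style: consecutive differences (0 for the first row), then a running sum.
steps = [0.0] + [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
running = list(accumulate(steps))

print(from_first)
print(running)
```

Both lists hold the same values, so within a group either formulation can be used.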
Solution 3:
import csv
import datetime as dt

FMT = "%Y-%m-%d %H:%M:%S"

with open('path/to/input', newline='') as fin, open('path/to/output', 'w', newline='') as fout:
    infile = csv.DictReader(fin, delimiter='\t')
    outfile = csv.DictWriter(fout, delimiter='\t', fieldnames=infile.fieldnames + ['cum_delta_sec'])
    outfile.writeheader()
    cdt = 0       # running total of seconds for the current id_A
    last = None   # timestamp of the previous row
    last_id = None
    for row in infile:
        current = dt.datetime.strptime(row['ts_A'], FMT)
        # restart the total on the first row of each id_A
        # (assumes the rows of one id_A are contiguous and in time order)
        if last is None or row['id_A'] != last_id:
            cdt = 0
        else:
            cdt += (current - last).total_seconds()
        last, last_id = current, row['id_A']
        row['cum_delta_sec'] = cdt
        outfile.writerow(row)
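If the rows of different ids are interleaved in the file, one possible variation is to remember the first timestamp seen for each id and subtract it from the current one. The sketch below is an assumption-laden illustration rather than part of the original answer: the function name `add_cum_delta`, its parameters, and the in-memory sample are hypothetical, and it still assumes that each id's rows appear in time order.

```python
import csv
import io
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def add_cum_delta(fin, fout, id_col="id_A", ts_col="ts_A"):
    """Copy tab-separated rows from fin to fout, appending cum_delta_sec."""
    reader = csv.DictReader(fin, delimiter="\t")
    writer = csv.DictWriter(
        fout,
        delimiter="\t",
        fieldnames=reader.fieldnames + ["cum_delta_sec"],
        lineterminator="\n",
    )
    writer.writeheader()
    first_seen = {}  # id -> first timestamp encountered for that id
    for row in reader:
        current = datetime.strptime(row[ts_col], FMT)
        # setdefault stores the timestamp on first sight, otherwise returns the stored one
        start = first_seen.setdefault(row[id_col], current)
        row["cum_delta_sec"] = (current - start).total_seconds()
        writer.writerow(row)


# Hypothetical in-memory input with interleaved ids, so no files are needed.
sample = (
    "id_A\tts_A\n"
    "id1\t2017-04-27 01:35:30\n"
    "id2\t2017-04-27 02:35:30\n"
    "id1\t2017-04-27 01:36:00\n"
    "id2\t2017-04-27 02:36:30\n"
)
out = io.StringIO()
add_cum_delta(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

Passing file objects instead of paths keeps the function easy to try out with `io.StringIO`; for real files, open them with `newline=''` as the csv module recommends.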