
Efficiently Replace Part Of Value From One Column With Value From Another Column In Pandas Using Regex?

I have a pandas dataframe df with dates as strings:

Date1       Date2
2017-08-31  1970-01-01 17:35:00
2017-10-31  1970-01-01 15:00:00
2017-11-30  1970-01-01 16:30:00
2017-10-31

Solution 1:

One idea is:

df['Date3'] =  ['{} {}'.format(a, b.split()[1]) for a, b in zip(df['Date1'], df['Date2'])]

Or:

df['Date3'] = df['Date1'] + ' ' + df['Date2'].str.split().str[1]
print(df)

        Date1                Date2                Date3
0  2017-08-31  1970-01-01 17:35:00  2017-08-31 17:35:00
1  2017-10-31  1970-01-01 15:00:00  2017-10-31 15:00:00
2  2017-11-30  1970-01-01 16:30:00  2017-11-30 16:30:00
3  2017-10-31  1970-01-01 16:00:00  2017-10-31 16:00:00
4  2017-10-31  1970-01-01 16:12:00  2017-10-31 16:12:00

Or:

df['Date3'] = pd.to_datetime(df['Date1']) + pd.to_timedelta(df['Date2'].str.split().str[1])
print(df)

        Date1                Date2               Date3
0  2017-08-31  1970-01-01 17:35:00 2017-08-31 17:35:00
1  2017-10-31  1970-01-01 15:00:00 2017-10-31 15:00:00
2  2017-11-30  1970-01-01 16:30:00 2017-11-30 16:30:00
3  2017-10-31  1970-01-01 16:00:00 2017-10-31 16:00:00
4  2017-10-31  1970-01-01 16:12:00 2017-10-31 16:12:00
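For anyone who wants to run the variants above end to end, here is a minimal sketch. It assumes the five sample rows from the printed output; the names time_part, as_string and as_datetime are only for illustration. It also shows a difference that is easy to miss: the concatenation keeps strings, while the to_datetime/to_timedelta variant produces real datetime values.

```python
import pandas as pd

# Sample rows assumed from the printed output in the answer.
df = pd.DataFrame({
    "Date1": ["2017-08-31", "2017-10-31", "2017-11-30", "2017-10-31", "2017-10-31"],
    "Date2": ["1970-01-01 17:35:00", "1970-01-01 15:00:00", "1970-01-01 16:30:00",
              "1970-01-01 16:00:00", "1970-01-01 16:12:00"],
})

# Time-of-day part of Date2, e.g. "17:35:00" (second whitespace-separated token).
time_part = df["Date2"].str.split().str[1]

# String result: plain concatenation of the date and the time.
as_string = df["Date1"] + " " + time_part

# Datetime result: parse the date, then add the time as an offset.
as_datetime = pd.to_datetime(df["Date1"]) + pd.to_timedelta(time_part)

print(as_string.head())
print(as_datetime.dtype)
```

If Date3 is going to be sorted, filtered by range, or resampled later, the datetime variant is probably the more convenient one, even though it is not the fastest to compute.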

Timings:

In [302]: %timeit df['Date3'] =  ['{} {}'.format(a, b.split()[1]) for a, b in zip(df['Date1'], df['Date2'])]
30.2 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [303]: %timeit df['Date3'] = df['Date1'] + ' ' + df['Date2'].str.split().str[1]
66.4 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
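The answer does not say how large the timed frame was, so the numbers above are hard to reproduce exactly. A rough sketch for repeating the comparison outside IPython follows; the frame size (the sample rows repeated 1000 times) and the helper names list_comp and str_accessor are assumptions, and absolute timings will vary with machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "Date1": ["2017-08-31", "2017-10-31", "2017-11-30"],
    "Date2": ["1970-01-01 17:35:00", "1970-01-01 15:00:00", "1970-01-01 16:30:00"],
})
# Arbitrary size chosen for illustration: 3 rows repeated 1000 times.
df = pd.concat([base] * 1000, ignore_index=True)


def list_comp():
    # Pure-Python loop over both columns.
    return ['{} {}'.format(a, b.split()[1]) for a, b in zip(df['Date1'], df['Date2'])]


def str_accessor():
    # Same result through the pandas .str accessor.
    return df['Date1'] + ' ' + df['Date2'].str.split().str[1]


t_list = timeit.timeit(list_comp, number=5)
t_str = timeit.timeit(str_accessor, number=5)
print(f"list comprehension: {t_list:.4f} s for 5 runs")
print(f".str accessor:      {t_str:.4f} s for 5 runs")
```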

Solution 2:

Another way is to slice the strings and concatenate the underlying numpy arrays:

df.Date2 = df.Date1.str[:].values + df.Date2.str[10:].values

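df.Date1.str[:].values returns the Date1 column as a numpy array (the str[:] slice leaves each string unchanged, so df.Date1.values gives the same result), and df.Date2.str[10:].values does the same for the sliced Date2 column. Note that this assignment overwrites Date2 rather than creating a new column.

str[10:] drops the first ten characters of Date2 (the 1970-01-01 date), leaving the space and the time, which are appended to the date from Date1. This relies on the date part always being exactly ten characters wide.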
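A small sketch of the slicing idea on a two-row frame, plus a regex variant, since the question title asks about regex. Two things here are assumptions on my part rather than part of the answer: to_numpy(dtype=object) is used in place of .values so the arrays hold plain Python strings regardless of pandas version, and the pattern (\d{2}:\d{2}:\d{2})$ assumes the time is always HH:MM:SS at the end of the string.

```python
import pandas as pd

df = pd.DataFrame({
    "Date1": ["2017-08-31", "2017-10-31"],
    "Date2": ["1970-01-01 17:35:00", "1970-01-01 15:00:00"],
})

# Fixed-width slice: everything from index 10 on is " HH:MM:SS"
# (the leading space is kept, so no separator is needed).
sliced = (df["Date1"].to_numpy(dtype=object)
          + df["Date2"].str[10:].to_numpy(dtype=object))

# Regex variant: capture the trailing HH:MM:SS instead of relying on position.
time_part = df["Date2"].str.extract(r"(\d{2}:\d{2}:\d{2})$", expand=False)
via_regex = df["Date1"] + " " + time_part

print(list(sliced))
print(via_regex.tolist())
```

The slice is the cheaper operation, but it silently produces wrong results if a date is not exactly ten characters long; the regex version is slower and yields NaN for rows that do not match, which at least makes bad rows visible.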

Timings:

%timeit df.d2 = df.d1.str[:].values + df.d2.str[10:].values
2.26 ms ± 82.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
