Merge Data From Multiple Data Frames On Multiple Conditions
I want to merge multiple dataframes, but only if the keys match and the date range falls within 90 days of the 'InitialAdmit' date range in df1. I want to keep all rows from df1 an
Solution 1:
I still recommend merging first and then filtering. Here we use a Boolean index and combine_first:
df = df1.merge(df2, on='Key')
# keep only rows whose df2 date lies between InitialAdmit and 90DayRange from df1
m = (df.InitialAdmit_y >= df.InitialAdmit_x) & (df.InitialAdmit_y <= df['90DayRange'])
df1.set_index('Key').combine_first(df[m].set_index('Key'))
Out[215]:
           90DayRange InitialAdmit InitialAdmit_x InitialAdmit_y
Key
100000204  2012-09-02   2012-06-04            NaT            NaT
100000255  2012-08-01   2012-05-03     2012-05-03     2012-06-03
100000271  2012-04-15   2012-01-16            NaT            NaT
100000286  2013-01-24   2012-10-26     2012-10-26     2012-11-26
100000628  2012-05-21   2012-02-21            NaT            NaT
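For readers who want to run the merge-then-filter idea end to end, here is a minimal sketch. It is a variant rather than the answer's exact code: it uses a left join plus `Series.where` in place of `combine_first`. The df1 values are taken from the output shown above. The df2 rows and the `_df2` suffix are assumptions made for illustration, since the original df2 is not shown.

```python
import pandas as pd

# df1 values are taken from the output shown above.
df1 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000271, 100000286, 100000628],
    "InitialAdmit": pd.to_datetime(
        ["2012-06-04", "2012-05-03", "2012-01-16", "2012-10-26", "2012-02-21"]),
    "90DayRange": pd.to_datetime(
        ["2012-09-02", "2012-08-01", "2012-04-15", "2013-01-24", "2012-05-21"]),
})

# ASSUMPTION: hypothetical df2 rows; the first one is deliberately outside its window.
df2 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000286],
    "InitialAdmit": pd.to_datetime(["2013-01-01", "2012-06-03", "2012-11-26"]),
})

# A left join keeps every df1 row, even when df2 has no matching Key.
merged = df1.merge(df2, on="Key", how="left", suffixes=("", "_df2"))

# True where the df2 date lies inside [InitialAdmit, 90DayRange]; NaT yields False.
in_window = merged["InitialAdmit_df2"].between(
    merged["InitialAdmit"], merged["90DayRange"])

# Blank out-of-window dates instead of dropping the rows.
merged["InitialAdmit_df2"] = merged["InitialAdmit_df2"].where(in_window)
print(merged)
```

If df2 can hold several rows per Key, the left join repeats the matching df1 row once per df2 row, so a de-duplication step may be needed afterwards.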
Solution 2:
Consider reduce for chaining the merges with a left join. The code below demonstrates this with three copies of df2. It also assumes InitialAdmit is the last column of each dataframe; reorder as needed.
import pandas
import numpy
from functools import reduce
...
# LIST OF DATAFRAMES WITH SUFFIXING OF INITIALADMIT TO AVOID NAME COLLISION
dfList = [d.rename(columns={'InitialAdmit': 'InitialAdmit_' + str(i)})
          for i, d in enumerate([df1, df2, df2, df2])]

# USER-DEFINED METHOD CONDITIONING ON LAST COLUMN
def mergefilter(x, y):
    tmp = pandas.merge(x, y, on='Key', how='left')
    # BLANK OUT THE NEWLY MERGED DATE WHEN IT FALLS OUTSIDE THE df1 WINDOW
    tmp.loc[~(tmp.iloc[:, -1].between(tmp['InitialAdmit_0'], tmp['90DayRange'])),
            tmp.columns[-1]] = numpy.nan
    return tmp

finaldf = reduce(mergefilter, dfList)
print(finaldf)
#    90DayRange InitialAdmit_0        Key InitialAdmit_1 InitialAdmit_2 InitialAdmit_3
# 0  2012-09-02     2012-06-04  100000204            NaN            NaN            NaN
# 1  2012-08-01     2012-05-03  100000255     2012-06-03     2012-06-03     2012-06-03
# 2  2012-04-15     2012-01-16  100000271            NaN            NaN            NaN
# 3  2013-01-24     2012-10-26  100000286     2012-11-26     2012-11-26     2012-11-26
# 4  2012-05-21     2012-02-21  100000628            NaN            NaN            NaN
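The reduce approach depends on InitialAdmit being the last column of each frame. One possible way around that is to refer to the merged column by name, as sketched below. The helper `merge_filter`, the second frame `df3`, and all df2/df3 values are hypothetical and only serve to make the example runnable.

```python
from functools import reduce

import pandas as pd

# df1 values follow the sample output; df2 and df3 are hypothetical.
df1 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000271, 100000286, 100000628],
    "InitialAdmit": pd.to_datetime(
        ["2012-06-04", "2012-05-03", "2012-01-16", "2012-10-26", "2012-02-21"]),
    "90DayRange": pd.to_datetime(
        ["2012-09-02", "2012-08-01", "2012-04-15", "2013-01-24", "2012-05-21"]),
})
df2 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000286],
    "InitialAdmit": pd.to_datetime(["2013-01-01", "2012-06-03", "2012-11-26"]),
})
df3 = pd.DataFrame({
    "Key": [100000271, 100000628],
    "InitialAdmit": pd.to_datetime(["2012-02-01", "2012-06-30"]),
})


def merge_filter(acc, item):
    """Left-join one frame onto acc and blank dates outside the df1 window."""
    i, frame = item
    col = f"InitialAdmit_{i}"
    right = frame[["Key", "InitialAdmit"]].rename(columns={"InitialAdmit": col})
    out = acc.merge(right, on="Key", how="left")
    # Select the new column by name, so column order does not matter.
    ok = out[col].between(out["InitialAdmit_0"], out["90DayRange"])
    out[col] = out[col].where(ok)
    return out


# df1 is the starting value; each later frame is merged and filtered in turn.
base = df1.rename(columns={"InitialAdmit": "InitialAdmit_0"})
final = reduce(merge_filter, enumerate([df2, df3], start=1), base)
print(final)
```

Passing df1 as the initial value of reduce keeps it out of the list being iterated, so the helper only ever sees frames that need to be joined and filtered.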