
Merge Data From Multiple Data Frames On Multiple Conditions

I want to merge multiple dataframes, but only if the keys match and the date range falls within 90 days of the 'InitialAdmit' date range in df1. I want to keep all rows from df1 an

Solution 1:

I still recommend merging first and then filtering. Here we use a Boolean mask and combine_first:

df = df1.merge(df2, on='Key')
# keep df2 admits that fall between df1's InitialAdmit and its 90DayRange
m = (df.InitialAdmit_y >= df.InitialAdmit_x) & (df.InitialAdmit_y <= df['90DayRange'])
df1.set_index('Key').combine_first(df[m].set_index('Key'))
Out[215]:
          90DayRange InitialAdmit InitialAdmit_x InitialAdmit_y
Key
100000204 2012-09-02   2012-06-04            NaT            NaT
100000255 2012-08-01   2012-05-03     2012-05-03     2012-06-03
100000271 2012-04-15   2012-01-16            NaT            NaT
100000286 2013-01-24   2012-10-26     2012-10-26     2012-11-26
100000628 2012-05-21   2012-02-21            NaT            NaT
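For anyone who wants to run the merge-then-filter idea end to end, here is a minimal sketch. The sample frames are hypothetical: the df1 values are reconstructed from the output shown above, the out-of-window df2 row for key 100000628 is invented for illustration, and 90DayRange is assumed to be InitialAdmit plus 90 days. It also swaps combine_first for a plain left merge and uses a made-up column name, MatchedAdmit, for the result.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output shown above.
df1 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000271, 100000286, 100000628],
    "InitialAdmit": pd.to_datetime(
        ["2012-06-04", "2012-05-03", "2012-01-16", "2012-10-26", "2012-02-21"]
    ),
})
# Assumption: 90DayRange is simply InitialAdmit plus 90 days.
df1["90DayRange"] = df1["InitialAdmit"] + pd.Timedelta(days=90)

# The last row is invented: it falls outside the 90-day window on purpose.
df2 = pd.DataFrame({
    "Key": [100000255, 100000286, 100000628],
    "InitialAdmit": pd.to_datetime(["2012-06-03", "2012-11-26", "2013-01-01"]),
})

# Inner merge on Key; the clashing InitialAdmit columns get _x (df1) and _y (df2).
merged = df1.merge(df2, on="Key")

# Keep only df2 admits inside [InitialAdmit, 90DayRange] of the matching df1 row.
in_window = merged["InitialAdmit_y"].between(
    merged["InitialAdmit_x"], merged["90DayRange"]
)

# Left-join the surviving matches back so every df1 row is kept.
matches = merged.loc[in_window, ["Key", "InitialAdmit_y"]].rename(
    columns={"InitialAdmit_y": "MatchedAdmit"}  # hypothetical column name
)
result = df1.merge(matches, on="Key", how="left")
print(result)
```

Note that if one key has several df2 admits inside the window, the left merge repeats the df1 row once per match, so deduplicate first if you need exactly one row per key.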

Solution 2:

Consider functools.reduce to chain the merges, using a left join at each step so that every row of df1 is kept. The example below uses three copies of df2 and assumes InitialAdmit is the last column of each dataframe. Reorder the columns as needed.

import pandas 
import numpy
from functools import reduce    
...

# LIST OF DATAFRAMES WITH SUFFIXING OF INITIALADMIT TO AVOID NAME COLLISION
dfList = [d.rename(columns={'InitialAdmit':'InitialAdmit_' + str(i)}) 
          for i, d in enumerate([df1, df2, df2, df2])]

# USER-DEFINED METHOD CONDITIONING ON LAST COLUMN
def mergefilter(x, y):
    tmp = pandas.merge(x, y, on='Key', how='left')
    tmp.loc[~(tmp.iloc[:, -1].between(tmp['InitialAdmit_0'], tmp['90DayRange'])), 
            tmp.columns[-1]] = numpy.nan

    return tmp

finaldf = reduce(mergefilter, dfList)

print(finaldf)
#    90DayRange InitialAdmit_0        Key InitialAdmit_1 InitialAdmit_2 InitialAdmit_3
# 0  2012-09-02     2012-06-04  100000204            NaN            NaN            NaN
# 1  2012-08-01     2012-05-03  100000255     2012-06-03     2012-06-03     2012-06-03
# 2  2012-04-15     2012-01-16  100000271            NaN            NaN            NaN
# 3  2013-01-24     2012-10-26  100000286     2012-11-26     2012-11-26     2012-11-26
# 4  2012-05-21     2012-02-21  100000628            NaN            NaN            NaN
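A self-contained variant of the same reduce pattern, in case the dates are real datetime64 columns rather than strings. Everything here is a sketch under assumptions: df2 and df3 are hypothetical frames with invented values, 90DayRange is assumed to be InitialAdmit plus 90 days, and Series.where stands in for the .loc assignment above so that out-of-window dates become NaT.

```python
from functools import reduce

import pandas as pd

# Hypothetical base frame; 90DayRange is assumed to be InitialAdmit + 90 days.
df1 = pd.DataFrame({
    "Key": [100000204, 100000255],
    "InitialAdmit": pd.to_datetime(["2012-06-04", "2012-05-03"]),
})
df1["90DayRange"] = df1["InitialAdmit"] + pd.Timedelta(days=90)

# Hypothetical later frames, each carrying its own InitialAdmit per Key.
df2 = pd.DataFrame({
    "Key": [100000255],
    "InitialAdmit": pd.to_datetime(["2012-06-03"]),
})
df3 = pd.DataFrame({
    "Key": [100000204, 100000255],
    "InitialAdmit": pd.to_datetime(["2013-03-01", "2012-07-15"]),
})

# Suffix InitialAdmit with the frame's position to avoid name collisions.
frames = [d.rename(columns={"InitialAdmit": f"InitialAdmit_{i}"})
          for i, d in enumerate([df1, df2, df3])]


def merge_filter(left, right):
    """Left-join on Key, then blank the newly added date column outside the window."""
    out = left.merge(right, on="Key", how="left")
    new_col = out.columns[-1]  # relies on the new date column being last
    inside = out[new_col].between(out["InitialAdmit_0"], out["90DayRange"])
    out[new_col] = out[new_col].where(inside)  # out-of-window values become NaT
    return out


final = reduce(merge_filter, frames)
print(final)
```

Because each step is a left join against the accumulated frame, the row count stays equal to df1 as long as Key is unique in every later frame.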
