Python Pandas: How To Merge Based On An "or" Condition?

Let's say I have two dataframes, and the column names for both are: table 1 columns: [ShipNumber, TrackNumber, ShipDate, Quantity, Weight] table 2 columns: [ShipNumber, TrackNumber

Solution 1:

Run merge() once per key and stack the results with concat(). Then drop the duplicate rows that appear when both A and B match (thanks @Scott Boston for that final step).

import pandas as pd

df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})

df1         df2
   A  B        A  B
0  3  7     0  1  4
1  2  8     1  5  1
2  1  9     2  6  8
3  4  5     3  4  5

With these data frames we should see:

  • df1.loc[2] matches A on df2.loc[0]
  • df1.loc[1] matches B on df2.loc[2]
  • df1.loc[3] matches both A and B on df2.loc[3]
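
The expected matches listed above can be checked by brute force. The following is only a minimal sketch, assuming pandas 1.2 or later (needed for how='cross'); the names left, right, pairs, df1_row, df2_row and the _1/_2 suffixes are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

# Keep each frame's original row label as a column so it survives the merge.
left = df1.reset_index().rename(columns={'index': 'df1_row'})
right = df2.reset_index().rename(columns={'index': 'df2_row'})

# Cross join: every row of df1 paired with every row of df2 (4 x 4 = 16 pairs).
pairs = left.merge(right, how='cross', suffixes=('_1', '_2'))

# Keep only the pairs where A matches OR B matches.
matches = pairs[(pairs['A_1'] == pairs['A_2']) | (pairs['B_1'] == pairs['B_2'])]
print(matches[['df1_row', 'df2_row']])
```

The intermediate frame has len(df1) * len(df2) rows, so this is best kept for small frames or for sanity checks like this one.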

We'll use suffixes to keep track of what matched where:

suff_A = ['_on_A_match_1', '_on_A_match_2']
suff_B = ['_on_B_match_1', '_on_B_match_2']

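# sort=True orders the combined columns alphabetically, as in the output shown here
# (recent pandas versions no longer sort them by default)
df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                df1.merge(df2, on='B', suffixes=suff_B)], sort=True)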

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
1  4.0             NaN             NaN  NaN             5.0             5.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN

Note that the second and fourth rows are duplicate matches (for both data frames, A = 4 and B = 5). We need to remove one of those sets.

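# True only for the row from the merge on A whose B values also agree;
# comparing the A_on_B_match columns instead would drop the other copy
dups = (df.B_on_A_match_1 == df.B_on_A_match_2)
df.loc[~dups]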

     A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
0  1.0             NaN             NaN  NaN             9.0             4.0
0  NaN             2.0             6.0  8.0             NaN             NaN
1  NaN             4.0             4.0  5.0             NaN             NaN
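
The merge, concat, and de-duplicate steps can also be wrapped in a small function that identifies duplicates by row pair rather than by comparing values. This is a sketch only: merge_on_either, left_row, right_row and the _1/_2 suffixes are hypothetical names, not pandas API and not from the original answer.

```python
import pandas as pd

def merge_on_either(left, right, keys):
    """Inner-merge on each key separately, keeping one row per matched pair.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Tag every row with its position so matched pairs can be identified.
    lhs = left.reset_index(drop=True).rename_axis('left_row').reset_index()
    rhs = right.reset_index(drop=True).rename_axis('right_row').reset_index()

    # One merge per key, reduced to (left_row, right_row) pairs, then stacked.
    pairs = pd.concat(
        [lhs[['left_row', k]].merge(rhs[['right_row', k]], on=k)[['left_row', 'right_row']]
         for k in keys],
        ignore_index=True,
    ).drop_duplicates()  # a pair matching on several keys is kept once

    # Attach the full rows from both frames to each matched pair.
    return pairs.merge(lhs, on='left_row').merge(rhs, on='right_row', suffixes=('_1', '_2'))

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

result = merge_on_either(df1, df2, ['A', 'B'])
print(result)
```

With this layout each output row carries both frames' A and B values side by side (A_1, B_1, A_2, B_2) instead of NaN-padded suffix columns, at the cost of an extra pair of merges.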

Solution 2:

I would suggest an alternative way of doing a merge like this, which seems easier to me: build a single key column that falls back to TrackNumber whenever ShipNumber is missing.

table1["id_to_be_merged"] = table1.apply(
    lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)

You can add the same column to table2 as well if needed, and then use it in left_on or right_on, depending on your requirement.
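
As a rough end-to-end sketch of this approach: the sample values below are made up (only the column names come from the question), the _t1/_t2 suffixes are illustrative, and Series.fillna is used as a vectorized stand-in for the row-wise apply. Because both frames get the same key column name, on= is enough here.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names come from the question.
table1 = pd.DataFrame({
    'ShipNumber': ['S1', np.nan, 'S3'],
    'TrackNumber': ['T1', 'T2', np.nan],
    'Quantity': [10, 20, 30],
})
table2 = pd.DataFrame({
    'ShipNumber': ['S1', np.nan],
    'TrackNumber': ['T9', 'T2'],
})

# ShipNumber where present, otherwise TrackNumber.
table1['id_to_be_merged'] = table1['ShipNumber'].fillna(table1['TrackNumber'])
table2['id_to_be_merged'] = table2['ShipNumber'].fillna(table2['TrackNumber'])

# Left merge keeps every row of table1, matched or not.
merged = table1.merge(table2, on='id_to_be_merged', how='left', suffixes=('_t1', '_t2'))
print(merged)
```

Note that this is a fallback key rather than a full "or": a row with both ShipNumber and TrackNumber present is matched on ShipNumber only.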
