
Using Fuzzywuzzy To Create A Column Of Matched Results In The Data Frame

I'm running into a challenge with using the FuzzyWuzzy library to store all my results in a data frame column (I'm guessing it might require a loop?) I've been scratching my head o

Solution 1:

Storing lists in a DataFrame is not a good idea; I suggest storing every match as its own row instead. Here is the code:

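import io

import pandas as pd
from fuzzywuzzy import process

# Sample master table, built inline so the example is self-contained.
master = pd.read_csv(io.StringIO("""ID,ITEM
1,Pepperoni Pizza
2,Cheese Pizza
3,Chicken Salad
4,Plain Salad"""))

lookups = ["Cheese", "Salad"]

# Map each ID to its item name. When choices is a dict, process.extract
# returns (matched value, score, key) tuples.
choices = master.set_index("ID")["ITEM"].to_dict()

# One row per (lookup, match) pair, keeping the two best matches per lookup.
res = [
    (lookup,) + match
    for lookup in lookups
    for match in process.extract(lookup, choices, limit=2)
]
df = pd.DataFrame(res, columns=["lookup", "matched", "score", "id"])
df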

Output:

   lookup        matched  score  id
0  Cheese   Cheese Pizza     90   2
1  Cheese  Chicken Salad     45   3
2   Salad  Chicken Salad     90   3
3   Salad    Plain Salad     90   4

Basically, I build a choices dict from master to match against, loop over the lookups, and collect the results in a list. Finally, I convert the list to a DataFrame.
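
If a single column of matched results per lookup is still wanted, the long-form table can be filtered and collapsed afterwards. The following is only a sketch: it rebuilds the result table by hand from the output shown above so that it runs without fuzzywuzzy, and the cutoff `MIN_SCORE = 80` and the names `good` and `per_lookup` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Long-form matches, copied from the output table above.
df = pd.DataFrame(
    [
        ("Cheese", "Cheese Pizza", 90, 2),
        ("Cheese", "Chicken Salad", 45, 3),
        ("Salad", "Chicken Salad", 90, 3),
        ("Salad", "Plain Salad", 90, 4),
    ],
    columns=["lookup", "matched", "score", "id"],
)

MIN_SCORE = 80  # hypothetical cutoff; tune it for your own data

# Drop weak matches, then join the surviving matches for each lookup
# into one comma-separated string (one row per lookup).
good = df[df["score"] >= MIN_SCORE]
per_lookup = (
    good.groupby("lookup", sort=False)["matched"]
    .agg(", ".join)
    .reset_index()
)
print(per_lookup)
```

With the sample data, "Cheese" keeps only "Cheese Pizza", while "Salad" keeps both "Chicken Salad" and "Plain Salad". Keeping one row per match until this last step makes filtering by score straightforward, which is harder once the matches are packed into lists.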
