Using Fuzzywuzzy To Create A Column Of Matched Results In The Data Frame
I'm running into a challenge with using the FuzzyWuzzy library to store all my results in a data frame column (I'm guessing it might require a loop?) I've been scratching my head o
Solution 1:
Storing lists in a DataFrame is not a good idea; I suggest storing every match as its own row instead. Here is the code:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import pandas as pd
import io
master = pd.read_csv(io.StringIO("""ID,ITEM
1,Pepperoni Pizza
2,Cheese Pizza
3,Chicken Salad
4,Plain Salad"""))
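lookups = ["Cheese", "Salad"]
# Map each ID to its ITEM text; with a dict of choices, extract() returns (match, score, ID) tuples.
choices = master.set_index("ID").ITEM.to_dict()
# Prefix every match with its lookup string, keeping the top 2 matches per lookup.
res = [(lookup,) + item for lookup in lookups for item in process.extract(lookup, choices, limit=2)]
df = pd.DataFrame(res, columns=["lookup", "matched", "score", "id"])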
df
output:
   lookup        matched  score  id
0  Cheese   Cheese Pizza     90   2
1  Cheese  Chicken Salad     45   3
2   Salad  Chicken Salad     90   3
3   Salad    Plain Salad     90   4
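To see how each row above is assembled: as far as I can tell, when choices is a dict, process.extract returns (matched value, score, key) tuples, and the comprehension simply puts the lookup string in front of each one. The sketch below uses a hand-written stand-in list (fake_matches is a made-up name, and its values are copied from the output above rather than computed), so it runs without fuzzywuzzy installed.

```python
# Hand-written stand-in for process.extract("Cheese", choices, limit=2).
# The values are copied from the output table, not computed here.
fake_matches = [("Cheese Pizza", 90, 2), ("Chicken Salad", 45, 3)]

lookup = "Cheese"
# (lookup,) is a one-element tuple; adding two tuples concatenates them,
# so each row becomes (lookup, matched, score, id).
rows = [(lookup,) + match for match in fake_matches]
print(rows)
```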
Basically, I build a choices dict from master to match against, loop over lookups, and collect the results in a list. Finally, I convert that list to a DataFrame.
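If you still want a single column of matches per lookup, one option is to collapse the row-per-match frame afterwards. This is only a sketch: the frame is typed in by hand from the output above so it runs without fuzzywuzzy, and MIN_SCORE and the "matches" column name are assumptions of mine, not part of the original answer.

```python
import pandas as pd

# Same rows as the output above, entered by hand.
df = pd.DataFrame(
    [
        ("Cheese", "Cheese Pizza", 90, 2),
        ("Cheese", "Chicken Salad", 45, 3),
        ("Salad", "Chicken Salad", 90, 3),
        ("Salad", "Plain Salad", 90, 4),
    ],
    columns=["lookup", "matched", "score", "id"],
)

MIN_SCORE = 80  # hypothetical cutoff; tune it for your data

# Drop weak matches, then join the remaining ones into one string per lookup.
good = df[df["score"] >= MIN_SCORE]
summary = (
    good.groupby("lookup")["matched"]
    .agg(", ".join)
    .reset_index(name="matches")
)
print(summary)
```

Joining into a string keeps the column easy to read and export; the row-per-match frame stays available if you need the scores or IDs later.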