Return Similarity Matrix From Two Variable-length Arrays Of Strings (scipy Option?)

March 11, 2024 Post a Comment

Say I have two arrays: import numpy as np arr1 = np.array(['faucet', 'faucets', 'bath', 'parts', 'bathroom']) arr2 = np.array(['faucett', 'faucetd', 'bth', 'kichen']) and I want t

Solution 1:

I think you're looking for cdist:

import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from Levenshtein import ratio

arr1 = np.array(['faucet', 'faucets', 'bath', 'parts', 'bathroom'])
arr2 = np.array(['faucett', 'faucetd', 'bth', 'kichen'])

matrix = cdist(arr2.reshape(-1, 1), arr1.reshape(-1, 1), lambda x, y: ratio(x[0], y[0]))
df = pd.DataFrame(data=matrix, index=arr2, columns=arr1)

Result:

           faucet   faucets      bath     parts  bathroom
faucett  0.923077  0.857143  0.363636  0.333333  0.266667
faucetd  0.923077  0.857143  0.363636  0.333333  0.266667
bth      0.222222  0.200000  0.857143  0.250000  0.545455
kichen   0.333333  0.307692  0.200000  0.000000  0.142857

Python Playground

Return Similarity Matrix From Two Variable-length Arrays Of Strings (scipy Option?)

Solution 1:

Post a Comment for "Return Similarity Matrix From Two Variable-length Arrays Of Strings (scipy Option?)"