Return Similarity Matrix From Two Variable-length Arrays Of Strings (scipy Option?)
Say I have two arrays:
import numpy as np
arr1 = np.array(['faucet', 'faucets', 'bath', 'parts', 'bathroom'])
arr2 = np.array(['faucett', 'faucetd', 'bth', 'kichen'])
and I want t
Solution 1:
I think you're looking for cdist:
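import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from Levenshtein import ratio

arr1 = np.array(['faucet', 'faucets', 'bath', 'parts', 'bathroom'])
arr2 = np.array(['faucett', 'faucetd', 'bth', 'kichen'])

# cdist expects 2-D inputs, so each string is reshaped into a one-element row;
# the lambda unpacks that element and passes the two strings to ratio.
matrix = cdist(arr2.reshape(-1, 1), arr1.reshape(-1, 1), lambda x, y: ratio(x[0], y[0]))

# Rows are labelled with arr2 and columns with arr1.
df = pd.DataFrame(data=matrix, index=arr2, columns=arr1)
print(df)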
Result:
           faucet   faucets      bath     parts  bathroom
faucett  0.923077  0.857143  0.363636  0.333333  0.266667
faucetd  0.923077  0.857143  0.363636  0.333333  0.266667
bth      0.222222  0.200000  0.857143  0.250000  0.545455
kichen   0.333333  0.307692  0.200000  0.000000  0.142857
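To make it clearer what the matrix contains, here is a minimal dependency-free sketch that fills the same grid with a nested list comprehension. The helpers `lcs_length`, `indel_ratio`, and `similarity_matrix` are hypothetical names introduced for illustration, not part of SciPy or Levenshtein. `indel_ratio` assumes the score is 2 * LCS / (len(a) + len(b)), which appears to match the values in the table above; it is a stand-in, not the library's implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a, b):
    """Assumed similarity score: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 2 * lcs_length(a, b) / total


def similarity_matrix(rows, cols, metric=indel_ratio):
    """One row per string in `rows`, one column per string in `cols`."""
    return [[metric(r, c) for c in cols] for r in rows]


arr1 = ['faucet', 'faucets', 'bath', 'parts', 'bathroom']
arr2 = ['faucett', 'faucetd', 'bth', 'kichen']

matrix = similarity_matrix(arr2, arr1)
for label, row in zip(arr2, matrix):
    print(label, [round(v, 6) for v in row])
```

The resulting list of lists can be passed to pd.DataFrame with index=arr2 and columns=arr1, as in the answer. Any two-argument function returning a number, such as Levenshtein's ratio, can be supplied as `metric` instead.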