Python, Nested Loops, Matching And Performance

I am trying to match a list of lastnames to a list of full names using Python 2.7 and the Levenshtein function. To reduce workload I only match if the first letters are identical (

Solution 1:

This simplifies the for loop in the match_strings function, but it did not increase the speed noticeably in my tests. Most of the time is still spent in the two nested for loops over lastnames and fullnames.

import Levenshtein  # third-party python-Levenshtein package

def match_strings(lastname, listofnames):
    # Keep each candidate's index in listofnames so the removal below hits the right element.
    firstCaseMatched = [(index, name) for index, name in enumerate(listofnames) if lastname[0] == name[0]]
    matchedidx = [index for index, name in firstCaseMatched if Levenshtein.distance(lastname, name) < 2]
    if len(matchedidx) == 1:
        # Exactly one close match: report it and drop it from the list.
        newnamelist = [i for j, i in enumerate(listofnames) if j not in matchedidx]
        return 1, newnamelist
    return 0, listofnames

You might have to sort the list of known last names and split it into a dict keyed by starting character, then match each name in the list of names only against the entries for its own first letter.
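A minimal sketch of that bucketing idea, under some assumptions: the helper names bucket_by_initial and find_lastnames are hypothetical, and edit_distance is a plain-Python stand-in for Levenshtein.distance so the example runs without the third-party package. In real use you would probably call Levenshtein.distance instead, since it is much faster.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance (stand-in helper)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bucket_by_initial(lastnames):
    """Group last names into a dict keyed by their first character."""
    buckets = defaultdict(list)
    for lastname in sorted(lastnames):
        if lastname:  # skip empty strings, they have no first character
            buckets[lastname[0]].append(lastname)
    return buckets


def find_lastnames(name, buckets, max_distance=1):
    """Return the known last names within max_distance edits of name."""
    # Only the bucket for the name's first letter is scanned.
    candidates = buckets.get(name[:1], [])
    return [lastname for lastname in candidates
            if edit_distance(lastname, name) <= max_distance]


buckets = bucket_by_initial(["Smith", "Smyth", "Jones", "Johnson"])
print(find_lastnames("Smith", buckets))  # ['Smith', 'Smyth']
print(find_lastnames("Jonas", buckets))  # ['Jones']
```

The dict is built once, so each name is compared only against last names sharing its first letter rather than against the whole list.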

Assuming the fullnames list always has the first name as its first element, you could limit the comparison to the other elements.
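One way this could look, assuming each full name is a list of name parts with the first name at index 0. The function match_lastname and its parameters are hypothetical. The distance function is passed in, so Levenshtein.distance can be supplied in real use. The exact_only function below is only a stand-in to keep the sketch runnable on its own.

```python
def match_lastname(lastname, name_parts, distance, max_distance=1):
    """Return indices of name_parts, excluding the first name, close to lastname."""
    matches = []
    # Start at 1 so the first name at index 0 is never compared.
    for index in range(1, len(name_parts)):
        part = name_parts[index]
        if part[:1] == lastname[:1] and distance(lastname, part) <= max_distance:
            matches.append(index)
    return matches


def exact_only(a, b):
    # Stand-in distance: 0 for identical strings, otherwise 2 (too far to match).
    return 0 if a == b else 2


# The first "Thomas" is a first name and is skipped; only index 2 is reported.
print(match_lastname("Thomas", ["Thomas", "Alan", "Thomas"], exact_only))  # [2]
```

Skipping index 0 saves one distance call per full name and avoids false matches when a first name happens to resemble a last name.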
