
Python 3, Differences Between Two Strings

I'd like to record the location of differences from both strings in a list (to remove them) ... preferably recording the highest separation point for each section, as these areas w

Solution 1:

Using difflib is probably your best bet, since you are unlikely to come up with something more efficient than the algorithms it provides. The method you want is SequenceMatcher.get_matching_blocks. According to the documentation, this is what it returns:

Return list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j.

Here is one way to use it to rebuild the string with the differing parts removed.

from difflib import SequenceMatcher

x = "abc_def"
y = "abc--ef"

matcher = SequenceMatcher(None, x, y)
blocks = matcher.get_matching_blocks()

# blocks: [Match(a=0, b=0, size=3), Match(a=5, b=5, size=2), Match(a=7, b=7, size=0)]

string = ''.join([x[a:a+n] for a, _, n in blocks])

# string: "abcef"
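Since the question asks for the locations of the differences rather than only the cleaned string, here is a minimal sketch that collects them with SequenceMatcher.get_opcodes, which reports every region as a (tag, i1, i2, j1, j2) tuple. The helper name diff_spans is made up for illustration; it is not part of difflib.

```python
from difflib import SequenceMatcher


def diff_spans(a, b):
    """Hypothetical helper: list every region where a and b differ.

    Each entry is (tag, a_start, a_end, b_start, b_end), where tag is
    'replace', 'delete' or 'insert', and the indices slice a and b.
    """
    matcher = SequenceMatcher(None, a, b)
    # Drop the 'equal' opcodes; whatever remains is a difference.
    return [op for op in matcher.get_opcodes() if op[0] != 'equal']


x = "abc_def"
y = "abc--ef"

for tag, i1, i2, j1, j2 in diff_spans(x, y):
    print(tag, (i1, i2), repr(x[i1:i2]), (j1, j2), repr(y[j1:j2]))
# replace (3, 5) '_d' (3, 5) '--'
```

The index pairs can go straight into a list if you only need the positions.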

Edit: It was also pointed out that a pair of strings like the following needs extra care.

t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

With these as the inputs, the code above would return 'WordWordyWordWord'. This is because get_matching_blocks also picks up the 'y' that is present in both strings between the expected blocks. One way around this is to filter the returned blocks by length.

string = ''.join([x[a:a+n] for a, _, n in blocks if n > 1])
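To see the effect of the length filter on the two strings from the edit, here is a small runnable sketch. The function common_text and its min_size parameter are names chosen for this example only.

```python
from difflib import SequenceMatcher


def common_text(a, b, min_size=1):
    """Hypothetical helper: join the matching blocks of at least min_size chars."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    # The final sentinel block has size 0, so it never contributes text.
    return ''.join(a[i:i + n] for i, _, n in blocks if n >= min_size)


t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

print(common_text(t1, t2))              # WordWordyWordWord
print(common_text(t1, t2, min_size=2))  # WordWordWordWord
```

The right threshold depends on your data: a minimum of 2 only discards single-character coincidences, so longer accidental matches would still get through.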

If you want a more complex analysis of the returned blocks, you could also pass each one through a filter function of your own.

def block_filter(substring):
    """Outputs True if the substring is to be merged, False otherwise"""
    ...


string = ''.join([x[a:a+n] for a, _, n in blocks if block_filter(x[a:a+n])])
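What goes in block_filter is up to you. As one purely illustrative possibility (this rule is an assumption, not something from the question), the sketch below keeps only blocks that are at least three characters long and entirely alphabetic. The wrapper merge_blocks is also a made-up name.

```python
from difflib import SequenceMatcher


def block_filter(substring):
    """Example rule only: keep alphabetic blocks of 3 or more characters."""
    return len(substring) >= 3 and substring.isalpha()


def merge_blocks(a, b, keep=block_filter):
    """Hypothetical wrapper: join the matching blocks accepted by keep()."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    return ''.join(a[i:i + n] for i, _, n in blocks if keep(a[i:i + n]))


print(merge_blocks("abc_def", "abc--ef"))  # abc
print(merge_blocks("WordWordaayaaWordWord", "WordWordbbbybWordWord"))  # WordWordWordWord
```

Because the predicate receives the substring itself, it can look at the content as well as the length, which the plain n > 1 test cannot do.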
