Comparing List Of Unique Objects With Custom Function

March 09, 2024

I need to compare hundreds of objects stored in a unique list to find duplicates: object_list = {Object_01, Object_02, Object_03, Object_04, Object_05, ...} I've written a custom

Solution 1:

You don't need to calculate all combinations; you just need to check whether a given item duplicates one you have already seen:

for i, a in enumerate(x):
    if any(a.compare(b) for b in x[:i]):
        # a is a duplicate of an already seen item, so do something
        pass

This is still technically O(n^2), but it cuts out at least half of the checks, so it should be a bit faster.

In short, x[:i] returns all items in the list before index i. If the item x[i] appears in that list, you know it's a duplicate. If not, there may be a duplicate after it in the list, but you worry about that when you get there.

Using any is also important here: if it finds any true item, it will immediately stop, without checking the rest of the iterable.
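As a rough illustration of the short-circuiting, the sketch below counts how many comparisons the prefix scan makes. The Item class and its compare method are hypothetical stand-ins for the question's objects, which are not shown; here two items count as duplicates when their value attributes match.

```python
class Item:
    """Hypothetical stand-in for the question's objects."""
    calls = 0  # counts how many times compare() runs

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def compare(self, other):
        Item.calls += 1
        return self.value == other.value


def find_duplicates(items):
    """Return the items that match an earlier item in the list."""
    duplicates = []
    for i, a in enumerate(items):
        # any() stops at the first earlier item that matches a
        if any(a.compare(b) for b in items[:i]):
            duplicates.append(a)
    return duplicates


x = [Item("a", 1), Item("b", 2), Item("c", 1), Item("d", 2), Item("e", 3)]
dups = find_duplicates(x)
print([d.name for d in dups])  # ['c', 'd']
print(Item.calls)              # 8, versus 10 for every unordered pair
```

The saving over checking every pair comes from any() returning as soon as "c" matches "a" and "d" matches "b".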

You could also reduce the number of checks by removing known duplicates from the list you're checking against:

x_copy = x[:]
removed = 0
for i, a in enumerate(x):
    if any(a.compare(b) for b in x_copy[:i-removed]):
        # a is a duplicate of an already seen item, so do something
        del x_copy[i-removed]
        removed += 1

Note that we use a copy to avoid changing the sequence we're iterating over, and that we need to account for the number of items we've removed when using indexes.
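An equivalent formulation, offered as a sketch rather than as the answer's own method, keeps a separate list of unique items instead of deleting from a copy, which avoids the index arithmetic. The split_unique helper and its same parameter are assumed names, and plain integers compared by parity stand in for the real objects.

```python
def split_unique(items, same):
    """Split items into (unique, duplicates) using the pairwise test same(a, b)."""
    unique = []
    duplicates = []
    for a in items:
        # Only compare against items already known to be unique
        if any(same(a, b) for b in unique):
            duplicates.append(a)
        else:
            unique.append(a)
    return unique, duplicates


# Two integers count as "duplicates" here when they have the same parity
unique, duplicates = split_unique([1, 2, 3, 4, 5], lambda a, b: a % 2 == b % 2)
print(unique)      # [1, 2]
print(duplicates)  # [3, 4, 5]
```

With the question's objects, the call would presumably look like split_unique(x, lambda a, b: a.compare(b)).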

Next, we just need to figure out how to build the dictionary.

This might be a little more complex. The first step is to figure out exactly which element an item is a duplicate of. This can be done by realising that any is just a wrapper around a for loop:

def any(iterable):
    for item in iterable:
        if item:
            return True
    return False

We can then make a minor change, and pass in a function:

def first(iterable, fn):
    for item in iterable:
        if fn(item):
            return item
    return None

Now, we change our duplicate finder as follows:

import collections

d = collections.defaultdict(list)

x_copy = x[:]
removed = 0
for i, a in enumerate(x):
    b = first(x_copy[:i-removed], a.compare)
    if b is not None:
        # b is the first occurring duplicate of a
        del x_copy[i-removed]
        removed += 1

        d[b.name].append(a)
    else:
        # we've not seen a yet, but might see it later
        d[a.name].append(a)

This will put every element in the list into a dict(-like). If you only want the duplicates, it's then just a case of getting all the entries with a length greater than 1.
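The final filtering step might look like the sketch below. The dictionary contents are made up for illustration, with strings standing in for the actual objects.

```python
from collections import defaultdict

# Hypothetical result of the grouping step:
# name of the first occurrence -> every item that matched it
d = defaultdict(list)
d["Object_01"].extend(["Object_01", "Object_03"])
d["Object_02"].append("Object_02")

# Keep only the groups that contain more than one item
duplicates = {name: group for name, group in d.items() if len(group) > 1}
print(duplicates)  # {'Object_01': ['Object_01', 'Object_03']}
```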

Solution 2:

To find duplicates by attribute, group the objects by name (or by whichever attributes define a duplicate):

class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j


object_list = {Foo(1,2),Foo(3,4),Foo(1,2),Foo(3,4),Foo(5,6)}

from collections import defaultdict

d = defaultdict(list)

for obj in object_list:
    d[(obj.i,obj.j)].append(obj)

print(d)

defaultdict(<type 'list'>, {(1, 2): [<__main__.Foo instance at 0x7fa44ee7d098>, <__main__.Foo instance at 0x7fa44ee7d128>],
(5, 6): [<__main__.Foo instance at 0x7fa44ee7d1b8>],
(3, 4): [<__main__.Foo instance at 0x7fa44ee7d0e0>, <__main__.Foo instance at 0x7fa44ee7d170>]})

If the name is not what identifies a duplicate, use a tuple of all the attributes you compare on as the key.
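One way to build such a tuple key, shown here only as a sketch, is operator.attrgetter, which returns a tuple when given several attribute names. The Foo class mirrors the one above; the summary printed at the end is just for illustration.

```python
from collections import defaultdict
from operator import attrgetter


class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j


# attrgetter with several names returns a tuple of those attributes
key = attrgetter("i", "j")

d = defaultdict(list)
for obj in [Foo(1, 2), Foo(3, 4), Foo(1, 2)]:
    d[key(obj)].append(obj)

# Show how many objects landed under each key
counts = {k: len(v) for k, v in d.items()}
print(counts)  # {(1, 2): 2, (3, 4): 1}
```

Adding or removing an attribute from the comparison then only means changing the arguments to attrgetter.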

Or sort the list by the attributes that matter and use groupby to group:

class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j

object_list = {Foo(1,2),Foo(3,4),Foo(1,2),Foo(3,4),Foo(5,6)}

from itertools import groupby
from operator import attrgetter
groups = [list(v) for k,v in groupby(sorted(object_list, key=attrgetter("i","j")),key=attrgetter("i","j"))]

print(groups)

[[<__main__.Foo instance at 0x7f794a944d40>, <__main__.Foo instance at 0x7f794a944dd0>], [<__main__.Foo instance at 0x7f794a944d88>, <__main__.Foo instance at 0x7f794a944e18>], [<__main__.Foo instance at 0x7f794a944e60>]]
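The sort matters because groupby only merges adjacent items with equal keys. The sketch below uses plain tuples in place of Foo objects to show the difference, and then keeps only the groups with more than one member.

```python
from itertools import groupby

pairs = [(1, 2), (3, 4), (1, 2), (3, 4), (5, 6)]

# Without sorting, equal items are not adjacent, so each forms its own group
unsorted_groups = [list(g) for _, g in groupby(pairs)]
print(len(unsorted_groups))  # 5

# After sorting, equal items sit next to each other and are grouped together
sorted_groups = [list(g) for _, g in groupby(sorted(pairs))]
duplicates = [g for g in sorted_groups if len(g) > 1]
print(duplicates)  # [[(1, 2), (1, 2)], [(3, 4), (3, 4)]]
```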

You could also implement __lt__, __eq__ and __hash__ to make your objects sortable and hashable:

class Foo(object):
    def __init__(self, i, j):
        self.i = i
        self.j = j

    def __lt__(self, other):
        return (self.i, self.j) < (other.i, other.j)

    def __hash__(self):
        return hash((self.i, self.j))

    def __eq__(self, other):
        return (self.i, self.j) == (other.i, other.j)


# a real list, so that it can be sorted in place
object_list = [Foo(1, 2), Foo(3, 4), Foo(1, 2), Foo(3, 4), Foo(5, 6)]

print(set(object_list))

object_list.sort()
print([(x.i, x.j) for x in object_list])

set([<__main__.Foo object at 0x7fdff2fc08d0>, <__main__.Foo object at 0x7fdff2fc09d0>, <__main__.Foo object at 0x7fdff2fc0810>])
[(1, 2), (1, 2), (3, 4), (3, 4), (5, 6)]

Obviously the attributes need to be hashable; if you had lists, you could convert them to tuples, for example.
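A minimal sketch of that conversion, assuming a hypothetical Bar class whose only attribute is a list:

```python
class Bar:
    """Hypothetical class whose attribute is a list, which is not hashable."""

    def __init__(self, values):
        self.values = values

    def _key(self):
        # Convert the list to a tuple so it can be hashed and compared
        return tuple(self.values)

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        return self._key() == other._key()


bars = [Bar([1, 2]), Bar([3, 4]), Bar([1, 2])]
print(len(set(bars)))  # 2
```

This only behaves well if values is not mutated after the object has been added to a set or used as a dict key, since the hash would then change.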
