Skip to content Skip to sidebar Skip to footer

Fast Unique Combinations (from List With Duplicates) Without Lookups

I seems that in spite of the fact that there are online plenty of algorithms and functions for generating unique combinations of any size from a list of unique items, there is none

Solution 1:

Instead of post-processing/filtering your output, you can pre-process your input list. This way, you can avoid generating duplicates in the first place. Pre-processing involves either sorting (or using a collections.Counter on) the input. One possible recursive realization is:

defsubbags(bag, k):
    a = sorted(bag)
    n = len(a)
    sub = []

    defindex_of_next_unique_item(i):
        j = i + 1while j < n and a[j] == a[i]:
            j += 1return j

    defcombinate(i):
        iflen(sub) == k:
            yieldtuple(sub)
        elif n - i >= k - len(sub):
            sub.append(a[i])
            yieldfrom combinate(i + 1)
            sub.pop()
            yieldfrom combinate(index_of_next_unique_item(i))

    yieldfrom combinate(0)

bag = [1, 2, 3, 1, 2, 1]
k = 3
i = -1print(sorted(bag), k)
print('---')

for i, subbag inenumerate(subbags(bag, k)):
    print(subbag)

print('---')
print(i + 1)

Output:

[1, 1, 1, 2, 2, 3] 3---(1,1,1)(1,1,2)(1,1,3)(1,2,2)(1,2,3)(2,2,3)---6

Requires some stack space for the recursion, but this + sorting the input should use substantially less time + memory than generating and discarding repeats.

Solution 2:

The current state-of-the-art inspired initially by a 50 than by a 100 reps bounties is at the moment (instead of a Python extension module written entirely in C):

An efficient algorithm and implementation that is better than the obvious (set + combinations) approach in the best (and average) case, and is competitive with it in the worst case.

It seems to be possible to fulfill this requirement using a kind of "fake it before you make it" approach. The current state-of-the-art is that there are two generator function algorithms available for solving the problem of getting unique combinations in case of a non-unique list. The below provided algorithm combines both of them what becomes possible because it seems to exist a threshold value for percentage of unique items in the list which can be used for appropriate switching between the two algorithms. The calculation of the percentage of uniqueness is done with so tiny amount of computation time that it even doesn't clearly show up in the final results due to common variation of the taken timing.

defiterFastUniqueCombos(lstList, comboSize, percUniqueThresh=60):

    lstListSorted = sorted(lstList)
    lenListSorted = len(lstListSorted)

    percUnique = 100.0 - 100.0*(lenListSorted-len(set(lstListSorted)))/lenListSorted

    lstComboCandidate = []
    setUniqueCombos = set()

    defidxNextUnique(idxItemOfList):
        idxNextUniqueCandidate = idxItemOfList + 1while (
                idxNextUniqueCandidate < lenListSorted 
                    and 
                lstListSorted[idxNextUniqueCandidate] == lstListSorted[idxItemOfList]
        ): # while
            idxNextUniqueCandidate += 1
        idxNextUnique = idxNextUniqueCandidate
        return idxNextUnique

    defcombinate(idxItemOfList):
        iflen(lstComboCandidate) == sizeOfCombo:
            yieldtuple(lstComboCandidate)
        elif lenListSorted - idxItemOfList >= sizeOfCombo - len(lstComboCandidate):
            lstComboCandidate.append(lstListSorted[idxItemOfList])
            yieldfrom combinate(idxItemOfList + 1)
            lstComboCandidate.pop()
            yieldfrom combinate(idxNextUnique(idxItemOfList))

    if percUnique > percUniqueThresh:
        from itertools import combinations
        allCombos = combinations(lstListSorted, comboSize)
        for comboCandidate in allCombos:
            if comboCandidate in setUniqueCombos:
                continueyield comboCandidate
            setUniqueCombos.add(comboCandidate)
    else:
        yieldfrom combinate(0)
    #:if/else    #:def iterFastUniqueCombos()

The below provided timings show that the above iterFastUniqueCombos() generator function provides a clear advantage over uniqueCombinations() variant in case the list has less than 60 percent of unique elements and is not worse as the on (set + combinations) based uniqueCombinations() generator function in the opposite case where it gets much faster than the iterUniqueCombos() one (due to switching between the (set + combinations) and the (no lookups) variant at 60% threshold for amount of unique elements in the list):

===========sizeOfCombo: 6   sizeOfList:48noOfUniqueInList1percUnique2Combos:12271512print(len(list(combinations(lst,k)))):2.04968seconds.Combos:1print(len(list(iterUniqueCombos(lst,k)))):0.00011seconds.Combos:1print(len(list(iterFastUniqueCombos(lst,k)))):0.00008seconds.Combos:1print(len(list(uniqueCombinations(lst,k)))):3.61812seconds.==========sizeOfCombo: 6   sizeOfList:48noOfUniqueInList48percUnique100Combos:12271512print(len(list(combinations(lst,k)))):1.99383seconds.Combos:12271512print(len(list(iterUniqueCombos(lst,k)))):49.72461seconds.Combos:12271512print(len(list(iterFastUniqueCombos(lst,k)))):8.07997seconds.Combos:12271512print(len(list(uniqueCombinations(lst,k)))):8.11974seconds.==========sizeOfCombo: 6   sizeOfList:48noOfUniqueInList27percUnique56Combos:12271512print(len(list(combinations(lst,k)))):2.02774seconds.Combos:534704print(len(list(iterUniqueCombos(lst,k)))):1.60052seconds.Combos:534704print(len(list(iterFastUniqueCombos(lst,k)))):1.62002seconds.Combos:534704print(len(list(uniqueCombinations(lst,k)))):3.41156seconds.==========sizeOfCombo: 6   sizeOfList:48noOfUniqueInList31percUnique64Combos:12271512print(len(list(combinations(lst,k)))):2.03539seconds.Combos:1114062print(len(list(iterUniqueCombos(lst,k)))):3.49330seconds.Combos:1114062print(len(list(iterFastUniqueCombos(lst,k)))):3.64474seconds.Combos:1114062print(len(list(uniqueCombinations(lst,k)))):3.61857seconds.

Post a Comment for "Fast Unique Combinations (from List With Duplicates) Without Lookups"