Fast Unique Combinations (from List With Duplicates) Without Lookups
Solution 1:
Instead of post-processing/filtering your output, you can pre-process your input list. This way, you can avoid generating duplicates in the first place. Pre-processing involves either sorting (or using a collections.Counter
on) the input. One possible recursive realization is:
defsubbags(bag, k):
a = sorted(bag)
n = len(a)
sub = []
defindex_of_next_unique_item(i):
j = i + 1while j < n and a[j] == a[i]:
j += 1return j
defcombinate(i):
iflen(sub) == k:
yieldtuple(sub)
elif n - i >= k - len(sub):
sub.append(a[i])
yieldfrom combinate(i + 1)
sub.pop()
yieldfrom combinate(index_of_next_unique_item(i))
yieldfrom combinate(0)
bag = [1, 2, 3, 1, 2, 1]
k = 3
i = -1print(sorted(bag), k)
print('---')
for i, subbag inenumerate(subbags(bag, k)):
print(subbag)
print('---')
print(i + 1)
Output:
[1, 1, 1, 2, 2, 3] 3---(1,1,1)(1,1,2)(1,1,3)(1,2,2)(1,2,3)(2,2,3)---6
Requires some stack space for the recursion, but this + sorting the input should use substantially less time + memory than generating and discarding repeats.
Solution 2:
The current state-of-the-art inspired initially by a 50 than by a 100 reps bounties is at the moment (instead of a Python extension module written entirely in C):
An efficient algorithm and implementation that is better than the obvious
(set + combinations)
approach in the best (and average) case, and is competitive with it in the worst case.
It seems to be possible to fulfill this requirement using a kind of "fake it before you make it" approach. The current state-of-the-art is that there are two generator function algorithms available for solving the problem of getting unique combinations in case of a non-unique list. The below provided algorithm combines both of them what becomes possible because it seems to exist a threshold value for percentage of unique items in the list which can be used for appropriate switching between the two algorithms. The calculation of the percentage of uniqueness is done with so tiny amount of computation time that it even doesn't clearly show up in the final results due to common variation of the taken timing.
defiterFastUniqueCombos(lstList, comboSize, percUniqueThresh=60):
lstListSorted = sorted(lstList)
lenListSorted = len(lstListSorted)
percUnique = 100.0 - 100.0*(lenListSorted-len(set(lstListSorted)))/lenListSorted
lstComboCandidate = []
setUniqueCombos = set()
defidxNextUnique(idxItemOfList):
idxNextUniqueCandidate = idxItemOfList + 1while (
idxNextUniqueCandidate < lenListSorted
and
lstListSorted[idxNextUniqueCandidate] == lstListSorted[idxItemOfList]
): # while
idxNextUniqueCandidate += 1
idxNextUnique = idxNextUniqueCandidate
return idxNextUnique
defcombinate(idxItemOfList):
iflen(lstComboCandidate) == sizeOfCombo:
yieldtuple(lstComboCandidate)
elif lenListSorted - idxItemOfList >= sizeOfCombo - len(lstComboCandidate):
lstComboCandidate.append(lstListSorted[idxItemOfList])
yieldfrom combinate(idxItemOfList + 1)
lstComboCandidate.pop()
yieldfrom combinate(idxNextUnique(idxItemOfList))
if percUnique > percUniqueThresh:
from itertools import combinations
allCombos = combinations(lstListSorted, comboSize)
for comboCandidate in allCombos:
if comboCandidate in setUniqueCombos:
continueyield comboCandidate
setUniqueCombos.add(comboCandidate)
else:
yieldfrom combinate(0)
#:if/else #:def iterFastUniqueCombos()
The below provided timings show that the above iterFastUniqueCombos()
generator function provides a clear advantage
over uniqueCombinations()
variant in case the list has less than 60 percent of unique elements and is not worse as
the on (set + combinations)
based uniqueCombinations()
generator function in the opposite case where it gets much faster than the iterUniqueCombos()
one (due to switching between
the (set + combinations)
and the (no lookups)
variant at 60% threshold for amount of unique elements in the list):
===========sizeOfCombo: 6 sizeOfList:48noOfUniqueInList1percUnique2Combos:12271512print(len(list(combinations(lst,k)))):2.04968seconds.Combos:1print(len(list(iterUniqueCombos(lst,k)))):0.00011seconds.Combos:1print(len(list(iterFastUniqueCombos(lst,k)))):0.00008seconds.Combos:1print(len(list(uniqueCombinations(lst,k)))):3.61812seconds.==========sizeOfCombo: 6 sizeOfList:48noOfUniqueInList48percUnique100Combos:12271512print(len(list(combinations(lst,k)))):1.99383seconds.Combos:12271512print(len(list(iterUniqueCombos(lst,k)))):49.72461seconds.Combos:12271512print(len(list(iterFastUniqueCombos(lst,k)))):8.07997seconds.Combos:12271512print(len(list(uniqueCombinations(lst,k)))):8.11974seconds.==========sizeOfCombo: 6 sizeOfList:48noOfUniqueInList27percUnique56Combos:12271512print(len(list(combinations(lst,k)))):2.02774seconds.Combos:534704print(len(list(iterUniqueCombos(lst,k)))):1.60052seconds.Combos:534704print(len(list(iterFastUniqueCombos(lst,k)))):1.62002seconds.Combos:534704print(len(list(uniqueCombinations(lst,k)))):3.41156seconds.==========sizeOfCombo: 6 sizeOfList:48noOfUniqueInList31percUnique64Combos:12271512print(len(list(combinations(lst,k)))):2.03539seconds.Combos:1114062print(len(list(iterUniqueCombos(lst,k)))):3.49330seconds.Combos:1114062print(len(list(iterFastUniqueCombos(lst,k)))):3.64474seconds.Combos:1114062print(len(list(uniqueCombinations(lst,k)))):3.61857seconds.
Post a Comment for "Fast Unique Combinations (from List With Duplicates) Without Lookups"