Skip to content Skip to sidebar Skip to footer

Group Python Lists Based On Repeated Items

This question is very similar to this one Group Python list of lists into groups based on overlapping items, in fact it could be called a duplicate. Basically, I have a list of sub

Solution 1:

You can describe the merge you want to do as a set consolidation or as a connected-components problem. I tend to use an off-the-shelf set consolidation algorithm and then adapt it to the particular situation. For example, IIUC, you could use something like

def consolidate(sets):
    # http://rosettacode.org/wiki/Set_consolidation#Python:_Iterative
    setlist = [s for s in sets if s]
    for i, s1 in enumerate(setlist):
        if s1:
            for s2 in setlist[i+1:]:
                intersection = s1.intersection(s2)
                if intersection:
                    s2.update(s1)
                    s1.clear()
                    s1 = s2
    return [s for s in setlist if s]

def wrapper(seqs):
    consolidated = consolidate(map(set, seqs))
    groupmap = {x: i for i,seq in enumerate(consolidated) for x in seq}
    output = {}
    for seq in seqs:
        target = output.setdefault(groupmap[seq[0]], [])
        target.append(seq)
    return list(output.values())

which gives

>>> for i, group in enumerate(wrapper(gr)):
...     print('g{}:'.format(i), group)
...     
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81], [86, 68, 67, 84]]

(Order not guaranteed because of the use of the dictionaries.)


Solution 2:

Sounds like set consolidation if you turn each sub list into a set instead as you are interested in the contents not the order so sets are the best data-structure choice. See this: http://rosettacode.org/wiki/Set_consolidation


Post a Comment for "Group Python Lists Based On Repeated Items"