Group And Combine Items Of Multiple-column Lists With Itertools/more-itertools In Python

March 29, 2023

This code:

from itertools import groupby, count

L = [38, 98, 110, 111, 112, 120, 121, 898]
groups = groupby(L, key=lambda item, c=count(): item - next(c))
tmp = [list(g) for k, g in

Solution 1:

You can build on the same recipe by modifying the lambda so that the key also includes the first item (the country) of each row. You also need to sort the list first, ordering the countries by the position of their last occurrence in the list.

from itertools import groupby, count


L = [
    ['Italy', '1', '3'],
    ['Italy', '2', '1'],
    ['Spain', '4', '2'],
    ['Spain', '5', '8'],
    ['Italy', '3', '10'],
    ['Spain', '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2']]


indices = {row[0]: i for i, row in enumerate(L)}
sorted_l = sorted(L, key=lambda row: indices[row[0]])
groups = groupby(
    sorted_l,
    lambda item, c=count(): [item[0], int(item[1]) - next(c)]
)
for k, g in groups:
    print([k[0]] + ['-'.join(x) for x in zip(*(x[1:] for x in g))])

Output:

['Italy', '1-2-3', '3-1-10']
['France', '5', '3']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
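The sort step above relies on two details: the dict comprehension keeps only the last index seen for each country, and sorted() is stable, so rows of the same country keep their original relative order. A small sketch that makes both visible (the names rows, last_seen and ordered are illustrative, not from the answer):

```python
rows = [
    ['Italy', '1', '3'],
    ['Italy', '2', '1'],
    ['Spain', '4', '2'],
    ['Spain', '5', '8'],
    ['Italy', '3', '10'],
    ['Spain', '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2'],
]

# Later rows overwrite earlier ones, so each country maps to its LAST index.
last_seen = {row[0]: i for i, row in enumerate(rows)}
print(last_seen)  # {'Italy': 4, 'Spain': 7, 'France': 6}

# sorted() is stable: rows with equal keys stay in input order.
ordered = sorted(rows, key=lambda row: last_seen[row[0]])
print([row[0] for row in ordered])
# ['Italy', 'Italy', 'Italy', 'France', 'Spain', 'Spain', 'Spain', 'Spain']
```

This is why France is printed between Italy and Spain in the output above.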

Solution 2:

This is essentially the same grouping technique, but it uses enumerate rather than itertools.count to produce the indices.

First, we sort the data so that all rows for a given country are adjacent and, within each country, in ascending order. Note that the rows are compared as lists of strings, which is why '20' sorts before '4' in the output below. Then we use groupby to make a group for each country, and a second groupby in the inner loop to group together the consecutive rows within each country. Finally, we use zip and '-'.join to rearrange the data into the desired output format.

from itertools import groupby
from operator import itemgetter

lst = [
    ['Italy','1','3'],
    ['Italy','2','1'],
    ['Spain','4','2'],
    ['Spain','5','8'],
    ['Italy','3','10'],
    ['Spain','6','4'],
    ['France','5','3'],
    ['Spain','20','2'],
]

newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]

for row in newlst:
    print(row)

Output:

['France', '5', '3']
['Italy', '1-2-3', '3-1-10']
['Spain', '20', '2']
['Spain', '4-5-6', '2-8-4']
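Because sorted(lst) compares the number columns as strings, a run that crosses from one digit to two (9, 10, 11) would be split. The sketch below wraps the same nested groupby in a hypothetical helper, group_runs, with an optional sort key, and compares the plain string sort against a numeric key. The 'Peru' rows are made-up data for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def group_runs(rows, sort_key=None):
    """Group consecutive rows per country; sort_key controls the pre-sort."""
    out = []
    for country, u in groupby(sorted(rows, key=sort_key), itemgetter(0)):
        for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0]):
            cols = zip(*[v[1][1:] for v in g])
            out.append([country] + ['-'.join(s) for s in cols])
    return out


rows = [['Peru', '9', 'a'], ['Peru', '10', 'b'], ['Peru', '11', 'c']]

# Plain string sort: '10' < '11' < '9', so the run is broken in two.
naive = group_runs(rows)
print(naive)    # [['Peru', '10-11', 'b-c'], ['Peru', '9', 'a']]

# Sorting on (country, int(second column)) keeps the run intact.
numeric = group_runs(rows, sort_key=lambda r: (r[0], int(r[1])))
print(numeric)  # [['Peru', '9-10-11', 'a-b-c']]
```

For the sample data in this answer both sorts find the same groups; only the order of the two Spain rows differs.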

I admit that lambda is a bit cryptic; it would probably be better to use a proper def function instead. Here's the same thing using a much more readable key function.

def keyfunc(t):
    # Unpack the index and data
    i, data = t
    # Get the 2nd column from the data, as an integer
    val = int(data[1])
    # The difference between val & i is constant in a consecutive group
    return val - i

newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), keyfunc)]
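The comment in keyfunc says that the difference between the value and its index is constant within a consecutive run. A minimal sketch on a flat list of integers makes that visible; the list is the one from the question's excerpt, and consecutive_runs is a hypothetical helper name.

```python
from itertools import groupby


def consecutive_runs(numbers):
    """Split an ascending list of ints into runs of consecutive values."""
    runs = []
    # pair is (index, value); value - index stays the same inside a run.
    for _, grp in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        runs.append([value for _, value in grp])
    return runs


values = [38, 98, 110, 111, 112, 120, 121, 898]

keys = [v - i for i, v in enumerate(values)]
print(keys)  # [38, 97, 108, 108, 108, 115, 115, 891]

runs = consecutive_runs(values)
print(runs)  # [[38], [98], [110, 111, 112], [120, 121], [898]]
```

Equal neighbouring keys (108, 108, 108 and 115, 115) mark the runs that groupby collects.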

Solution 3:

Instead of using itertools.groupby, which requires sorting the whole list first, here is an approach using dictionaries that processes the rows in a single pass:

d = {}
flag = False
for country, i, j in L:
    temp = 1
    try:
        item = int(i)
        for counter, recs in  d[country].items():
            temp += 1
            last = int(recs[-1][0])
            if item in {last - 1, last, last + 1}:
                recs.append([i, j])
                recs.sort(key=lambda x: int(x[0]))
                flag = True
                break
        if flag:
            flag = False
            continue
        else:
            d[country][temp] = [[i, j]]
    except KeyError:
        d[country] = {}
        d[country][1] = [[i, j]]

Demo on a more complex example:

L = [['Italy', '1', '3'],
 ['Italy', '2', '1'],
 ['Spain', '4', '2'],
 ['Spain', '5', '8'],
 ['Italy', '3', '10'],
 ['Spain', '6', '4'],
 ['France', '5', '3'],
 ['Spain', '20', '2'],
 ['France', '5', '44'],
 ['France', '9', '3'],
 ['Italy', '3', '10'],
 ['Italy', '5', '17'],
 ['Italy', '4', '13'],]

{'France': {1: [['5', '3'], ['5', '44']], 2: [['9', '3']]},
 'Spain': {1: [['4', '2'], ['5', '8'], ['6', '4']], 2: [['20', '2']]},
 'Italy': {1: [['1', '3'], ['2', '1'], ['3', '10'], ['3', '10'], ['4', '13']], 2: [['5', '17']]}}

# You can then produce the results in your intended format as below:
for country, recs in d.items():
    for rec in recs.values():
        i, j = zip(*rec)
        print([country, '-'.join(i), '-'.join(j)])

['France', '5-5', '3-44']
['France', '9', '3']
['Italy', '1-2-3-3-4', '3-1-10-10-13']
['Italy', '5', '17']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
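The same bookkeeping can be written without the flag and the try/except by using for/else and dict.setdefault. This is only a sketch of the idea above, not the answer's own code: group_adjacent is a hypothetical name, and each country maps to a list of runs rather than a numbered dict.

```python
def group_adjacent(rows):
    """Map each country to a list of runs; a run is a list of [i, j] pairs."""
    groups = {}
    for country, i, j in rows:
        runs = groups.setdefault(country, [])
        value = int(i)
        for run in runs:
            last = int(run[-1][0])
            # Same test as above: value is last - 1, last, or last + 1.
            if abs(value - last) <= 1:
                run.append([i, j])
                run.sort(key=lambda pair: int(pair[0]))
                break
        else:
            # No existing run accepted the row, so start a new one.
            runs.append([[i, j]])
    return groups


rows = [
    ['Italy', '1', '3'], ['Italy', '2', '1'], ['Spain', '4', '2'],
    ['Spain', '5', '8'], ['Italy', '3', '10'], ['Spain', '6', '4'],
    ['France', '5', '3'], ['Spain', '20', '2'], ['France', '5', '44'],
    ['France', '9', '3'], ['Italy', '3', '10'], ['Italy', '5', '17'],
    ['Italy', '4', '13'],
]

groups = group_adjacent(rows)
for country, runs in groups.items():
    for run in runs:
        i, j = zip(*run)
        print([country, '-'.join(i), '-'.join(j)])
```

On Python 3.7 or later the countries are printed in order of first appearance (Italy, Spain, France), which differs from the ordering shown above.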

Solution 4:

from collections import namedtuple
country = namedtuple('country','name score1 score2')
master_dict = {}
isolated_dict = {}
for val in L:
    data = country(*val)
    name = data.name
    if name in master_dict:
        local_data = master_dict[name]
        # Compare against the last number in the '-'-joined string, not its last character
        if int(local_data[1].split('-')[-1]) + 1 == int(data.score1):
            local_data[1] += '-' + data.score1
            local_data[2] += '-' + data.score2
        else:
            if name in isolated_dict:
                another_local_data = isolated_dict[name]
                another_local_data[1] += '-' + data.score1
                another_local_data[2] += '-' + data.score2
            else:
                isolated_dict[name] = [name,data.score1,data.score2]
    else:
        master_dict.setdefault(name, [name,data.score1,data.score2])
country_data = list(master_dict.values())+list(isolated_dict.values())
print(country_data)

[['Italy', '1-2-3', '3-1-10'],
 ['Spain', '4-5-6', '2-8-4'],
 ['France', '5', '3'],
 ['Spain', '20', '2']]

Solution 5:

Here is how one might use more_itertools, a third-party library of itertools-like recipes.

more_itertools.consecutive_groups can group consecutive items by some condition.

Given

import collections as ct

import more_itertools as mit


lst = [
    ['Italy',  '1', '3'],
    ['Italy',  '2', '1'],
    ['Spain',  '4', '2'],
    ['Spain',  '5', '8'],
    ['Italy',  '3', '10'],
    ['Spain',  '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2']
]

Code

Pre-process data into a dictionary for fast, flexible lookups:

dd = ct.defaultdict(list)
for row in lst:
    dd[row[0]].append(row[1:])
dd

Intermediate Output

defaultdict(list,
            {'France': [['5', '3']],
             'Italy': [['1', '3'], ['2', '1'], ['3', '10']],
             'Spain': [['4', '2'], ['5', '8'], ['6', '4'], ['20', '2']]})

Now build whatever output you wish:

result = []
for k, v in dd.items():
    cols = [[int(item) for item in col] for col in zip(*v)]
    grouped_rows = [list(c) for c in mit.consecutive_groups(zip(*cols), lambda x: x[0])]
    grouped_cols = [["-".join(map(str, c)) for c in zip(*grp)] for grp in grouped_rows]
    for grp in grouped_cols:
        result.append([k, *grp])        

result

Final Output

[['Italy', '1-2-3', '3-1-10'],
 ['Spain', '4-5-6', '2-8-4'],
 ['Spain', '20', '2'],
 ['France', '5', '3']]

Details

We build a lookup dict of (country, row(s)) key-value pairs.
Row values are converted to integer columns.
The integer columns are zipped back into rows, which are passed to more_itertools.consecutive_groups. It returns groups of rows according to the ordering function (here lambda x: x[0], the first column of the values in dd, which corresponds to the OP's "second column").
We rejoin rows as groups of stringed columns.
Each iterated item is appended to the resulting list.
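For readers who cannot install more_itertools, the steps listed above can be sketched with the standard library alone. Here consecutive_rows is a hypothetical stand-in for more_itertools.consecutive_groups built on groupby and enumerate; it is an assumption for illustration, not the library's code, and it skips the integer conversion of every column by converting only inside the ordering function.

```python
from collections import defaultdict
from itertools import groupby


def consecutive_rows(rows, ordering):
    """Yield lists of rows whose ordering(row) values increase by exactly 1."""
    key = lambda pair: pair[0] - ordering(pair[1])  # index minus value
    for _, grp in groupby(enumerate(rows), key=key):
        yield [row for _, row in grp]


lst = [
    ['Italy', '1', '3'], ['Italy', '2', '1'], ['Spain', '4', '2'],
    ['Spain', '5', '8'], ['Italy', '3', '10'], ['Spain', '6', '4'],
    ['France', '5', '3'], ['Spain', '20', '2'],
]

# Step 1: lookup dict of country -> remaining columns of each row.
dd = defaultdict(list)
for country, *values in lst:
    dd[country].append(values)

# Steps 2-5: group consecutive rows, then join each column with '-'.
result = []
for country, rows in dd.items():
    for grp in consecutive_rows(rows, ordering=lambda row: int(row[0])):
        result.append([country] + ['-'.join(col) for col in zip(*grp)])

print(result)
```

On Python 3.7 or later this should print the same four rows, in the same order, as the Final Output above.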

Note: the resulting order was not specified, but you can sort the output however you wish using sorted() and a key function. In CPython 3.6, dictionaries preserve insertion order as an implementation detail, and from Python 3.7 this is guaranteed by the language, so the order of the output is reproducible.
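As one possible ordering, the sketch below sorts the final rows by country name and then by the first number of each group. The key function sort_key is a hypothetical example; any other key would work the same way.

```python
result = [
    ['Italy', '1-2-3', '3-1-10'],
    ['Spain', '4-5-6', '2-8-4'],
    ['Spain', '20', '2'],
    ['France', '5', '3'],
]


def sort_key(row):
    # First number of the '-'-joined second column, compared as an integer.
    first = int(row[1].split('-')[0])
    return (row[0], first)


ordered = sorted(result, key=sort_key)
for row in ordered:
    print(row)
# ['France', '5', '3']
# ['Italy', '1-2-3', '3-1-10']
# ['Spain', '4-5-6', '2-8-4']
# ['Spain', '20', '2']
```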
