Group And Combine Items Of Multiple-column Lists With Itertools/more-itertools In Python
Solution 1:
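You can build on the same recipe and modify the lambda function so that the key also includes the first item (the country) of each row. You also need to sort the list first, ordering the rows by the position of each country's last occurrence in the list, so that all rows for a country end up adjacent. Because sorted() is stable, rows keep their original relative order within each country.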
from itertools import groupby, count
L = [
['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2']]
indices = {row[0]: i for i, row in enumerate(L)}
sorted_l = sorted(L, key=lambda row: indices[row[0]])
groups = groupby(
    sorted_l,
    # Key: the country plus (value - running index), which is constant within a consecutive run
    lambda item, c=count(): [item[0], int(item[1]) - next(c)]
)
for k, g in groups:
    print([k[0]] + ['-'.join(x) for x in zip(*(x[1:] for x in g))])
Output:
['Italy', '1-2-3', '3-1-10']
['France', '5', '3']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
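To see why the key works, it can help to print the keys on their own. The following is a minimal sketch, not part of the original answer: it assumes the rows are already in the order produced by the sort step, and the names sorted_rows, counter, and keys are made up for illustration.

```python
from itertools import count

# Rows in the order produced by the sort step (each country's rows are adjacent)
sorted_rows = [
    ['Italy', '1', '3'], ['Italy', '2', '1'], ['Italy', '3', '10'],
    ['France', '5', '3'],
    ['Spain', '4', '2'], ['Spain', '5', '8'], ['Spain', '6', '4'],
    ['Spain', '20', '2'],
]

counter = count()  # yields 0, 1, 2, ... one value per row

# Same key as the lambda above, written as a tuple for readability
keys = [(row[0], int(row[1]) - next(counter)) for row in sorted_rows]

for row, key in zip(sorted_rows, keys):
    print(row, '->', key)
```

Rows whose second column rises in step with the counter share a key, so groupby puts them in one group; the jump from 6 to 20 in Spain changes the difference and starts a new group.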
Solution 2:
This is essentially the same grouping technique, but rather than using itertools.count it uses enumerate to produce the indices.
First, we sort the data so that all rows for a given country are adjacent and ordered within each country. Then we use groupby to make a group for each country, and groupby again in the inner loop to group together the consecutive data for each country. Finally, we use zip and .join to rearrange the data into the desired output format.
from itertools import groupby
from operator import itemgetter
lst = [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2'],
]
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
for country, u in groupby(sorted(lst), itemgetter(0))
for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]
for row in newlst:
    print(row)
Output:
['France', '5', '3']
['Italy', '1-2-3', '3-1-10']
['Spain', '20', '2']
['Spain', '4-5-6', '2-8-4']
I admit that lambda is a bit cryptic; it'd probably be better to use a proper def function instead. Here's the same thing using a much more readable key function.
def keyfunc(t):
    # Unpack the index and data
    i, data = t
    # Get the 2nd column from the data, as an integer
    val = int(data[1])
    # The difference between val & i is constant in a consecutive group
    return val - i
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
for country, u in groupby(sorted(lst), itemgetter(0))
for _, g in groupby(enumerate(u), keyfunc)]
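One thing to watch: sorted(lst) compares the number columns as strings, which is why ['Spain', '20', '2'] comes out before the 4-5-6 group above, and why a value such as '10' would sort before '9'. If that matters for your data, a numeric sort key is one option. This is only a sketch under that assumption; group_rows and the sample data are hypothetical names, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

def group_rows(rows):
    """Group rows per country into runs of consecutive second-column values."""
    # Assumption: the second column always parses as an int
    ordered = sorted(rows, key=lambda r: (r[0], int(r[1])))
    out = []
    for country, u in groupby(ordered, itemgetter(0)):
        # enumerate restarts at 0 for each country; value - index marks the runs
        for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0]):
            cols = zip(*[row[1:] for _, row in g])
            out.append([country] + ['-'.join(c) for c in cols])
    return out

sample = [
    ['Spain', '10', 'a'],
    ['Spain', '9', 'b'],
    ['Spain', '11', 'c'],
    ['Italy', '2', 'x'],
]
grouped = group_rows(sample)
for row in grouped:
    print(row)
```

With the plain string sort, the Spain rows would be ordered '10', '11', '9' and split into two groups; with the numeric key they form the single run 9-10-11.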
Solution 3:
Instead of using itertools.groupby, which requires sorting the input first, here is an approach that builds the groups in a single pass over the list using dictionaries:
d = {}
flag = False
for country, i, j in L:
    temp = 1
    try:
        item = int(i)
        # Look for an existing group of this country that the row can join
        for counter, recs in d[country].items():
            temp += 1
            last = int(recs[-1][0])
            if item in {last - 1, last, last + 1}:
                recs.append([i, j])
                recs.sort(key=lambda x: int(x[0]))
                flag = True
                break
        if flag:
            flag = False
            continue
        else:
            # No group matched: start a new one under the next counter
            d[country][temp] = [[i, j]]
    except KeyError:
        # First row seen for this country
        d[country] = {}
        d[country][1] = [[i, j]]
Demo on a more complex example:
L = [['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2'],
['France', '5', '44'],
['France', '9', '3'],
['Italy', '3', '10'],
['Italy', '5', '17'],
['Italy', '4', '13'],]
{'France': {1: [['5', '3'], ['5', '44']], 2: [['9', '3']]},
'Spain': {1: [['4', '2'], ['5', '8'], ['6', '4']], 2: [['20', '2']]},
'Italy': {1: [['1', '3'], ['2', '1'], ['3', '10'], ['3', '10'], ['4', '13']], 2: [['5', '17']]}}
# You can then produce the results in your intended format as below:
for country, recs in d.items():
    for rec in recs.values():
        i, j = zip(*rec)
        print([country, '-'.join(i), '-'.join(j)])
['France', '5-5', '3-44']
['France', '9', '3']
['Italy', '1-2-3-3-4', '3-1-10-10-13']
['Italy', '5', '17']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
Solution 4:
from collections import namedtuple
country = namedtuple('country', 'name score1 score2')
master_dict = {}
isolated_dict = {}
for val in L:
    data = country(*val)
    name = data.name
    if name in master_dict:
        local_data = master_dict[name]
        # Compare against the last number in the run, not just its last character,
        # so that multi-digit values such as '10' are handled correctly
        if int(local_data[1].split('-')[-1]) + 1 == int(data.score1):
            local_data[1] += '-' + data.score1
            local_data[2] += '-' + data.score2
        else:
            # Non-consecutive rows for a country are all collected in one extra entry
            if name in isolated_dict:
                another_local_data = isolated_dict[name]
                another_local_data[1] += '-' + data.score1
                another_local_data[2] += '-' + data.score2
            else:
                isolated_dict[name] = [name, data.score1, data.score2]
    else:
        master_dict.setdefault(name, [name, data.score1, data.score2])
country_data = list(master_dict.values()) + list(isolated_dict.values())
print(country_data)
>>>[['Italy', '1-2-3', '3-1-10'],
['Spain', '4-5-6', '2-8-4'],
['France', '5', '3'],
['Spain', '20', '2']]
Solution 5:
Here is how one might use more_itertools, a third-party library of itertools-like recipes. more_itertools.consecutive_groups can group consecutive items by some condition.
Given
import collections as ct
import more_itertools as mit
lst = [
['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2']
]
Code
Pre-process data into a dictionary for fast, flexible lookups:
dd = ct.defaultdict(list)
for row in lst:
    dd[row[0]].append(row[1:])
dd
Intermediate Output
defaultdict(list,
{'France': [['5', '3']],
'Italy': [['1', '3'], ['2', '1'], ['3', '10']],
'Spain': [['4', '2'], ['5', '8'], ['6', '4'], ['20', '2']]})
Now build whatever output you wish:
result = []
for k, v in dd.items():
    cols = [[int(item) for item in col] for col in zip(*v)]
    grouped_rows = [list(c) for c in mit.consecutive_groups(zip(*cols), lambda x: x[0])]
    grouped_cols = [["-".join(map(str, c)) for c in zip(*grp)] for grp in grouped_rows]
    for grp in grouped_cols:
        result.append([k, *grp])
result
Final Output
[['Italy', '1-2-3', '3-1-10'],
['Spain', '4-5-6', '2-8-4'],
['Spain', '20', '2'],
['France', '5', '3']]
Details
- We build a lookup dict of (country, row(s)) key-value pairs.
- Row values are converted to integer columns.
- The integer columns are zipped back into rows, which are passed to more_itertools.consecutive_groups. In return we get groups of rows based on your condition (here, the first column of the dictionary values in dd, selected by lambda x: x[0]; this is equivalent to the OP's "second column").
- We rejoin rows as groups of stringed columns.
- Each iterated item is appended to the resulting list.
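If you would rather not install a third-party package, the grouping step can be approximated with the standard library. The helper below is a hypothetical stand-in written for this post, not the library's actual source; it assumes the items arrive already ordered and that ordering(item) returns an int.

```python
from itertools import groupby

def consecutive_groups_sketch(iterable, ordering=lambda x: x):
    """Yield lists of items whose ordering(item) values increase by exactly 1."""
    # index - ordering(item) stays constant while the values rise in step with the index
    for _, grp in groupby(enumerate(iterable), key=lambda t: t[0] - ordering(t[1])):
        yield [item for _, item in grp]

# Spain's rows from dd, already converted to integer tuples
rows = [(4, 2), (5, 8), (6, 4), (20, 2)]
groups = list(consecutive_groups_sketch(rows, ordering=lambda x: x[0]))
print(groups)
```

Here the first three rows form one group and (20, 2) forms its own, matching the two Spain entries in the final output.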
Note: the resulting order was not specified, but you can sort the output however you wish using sorted() and a key function. In CPython 3.6, dictionaries preserve insertion order (this became a language guarantee in Python 3.7), so the order of the results is reproducible.
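As one possible ordering, the sketch below sorts by country and then by the first number of each run. The key function is an assumption about what order you might want, and ordered is just an illustrative name.

```python
result = [
    ['Italy', '1-2-3', '3-1-10'],
    ['Spain', '4-5-6', '2-8-4'],
    ['Spain', '20', '2'],
    ['France', '5', '3'],
]

# Sort by country name, then numerically by the first value of the run
ordered = sorted(result, key=lambda row: (row[0], int(row[1].split('-')[0])))
for row in ordered:
    print(row)
```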