
Get Information Out Of Sub-lists In Main List Elegantly

Ok, so here's my issue. I have a list composed of N sub-lists composed of M elements (floats) each. So in a general form it looks like this: a_list = [b_list_1, b_list_2, ..., b_li

Solution 1:

You can make your code more concise and easier to read by using defaultdict to build a dictionary that maps the first two elements of each sublist to a list of all the third elements that share that pair:

from collections import defaultdict

# `a` is the input list of sublists from the question
nums = defaultdict(list)
for arr in a:
    key = tuple(arr[:2])      # make the first two floats the key
    nums[key].append(arr[2])  # append the third float for the given key

# average the third values collected under each key
a_processed = [[k[0], k[1], sum(vals) / len(vals)] for k, vals in nums.items()]

Using this, I get the same output as you (albeit in a different order):

[[0.2, 1.1, 0.8], [1.2, 0.3, 0.6], [0.3, 1.4, 0.2], [0.6, 0.4, 0.9], [1.1, 0.5, 0.6666666666666666], [0.6, 0.2, 0.75]]

If the order of a_processed is an issue, you can use an OrderedDict, as pointed out by @DSM.
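
For what it's worth, here is a minimal sketch of the OrderedDict variant, so that the groups come out in the order in which each pair first appears. The sample list is the one shown in the pandas answer in this thread; the `setdefault` idiom is my own substitution for defaultdict, not something taken from the original comment. On Python 3.7+ a plain dict (and therefore defaultdict) already preserves insertion order, so this mostly matters on older interpreters.

```python
from collections import OrderedDict

# Sample input copied from the pandas answer in this thread
a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0],
     [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0],
     [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]

# Keys are remembered in the order they are first inserted
nums = OrderedDict()
for x, y, z in a:
    # setdefault creates the empty list the first time a key is seen
    nums.setdefault((x, y), []).append(z)

# One [x, y, mean] row per distinct (x, y) pair, in first-seen order
a_processed = [[x, y, sum(vals) / len(vals)] for (x, y), vals in nums.items()]

for row in a_processed:
    print(row)
```

The first row printed is the [1.1, 0.5, ...] group, because that pair appears first in the input.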


Solution 2:

For comparison, here's the pandas approach. If this is really a data processing problem behind the scenes, then you can save yourself a lot of time that way.

>>> import pandas as pd
>>> a
[[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0], [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]
>>> df = pd.DataFrame(a)
>>> df.groupby([0,1]).mean()
                2
0   1            
0.2 1.1  0.800000
0.3 1.4  0.200000
0.6 0.2  0.750000
    0.4  0.900000
1.1 0.5  0.666667
1.2 0.3  0.600000

This problem is common enough that it's a one-liner. You can use named columns, compute a host of other useful statistics, handle missing data, etc.
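
As a rough illustration of the named-columns and extra-statistics point, something like the following should work. The column names "x", "y" and "value" are hypothetical labels I picked for the sketch, and `sort=False` is only there to keep the groups in first-seen order; neither comes from the original question.

```python
import pandas as pd

# Same sample data as in the session above
a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0],
     [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0],
     [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]

# Hypothetical column names, chosen only for readability
df = pd.DataFrame(a, columns=["x", "y", "value"])

# Group on the first two columns and compute several statistics at once;
# sort=False keeps groups in the order they first appear in the data
summary = (
    df.groupby(["x", "y"], sort=False)["value"]
      .agg(["mean", "count", "min", "max"])
      .reset_index()
)
print(summary)

# Back to a plain list of [x, y, mean] rows, if that is what you need
a_processed = summary[["x", "y", "mean"]].values.tolist()
```

The aggregations skip NaN values by default, which is the "handle missing data" part in its simplest form.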

