Get Information Out Of Sub-lists In Main List Elegantly
Ok, so here's my issue. I have a list composed of N sub-lists composed of M elements (floats) each. So in a general form it looks like this: a_list = [b_list_1, b_list_2, ..., b_li
Solution 1:
I think you can make your code more concise and easier to read by using defaultdict to build a dictionary that maps the first two elements of each sublist to a list of all the corresponding third elements:
from collections import defaultdict
nums = defaultdict(list)
for arr in a:
    key = tuple(arr[:2])       # make the first two floats the key
    nums[key].append(arr[2])   # append the third float for the given key
a_processed = [[k[0], k[1], sum(vals)/len(vals)] for k, vals in nums.items()]
Using this, I get the same output as you (albeit in a different order):
[[0.2, 1.1, 0.8], [1.2, 0.3, 0.6], [0.3, 1.4, 0.2], [0.6, 0.4, 0.9], [1.1, 0.5, 0.6666666666666666], [0.6, 0.2, 0.75]]
If the order of a_processed is an issue, you can use an OrderedDict, as pointed out by @DSM.
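For reference, here is a minimal self-contained sketch of that order-preserving variant, using the sample input shown in Solution 2. The helper name average_by_key is hypothetical and not part of the original answer. Note that on Python 3.7+ a plain dict (and therefore defaultdict) already preserves insertion order, so OrderedDict mainly matters on older interpreters.

```python
from collections import OrderedDict


def average_by_key(rows):
    """Group rows by their first two values and average the third.

    Groups come back in order of first appearance.
    (Hypothetical helper name, chosen for illustration.)
    """
    groups = OrderedDict()
    for first, second, value in rows:
        # Create the list on first sight of a key, then collect the value
        groups.setdefault((first, second), []).append(value)
    return [[k[0], k[1], sum(vals) / len(vals)] for k, vals in groups.items()]


a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0],
     [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0],
     [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]

a_processed = average_by_key(a)
for row in a_processed:
    print(row)
```

The first row printed corresponds to the key (1.1, 0.5), since that pair appears first in the input.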
Solution 2:
For comparison, here's the pandas approach. If this is really a data processing problem behind the scenes, then you can save yourself a lot of time that way.
>>> import pandas as pd
>>> a
[[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0], [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0], [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]
>>> df = pd.DataFrame(a)
>>> df.groupby([0,1]).mean()
                2
0   1
0.2 1.1  0.800000
0.3 1.4  0.200000
0.6 0.2  0.750000
    0.4  0.900000
1.1 0.5  0.666667
1.2 0.3  0.600000
This problem is common enough that it's a one-liner. You can use named columns, compute a host of other useful statistics, handle missing data, etc.
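As a rough illustration of the named columns and extra statistics mentioned above, here is a sketch under a few assumptions: the column names "x", "y" and "value" are made up for this example, and sort=False is used only to keep the groups in order of first appearance.

```python
import pandas as pd

a = [[1.1, 0.5, 0.7], [0.3, 1.4, 0.2], [0.6, 0.2, 1.0],
     [1.1, 0.5, 0.3], [0.2, 1.1, 0.8], [1.1, 0.5, 1.0],
     [1.2, 0.3, 0.6], [0.6, 0.4, 0.9], [0.6, 0.2, 0.5]]

# Column names are illustrative assumptions, not from the original post
df = pd.DataFrame(a, columns=["x", "y", "value"])

# Several statistics per (x, y) group in one call;
# sort=False keeps groups in order of first appearance
stats = df.groupby(["x", "y"], sort=False)["value"].agg(["mean", "min", "max", "count"])
print(stats)

# Flatten the means back into a list of [x, y, mean] rows
a_processed = stats["mean"].reset_index().values.tolist()
```

If plain lists are needed afterwards, a_processed has the same [x, y, mean] shape as the result in Solution 1.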