What's The Most Concise Way In Python To Group And Sum A List Of Objects By The Same Property
Solution 1:
The defaultdict approach is probably better, assuming c.Y is hashable, but here's another way:
from itertools import groupby
from operator import attrgetter
get_y = attrgetter('Y')
tuples = [(y, sum(c.Z for c in cs_with_y)) for y, cs_with_y in
          groupby(sorted(cs, key=get_y), get_y)]
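A self-contained run of the groupby version, as a sketch that assumes a hypothetical `C` namedtuple standing in for the question's objects (the real class isn't shown here; anything with `.Y` and `.Z` attributes behaves the same). It also shows why the `sorted` call matters: `groupby` only merges adjacent equal keys.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for the question's objects.
C = namedtuple('C', ['X', 'Y', 'Z'])
cs = [C(1, 1, 3), C(2, -1, 4), C(3, 1, 5)]

get_y = attrgetter('Y')

# Sort by the same key first so equal Y values end up adjacent.
tuples = [(y, sum(c.Z for c in cs_with_y))
          for y, cs_with_y in groupby(sorted(cs, key=get_y), get_y)]
print(tuples)    # [(-1, 4), (1, 8)]

# Without sorting, the two Y == 1 objects aren't adjacent,
# so they come out as two separate groups.
unsorted = [(y, sum(c.Z for c in g)) for y, g in groupby(cs, get_y)]
print(unsorted)  # [(1, 3), (-1, 4), (1, 5)]
```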
To be a little more concrete about the differences:
- This approach requires making a sorted copy of cs, which takes O(n log n) time and O(n) extra space. Alternatively, you can do cs.sort(key=get_y) to sort cs in-place, which doesn't need extra space but does modify the list cs. Note that groupby returns an iterator, so there's no extra overhead there. If the c.Y values aren't hashable, though, this does work, whereas the defaultdict approach will throw a TypeError.

  But watch out: sorting will raise TypeError if there are any complex numbers in there, and in Python 3 also for any mix of types that can't be ordered against each other (such as str and int). It might be possible to make this work with an appropriate key function: key=lambda e: (e.real, e.imag) if isinstance(e, complex) else e seemed to work for anything I tried against it under Python 2, though in Python 3 it still fails as soon as complex and non-complex keys are mixed, because a tuple can't be compared with a plain number. Custom classes that override the __lt__ operator to raise an exception are of course still a no-go. Maybe you could define a more complicated key function that tests for this, and so on.

  Of course, all we care about here is that equal things are next to each other, not that they're actually sorted, and you could write an O(n^2) function to do that instead of sorting if you so desired. Or a function that's O(num_hashable + num_nonhashable^2). Or you could write an O(n^2) / O(num_hashable + num_nonhashable^2) version of groupby that does the two together.
- sblom's answer works for hashable c.Y attributes, with minimal extra space (because it computes the sums directly).
- philhag's answer is basically the same as sblom's, but uses more auxiliary memory by making a list of each of the cs: effectively doing what groupby does, but with hashing instead of assuming the input is sorted, and with actual lists instead of iterators.
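As noted above, sorting is only a means of getting equal keys next to each other. Here is a minimal sketch of the hybrid O(num_hashable + num_nonhashable^2) idea: hashable keys are looked up in a dict, and unhashable ones fall back to a linear == scan, so keys never need to be orderable. `group_sums` is a hypothetical helper written for illustration, not part of any library; for the question's objects you would pass key=attrgetter('Y') and value=attrgetter('Z').

```python
from operator import itemgetter

def group_sums(items, key, value):
    """Return [(k, total), ...] in first-seen order, without sorting.

    Hashable keys are found via a dict (O(1) on average); unhashable keys
    fall back to an == scan over the unhashable keys seen so far, which is
    the quadratic part.
    """
    index = {}        # hashable key -> position in results
    unhashable = []   # (unhashable key, position in results)
    results = []      # [key, running total] pairs in first-seen order
    for item in items:
        k = key(item)
        try:
            pos = index.get(k)
        except TypeError:  # e.g. a list, which can't be a dict key
            pos = next((p for uk, p in unhashable if uk == k), None)
            if pos is None:
                pos = len(results)
                unhashable.append((k, pos))
                results.append([k, 0])
        else:
            if pos is None:
                pos = len(results)
                index[k] = pos
                results.append([k, 0])
        results[pos][1] += value(item)
    return [tuple(r) for r in results]

# Keys mix lists (unhashable), strings and a complex number, so
# sorted(data, key=itemgetter(0)) would raise TypeError here.
data = [([1, 2], 3), ('a', 4), ([1, 2], 5), ('a', 1), (2 + 3j, 10)]
print(group_sums(data, key=itemgetter(0), value=itemgetter(1)))
# [([1, 2], 8), ('a', 5), ((2+3j), 10)]
```

Unlike the groupby version, this keeps groups in first-seen order rather than sorted order.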
So, if you know your c.Y attribute is hashable and only need the sums, use sblom's; if you know it's hashable but want them grouped for something else as well, use philhag's; if they might not be hashable, use this one (with extra worrying as noted if they might be complex or a custom type that overrides __lt__).
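For the "grouped for something else as well" case, a sketch of the build-the-groups-once pattern, assuming hashable Y values and the same hypothetical `C` namedtuple (not the question's real class). Once the lists exist, several aggregates can be computed from them without re-scanning the input.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the question's objects.
C = namedtuple('C', ['X', 'Y', 'Z'])
cs = [C(1, 1, 3), C(2, -1, 4), C(3, 1, 5)]

# Build the groups once (requires hashable Y values)...
groups = defaultdict(list)
for c in cs:
    groups[c.Y].append(c)

# ...then reuse them for several aggregates.
sums = {y: sum(c.Z for c in members) for y, members in groups.items()}
counts = {y: len(members) for y, members in groups.items()}
means = {y: sums[y] / counts[y] for y in groups}
print(sums)    # {1: 8, -1: 4}
print(counts)  # {1: 2, -1: 1}
print(means)   # {1: 4.0, -1: 4.0}
```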
Solution 2:
from collections import defaultdict
totals = defaultdict(int)
for c in cs:
    totals[c.Y] += c.Z
tuples = list(totals.items())  # list() so Python 3 gives a list, not a view
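A runnable sketch of the same defaultdict approach, wrapped in a hypothetical `sum_by_y` function and using a hypothetical `C` namedtuple in place of the question's objects. The second call shows the limitation mentioned for Solution 1: because each c.Y is used as a dict key, unhashable Y values (lists here) raise TypeError.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the question's objects.
C = namedtuple('C', ['X', 'Y', 'Z'])

def sum_by_y(cs):
    totals = defaultdict(int)  # missing keys start at 0
    for c in cs:
        totals[c.Y] += c.Z     # hashes c.Y, so Y must be hashable
    return list(totals.items())

print(sum_by_y([C(1, 1, 3), C(2, -1, 4), C(3, 1, 5)]))  # [(1, 8), (-1, 4)]

# With unhashable Y values, the dict lookup itself fails.
try:
    sum_by_y([C(1, [1], 3), C(2, [1], 4)])
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```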
Solution 3:
You can use collections.defaultdict to group the list by Y values, and then sum over their Z values:
import collections
ymap = collections.defaultdict(list)
for c in listOfCs:
    ymap[c.Y].append(c)
# .items() (not .values()) yields the (y, clist) pairs unpacked below
print([(y, sum(c.Z for c in clist)) for y, clist in ymap.items()])
Solution 4:
With pandas it might be something like:
df.groupby('Y')['Z'].sum()
Example
>>> import pandas
>>> df = pandas.DataFrame(dict(X=[1,2,3], Y=[1,-1,1], Z=[3,4,5]))
>>> df
   X  Y  Z
0  1  1  3
1  2 -1  4
2  3  1  5
>>> df.groupby('Y')['Z'].sum()
Y
-1    4
 1    8
Solution 5:
You can use a Counter:
from collections import Counter
cnt = Counter()
for c in cs:
    cnt[c.Y] += c.Z
print(cnt)
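A sketch of the Counter version on sample data, again assuming a hypothetical `C` namedtuple in place of the question's objects. A Counter behaves like defaultdict(int) for the += loop, and adds most_common() for ranking groups by total. One caveat worth knowing: in-place += keeps negative totals, but Counter's unary and binary + discard anything that isn't positive, which matters if Z can be negative.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the question's objects.
C = namedtuple('C', ['X', 'Y', 'Z'])
cs = [C(1, 1, 3), C(2, -1, 4), C(3, 1, 5)]

cnt = Counter()
for c in cs:
    cnt[c.Y] += c.Z  # missing keys read as 0, like defaultdict(int)

print(cnt.most_common())   # [(1, 8), (-1, 4)]  (largest total first)
print(cnt.most_common(1))  # [(1, 8)]

# Caveat: += keeps a negative total, but unary + drops it.
neg = Counter()
neg['a'] += -2
print(neg['a'])            # -2
print('a' in +neg)         # False
```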