
Pandas Data Frame: How To Filter On A Column And Afterwards Build Count And Sum Of Other Columns

Here is the problem: I have the following data: df = pandas.DataFrame({'A': [10, 10, 20, 20, 30, 20, 10, 20, 30, 30], 'B': [1001, 1002, 2002, 2003, 3001, 2003, 1002, 20

Solution 1:

Here is one way:

>>> df.groupby('A').apply(lambda g: pandas.Series({"MaxB": g.B.max(), "NumMax": (g.B==g.B.max()).sum()}))
    MaxB  NumMax
A               
10  1002       2
20  2003       3
30  3005       2

The expression (g.B==g.B.max()).sum() counts the rows in the group whose B value equals the group's max of B: the comparison yields a boolean Series, and summing it counts the True values.
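To see why the sum works as a count, here is a minimal sketch on a single group. The values and the names b, is_max and num_max are made up for illustration and are not the data from the question:

```python
import pandas as pd

# Hypothetical values for one group's B column (not the question's data).
b = pd.Series([1001, 1002, 1002])

# Element-wise comparison against the max gives a boolean Series.
is_max = b == b.max()

# True counts as 1 and False as 0, so the sum is the number of matches.
num_max = int(is_max.sum())
```

Here is_max is False for the first row and True for the other two, so num_max ends up as 2.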

This approach computes each group's max twice, but max is a cheap operation, so the extra work is unlikely to matter much in practice.
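If computing the max twice ever does matter, one possible alternative (not part of the answer above) is to broadcast each group's max with transform and then aggregate only the matching rows. This is a sketch under assumptions: the sample frame below is hypothetical, and the names group_max and result are illustrative.

```python
import pandas as pd

# Hypothetical sample data (not the truncated data from the question).
df = pd.DataFrame({
    "A": [10, 10, 20, 20, 30, 30],
    "B": [1001, 1002, 2003, 2003, 3001, 3005],
})

# Broadcast each group's max of B back onto the rows of that group.
group_max = df.groupby("A")["B"].transform("max")

# Keep only the rows that reach their group's max, then summarise per group.
result = (
    df[df["B"] == group_max]
    .groupby("A")["B"]
    .agg(MaxB="max", NumMax="count")
)
```

With this sample data, group 20 has two rows at its max of 2003, so its NumMax is 2, while groups 10 and 30 each have one. Whether this is actually faster than the apply version depends on the data, so it is worth timing both before switching.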
