Pandas Groupby : How To Get Top N Values Based On A Column
Forgive me if this is a basic question, but I am new to pandas. I have a dataframe with a column A, and I would like to get the top n rows based on the count in column A. For in
Solution 1:
IIUC, you can use the function nlargest.
I tried your sample data and got the top 2 rows by column C:
print(df)
   A    B        C
0  x   12      ere
1  x   34     bfhg
2  z    6      bgn
3  z    8      rty
4  y  567  hmmu,,u
5  x  545   fghfgj
6  x   44    zxcbv
dcf = df.groupby(['A'], as_index=False).count()
print(dcf)
   A  B  C
0  x  4  4
1  y  1  1
2  z  2  2
# get the 2 largest rows by column C
print(dcf.nlargest(2, 'C'))
   A  B  C
0  x  4  4
2  z  2  2
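For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from above and counts rows per group with size() instead of count(), which is my substitution rather than part of the answer. The helper name top_groups is made up for illustration.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "A": ["x", "x", "z", "z", "y", "x", "x"],
    "B": [12, 34, 6, 8, 567, 545, 44],
    "C": ["ere", "bfhg", "bgn", "rty", "hmmu,,u", "fghfgj", "zxcbv"],
})


def top_groups(frame, key, n):
    """Return the n most frequent values of `key` with their row counts.

    Hypothetical helper, not a pandas function.
    """
    # size() counts rows per group and returns a Series indexed by the key.
    counts = frame.groupby(key).size()
    # Series.nlargest keeps the n biggest counts, largest first.
    return counts.nlargest(n)


top2 = top_groups(df, "A", 2)
print(top2)
```

The result is a Series indexed by the values of A, so top2.index gives the group labels and top2.values gives the counts.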
Solution 2:
One approach that I tried:
import heapq
dcf = df.groupby(['A'], as_index=False).count()
# DataFrame.sort was removed from pandas; sort_values is its replacement
print(dcf.loc[dcf['C'].isin(heapq.nlargest(5, dcf['C']))].sort_values('C', ascending=False))
gives me
      A    B    C
1664  g  151  151
1887  k   85   85
1533  q   72   72
53    y   68   68
1793  t   62   62
verified by
print(len(df.loc[df["A"] == "g"]))
gives me
151
So I get the desired result: I can see the top 5 values based on the count in column A. But surely there must be a better way of doing this?
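One possibly shorter route, offered as a sketch and not as the definitive answer: Series.value_counts already returns counts sorted in descending order, so head(n) gives the top n without heapq. The small dataframe below is invented purely for illustration. The last step, which filters the original rows with isin, assumes that "top n rows" means the rows belonging to the n most frequent values of A.

```python
import pandas as pd

# Made-up data for illustration: "k" appears 4 times, "g" 3, "h" 2, "q" 1.
df = pd.DataFrame({"A": list("ggghhkkkkq"), "B": range(10)})

n = 2

# value_counts() sorts by count, largest first; head(n) keeps the top n.
top_counts = df["A"].value_counts().head(n)
print(top_counts)

# Keep only the original rows whose A value is one of the top n.
top_rows = df[df["A"].isin(top_counts.index)]
print(top_rows)
```

Unlike the heapq plus isin approach, head(n) returns exactly n entries even when several values are tied on the same count.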