
Pandas Groupby : How To Get Top N Values Based On A Column

Forgive me if this is a basic question, but I am new to pandas. I have a dataframe with a column A, and I would like to get the top n rows based on the count in column A. For in

Solution 1:

IIUC, you can use the nlargest function.

I tried it on your sample data, counting the rows per group in A and then taking the top 2 rows by the count in column C:

print(df)
   A    B        C
0  x   12      ere
1  x   34     bfhg
2  z    6      bgn
3  z    8      rty
4  y  567  hmmu,,u
5  x  545   fghfgj
6  x   44    zxcbv

# count the non-null values per group in A
dcf = df.groupby(['A'], as_index=False).count()
print(dcf)
   A  B  C
0  x  4  4
1  y  1  1
2  z  2  2

# get the 2 largest rows by column C
print(dcf.nlargest(2, 'C'))
   A  B  C
0  x  4  4
2  z  2  2
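For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The dataframe is rebuilt by hand from the printed sample above; the variable name top2 is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed dataframe in this answer.
df = pd.DataFrame({
    "A": ["x", "x", "z", "z", "y", "x", "x"],
    "B": [12, 34, 6, 8, 567, 545, 44],
    "C": ["ere", "bfhg", "bgn", "rty", "hmmu,,u", "fghfgj", "zxcbv"],
})

# Count the non-null values per group in A; as_index=False keeps A as a column.
dcf = df.groupby("A", as_index=False).count()

# Keep the 2 groups with the largest count in column C.
top2 = dcf.nlargest(2, "C")
print(top2)
```

Note that count() reports non-null values per column, so B and C only agree here because the sample has no missing values.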

Solution 2:

One approach that I tried:

import heapq

dcf = df.groupby(['A'], as_index=False).count()
# keep the groups whose count is among the 5 largest, sorted by count (descending)
print(dcf.loc[dcf['C'].isin(heapq.nlargest(5, dcf['C']))].sort_values(['C'], ascending=False))

gives me

      A    B    C
1664  g  151  151
1887  k   85   85
1533  q   72   72
53    y   68   68
1793  t   62   62

verified by

print(len(df.loc[df["A"] == "g"]))

gives me

151

So I get the desired result: the top 5 values of column A, ranked by how often each occurs. But surely there must be a better way of doing this?
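On the "better way" question, one possible shortcut is to skip heapq and let pandas do the counting and ranking directly. The sketch below is not the approach used above; it swaps in value_counts() and groupby(...).size(), and it runs on made-up data (the letters in column A and the names n, top_counts, top_sizes and top_rows are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data: column A holds the labels to be counted.
df = pd.DataFrame({"A": list("ggggkkkqqyt")})

n = 3

# value_counts() counts rows per distinct value in A, sorted descending,
# so head(n) gives the n most frequent values.
top_counts = df["A"].value_counts().head(n)

# A similar route via groupby: size() counts rows per group,
# and nlargest(n) keeps the n biggest groups.
top_sizes = df.groupby("A").size().nlargest(n)

# Rows of the original frame whose A value is among the top n.
top_rows = df[df["A"].isin(top_counts.index)]

print(top_counts)
print(top_rows)
```

One thing to keep in mind: filtering with isin on the count values, as in the heapq version, keeps every group tied with one of the top counts, so it can return more than n groups. head(n) and nlargest(n) return exactly n entries by default.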
