Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas – show N highest counts in a group-by dataframe

Here is my input DataFrame

df = pd.DataFrame([
 ['CA', 'LAX', 'Alice'],
 ['NY', 'NYC', 'Alice'],
 ['TX', 'HOU', 'Alice'],
 ['CA', 'SFO', 'Bob'], 
 ['NY', 'NYC', 'Bob'],
 ['TX', 'AUS', 'Bob'], 
 ['CA', 'LAX', 'Carol'],
 ['NY', 'ALB', 'Carol'], 
 ['TX', 'HOU', 'Carol'],
 ['CA', 'LAX', 'Dan'],
 ['NY', 'NYC', 'Dan'],
 ['TX', 'DAL', 'Dan'],
 ['CA', 'SFO', 'Eve'],
 ['NY', 'NYC', 'Eve'],
 ['TX', 'AUS', 'Eve'], 
 ['CA', 'SDO', 'Frank'],
 ['NY', 'NYC', 'Frank'],
 ['TX', 'HOU', 'Frank']], 
 columns=['State','City','Person']) 

And here is the output I wish to get:

State City  Count       
CA    LAX    3
      SFO    2
NY    ALB    1
      NYC    5
TX    HOU    3
      AUS    2

Basically, I want to show the top 2 cities having the highest count of people, for each state.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have tried using this groupby statement:
df.groupby(['State','City'])['Person'].count()

which generates below output

State  City         
CA     LAX    3
       SDO    1
       SFO    2
NY     ALB    1
       NYC    5
TX     AUS    2
       DAL    1
       HOU    3

But I am unable to come up with statements which could sort the above counts and pick the highest two counts. Would really appreciate the help!

>Solution :

I think you can simplify the approach.

You can use groupby.value_counts(), instead of count(), to get where you are, and then chain another groupby().head(2) to get the top 2 Cities per each state.

The reason is that value_counts() resulting object will be in descending order, so that the first element is the most frequently-occurring element, so you can use head(2) as it is.

df.groupby('State')['City'].value_counts().groupby('State').head(2)

State  City
CA     LAX     3
       SFO     2
NY     NYC     5
       ALB     1
TX     HOU     3
       AUS     2
Name: City, dtype: int64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading