Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to do a simple group by without any statistics like mean or count in Pandas?

I have a dataframe with many columns.

I want to do something like the summary tool in Alteryx. i.e. a simple group by. I am not interested in the statistics like mean or count or size.

I dont want a hierarchial output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

How can I do it?

df.head()

      Unnamed: 0    home_odds    draw_odds    away_odds  country    league            datetime             home_team        away_team      home_score    away_score
--  ------------  -----------  -----------  -----------  ---------  ----------------  -------------------  ---------------  -----------  ------------  ------------
 0             0         1.36         4.31         7.66  Brazil     Copa do Nordeste  2020-02-07 00:00:00  Sport Recife     Imperatriz              2             2
 1             1         2.62         3.3          2.48  Brazil     Copa do Nordeste  2020-02-02 22:00:00  ABC              America RN              2             1
 2             2         5.19         3.58         1.62  Brazil     Copa do Nordeste  2020-02-02 00:00:00  Frei Paulistano  Nautico                 0             2
 3             3         2.06         3.16         3.5   Brazil     Copa do Nordeste  2020-02-02 22:00:00  Botafogo PB      Confianca               1             1
 4             4         2.19         2.98         3.38  Brazil     Copa do Nordeste  2020-02-02 22:00:00  Fortaleza        Ceara                   1             1

I want a simple grouped by these two columns

e.g.

    country    league
--  ---------  ----------------
 0  Brazil     Copa do Nordeste
 1  World      World Cup
 2  Australia  FFA Cup
 3  Australia  A League
 4  Brazil     Serie A

What is the correct code for it?

>Solution :

If you want to create a dataframe with two columns which shows which country has which league, then select the relevant columns and .drop_duplicates method:

out = df[['country', 'league']].drop_duplicates()

But if you want to create a dataframe that keeps data from the other columns, you might want to do:

out = df.groupby(['country', 'league']).agg(list)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading