
How to keep a column when grouping by another column?

I have a dataset of the top 100 richest people in the world.
I want to group by the "age" column, keep only the maximum "net_worth" in each group, and have a third column with the "name" of that person.
I can get two columns with this code:

df = df.groupby(['age']).agg({'net_worth': ['max']}) 
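A side note on this attempt: the output in the answer shows that net_worth is stored as text such as "$18 Billion". In that case `max` compares strings character by character rather than comparing amounts. The sketch below uses hypothetical toy data (the names and values are made up) to show the effect:

```python
import pandas as pd

# Hypothetical toy data: net_worth stored as text, as in the question's dataset.
df = pd.DataFrame({
    'name': ['Person A', 'Person B'],
    'age': [40, 40],
    'net_worth': ['$9 Billion', '$19 Billion'],
})

# max() on a string column compares characters left to right,
# so '$9 Billion' beats '$19 Billion' because '9' > '1'.
lexicographic_max = df.groupby('age')['net_worth'].max()
print(lexicographic_max.loc[40])  # $9 Billion
```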

I also want a third column, "name", but I don't know how to add it.

I tried:

df = df.groupby(['age', 'name']).agg({'net_worth': ['max']}) 

But then the 'name' column becomes part of the grouping key, so I get one row per (age, name) pair instead of one row per age.
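To illustrate why the second attempt does not collapse the rows, here is a small sketch on hypothetical toy data (numeric net worth for simplicity): adding "name" to the keys makes every person their own group.

```python
import pandas as pd

# Hypothetical toy data: two people share age 40.
df = pd.DataFrame({
    'name': ['Person A', 'Person B', 'Person C'],
    'age': [40, 40, 38],
    'net_worth': [9, 19, 70],
})

by_age = df.groupby('age')['net_worth'].max()
by_age_and_name = df.groupby(['age', 'name'])['net_worth'].max()

print(len(by_age))           # 2 (one row per age)
print(len(by_age_and_name))  # 3 (one row per (age, name) pair)
```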
What I need is one row per age with the maximum net_worth and the name of that person.

Solution:

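Because net_worth holds text such as "$18 Billion", extract the numbers first with Series.str.extract. Then use SeriesGroupBy.idxmax to get the row label of the richest person in each age group, and select those whole rows with df.loc so the name comes along: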

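# net_worth is text such as "$18 Billion", so pull out the number first
s = df['net_worth'].str.extract(r'(\d+)', expand=False).astype(int)
# idxmax returns the row label of the largest value in each age group;
# df.loc then selects those whole rows, including "name"
out = df.loc[s.groupby(df['age']).idxmax(), ['net_worth', 'name', 'age']]
print(out.head())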
      net_worth             name   age
90  $18 Billion     Lukas Walton  36.0
92  $17 Billion      Pavel Durov  37.0
17  $70 Billion  Mark Zuckerberg  38.0
30  $44 Billion     Zhang Yiming  39.0
82  $19 Billion  Eduardo Saverin  40.0
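Note that idxmax returns only the first row when two people of the same age share the maximum. If you would rather keep every tied row, one alternative (not part of the answer above) is to compare each row against its group's maximum via `transform('max')`. This sketch uses hypothetical toy data and also swaps in the pattern `r'(\d+(?:\.\d+)?)'` with `astype(float)`, on the assumption that some values may contain decimals such as "$9.5 Billion"; the plain `r'(\d+)'` would keep only the integer part of those.

```python
import pandas as pd

# Hypothetical toy data in the same shape as the question's dataset.
df = pd.DataFrame({
    'name': ['Person A', 'Person B', 'Person C', 'Person D'],
    'age': [40, 40, 38, 38],
    'net_worth': ['$9.5 Billion', '$19 Billion', '$70 Billion', '$70 Billion'],
})

# Parse the number (including an optional decimal part) so comparisons are numeric.
s = df['net_worth'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)

# For every row, the maximum parsed net worth within that row's age group.
group_max = s.groupby(df['age']).transform('max')

# Keep every row equal to its group's maximum, so ties are all kept.
out = df.loc[s == group_max, ['net_worth', 'name', 'age']]
print(out)
```

Here both people aged 38 are returned, whereas the idxmax version would return only the first of them.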