pandas groupby().head(n) where n is a function of group label

October 7, 2022

I have a dataframe, and I would like to group by a column and take the head of each group, but I want the depth of the head to be defined by a function of the group label. If it weren’t for the variable group sizes, I could easily do df.groupby('label').head(n). I can imagine a solution that involves iterating through df['label'].unique(), slicing the dataframe and building a new one, but I’m in a context where I’m pretty sensitive to performance so I’d like to avoid that kind of iteration if possible.

Here’s an example dataframe:

   label  values
0  apple       7
1  apple       5
2  apple       4
3    car       9
4    car       6
5    dog       5
6    dog       3
7    dog       2
8    dog       1

and code for my example setup:

import pandas as pd
df = pd.DataFrame({'label': ['apple', 'apple', 'apple', 'car', 'car', 'dog', 'dog', 'dog', 'dog'],
                   'values': [7, 5, 4, 9, 6, 5, 3, 2, 1]})
def depth(label):
    if label == 'apple': return 1
    elif label == 'car': return 2
    elif label == 'dog': return 3

My desired output is a dataframe in which the number of rows kept from each group is defined by that function:

   label  values
0  apple       7
3    car       9
4    car       6
5    dog       5
6    dog       3
7    dog       2

>Solution :

I would use a dictionary here and look up each group's label via <group>.name inside groupby.apply:

depth = {'apple': 1, 'car': 2, 'dog': 3}

out = (df.groupby('label', group_keys=False)
         .apply(lambda g: g.head(depth.get(g.name, 0)))
       )

NB. If you really need a function, you can do the same by calling it on g.name instead of using depth.get(g.name, 0). Make sure the function returns a value in every case, including for labels it does not recognise.
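A minimal sketch of that function-based variant, using the question's data. The name depth_for and the fallback of 0 for unknown labels are assumptions made for illustration; they are not part of the original question. Depending on the pandas version, groupby.apply may emit a warning about grouping columns or omit the label column from the result, so treat this as a sketch rather than a definitive implementation.

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['apple', 'apple', 'apple', 'car', 'car', 'dog', 'dog', 'dog', 'dog'],
    'values': [7, 5, 4, 9, 6, 5, 3, 2, 1],
})

def depth_for(label):
    # Hypothetical helper: same mapping as the question's depth(),
    # plus an explicit fallback so every label yields an integer.
    if label == 'apple':
        return 1
    elif label == 'car':
        return 2
    elif label == 'dog':
        return 3
    return 0  # assumption: unknown labels contribute no rows

# g.name is the group label, so each group gets its own head depth.
out = (df.groupby('label', group_keys=False)
         .apply(lambda g: g.head(depth_for(g.name)))
       )
print(out)
```

The explicit final return is the point of the note above: without it, the function would implicitly return None for any label it does not handle, which is not a meaningful row count.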

Alternative option with groupby.cumcount and boolean indexing:

out = df[df['label'].map(depth).gt(df.groupby('label').cumcount())]

Output:

   label  values
0  apple       7
3    car       9
4    car       6
5    dog       5
6    dog       3
7    dog       2
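To see why the cumcount approach works, it may help to split the one-liner into its intermediate steps. The sketch below uses the question's data; the names position, limit and mask are illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['apple', 'apple', 'apple', 'car', 'car', 'dog', 'dog', 'dog', 'dog'],
    'values': [7, 5, 4, 9, 6, 5, 3, 2, 1],
})
depth = {'apple': 1, 'car': 2, 'dog': 3}

# 0-based position of each row within its own group:
# apple -> 0, 1, 2; car -> 0, 1; dog -> 0, 1, 2, 3
position = df.groupby('label').cumcount()

# Number of rows allowed for each row's label, aligned with df's index.
limit = df['label'].map(depth)

# Keep a row only while its position is still below the limit for its label.
mask = limit.gt(position)

out = df[mask]
print(out)
```

Because every step is a vectorised operation over the whole frame, no Python-level function runs per group, which is what the question asked to avoid. Series.map also accepts a callable, so a function can be passed in place of the dictionary.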

group-by

byMR

Published October 07, 2022
