Pandas drop row if column value has appeared more than some number of times depending on the value

September 4, 2022

I have a DataFrame that looks like the following:

import pandas as pd

t = {1: ['A','B'], 2: ['D','F'], 3: ['A','C'], 4: ['B','E'], 5: ['B','B'], 6: ['D','D'], 7: ['A','H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X','Y'])
df

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
5  B  B
6  D  D
7  A  H

I then have a dictionary

d = {'A': 2, 'B': 1, 'D': 4}

What I would like to do is drop every row that is the nth occurrence of its value in the X column, where n is greater than the integer specified in my dictionary for that value, while preserving the order of the remaining rows. With the above dictionary, the result should be the following DataFrame:

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
6  D  D

whereas with the dictionary

d = {'A': 1, 'B': 2, 'D': 1}

it should look like

   X  Y
1  A  B
2  D  F
4  B  E
5  B  B

Solution:

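You can use groupby.cumcount to number the occurrences of each X value (counting from 0), then use map to look up each row's threshold in d and keep only the rows whose count is below that threshold: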

mask = df.groupby('X').cumcount() < df['X'].map(d)

df[mask]
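
To check the mask against both dictionaries from the question, here is a minimal self-contained sketch. The helper name drop_excess_occurrences and its keep_unlisted flag are hypothetical, introduced only for illustration. One behaviour worth knowing: an X value that is missing from the dictionary maps to NaN, the comparison evaluates to False, and its rows are dropped. The optional flag shows one possible way to keep such rows instead, assuming that is the behaviour you want.

```python
import pandas as pd


def drop_excess_occurrences(df, column, limits, keep_unlisted=False):
    """Keep only the first limits[value] rows for each value in `column`.

    Hypothetical helper wrapping the cumcount/map approach.
    """
    # 0-based running count of each value, aligned with the original row order
    occurrence = df.groupby(column).cumcount()
    # Per-row threshold looked up in `limits`; values absent from it become NaN
    threshold = df[column].map(limits)
    if keep_unlisted:
        # Assumption: unlisted values should have no limit at all
        threshold = threshold.fillna(float('inf'))
    # Comparisons against NaN are False, so unlisted values are dropped by default
    return df[occurrence < threshold]


t = {1: ['A', 'B'], 2: ['D', 'F'], 3: ['A', 'C'], 4: ['B', 'E'],
     5: ['B', 'B'], 6: ['D', 'D'], 7: ['A', 'H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X', 'Y'])

first = drop_excess_occurrences(df, 'X', {'A': 2, 'B': 1, 'D': 4})
second = drop_excess_occurrences(df, 'X', {'A': 1, 'B': 2, 'D': 1})

print(first.index.tolist())   # [1, 2, 3, 4, 6]
print(second.index.tolist())  # [1, 2, 4, 5]
```

Because the result is produced by boolean indexing, the surviving rows keep their original order and index labels, which is what the question asks for.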

pandas

by MR

Published September 04, 2022
