Pandas drop row if column value has appeared more than some number of times depending on the value

I have a DataFrame that looks like the following:

import pandas as pd
t = {1: ['A','B'], 2: ['D','F'], 3: ['A','C'], 4: ['B','E'], 5: ['B','B'], 6: ['D','D'], 7: ['A','H']}
df = pd.DataFrame.from_dict(t,orient='index',columns=['X','Y'])
df

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
5  B  B
6  D  D
7  A  H

I then have a dictionary

d = {'A': 2, 'B': 1, 'D': 4}

What I would like to do is drop every row that is the nth occurrence of its value in the X column, where n is greater than the integer my dictionary specifies for that value, while preserving the order of the rows of my DataFrame. With the above dictionary, the result should be the following DataFrame:

   X  Y
1  A  B
2  D  F
3  A  C
4  B  E
6  D  D

whereas with the dictionary

d = {'A': 1, 'B': 2, 'D': 1}

it should look like

   X  Y
1  A  B
2  D  F
4  B  E
5  B  B

Solution:

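You can use groupby.cumcount to number the occurrences of each X value (starting at 0), then compare that count to the per-value threshold looked up with map. A row is kept when its count is below its threshold, and boolean indexing with the resulting mask preserves the original row order.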

mask = df.groupby('X').cumcount() < df['X'].map(d)

df[mask]
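
For reference, here is a self-contained sketch of the same approach applied to both dictionaries from the question. The helper name keep_first_n and its default parameter are hypothetical, added only for illustration. One thing to be aware of: a value of X that is missing from the dictionary maps to NaN, and any comparison with NaN is False, so the one-liner above drops such rows. The default parameter makes that choice explicit.

```python
import pandas as pd

t = {1: ['A', 'B'], 2: ['D', 'F'], 3: ['A', 'C'], 4: ['B', 'E'],
     5: ['B', 'B'], 6: ['D', 'D'], 7: ['A', 'H']}
df = pd.DataFrame.from_dict(t, orient='index', columns=['X', 'Y'])


def keep_first_n(frame, col, limits, default=0):
    """Keep at most limits[v] rows for each value v in frame[col].

    Hypothetical helper: `default` is the limit used for values that
    are absent from `limits` (0 drops them, float('inf') keeps them all).
    """
    # 0-based running count of each value, in original row order
    seen = frame.groupby(col).cumcount()
    # Per-row threshold; values missing from `limits` become NaN first
    allowed = frame[col].map(limits).fillna(default)
    return frame[seen < allowed]


first = keep_first_n(df, 'X', {'A': 2, 'B': 1, 'D': 4})
second = keep_first_n(df, 'X', {'A': 1, 'B': 2, 'D': 1})
print(first.index.tolist())   # [1, 2, 3, 4, 6]
print(second.index.tolist())  # [1, 2, 4, 5]
```

Passing default=float('inf') instead would leave values without a dictionary entry untouched, which may be closer to what you want if the dictionary only lists the values that need capping.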