Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Get indices rows of that match group-by value_count

I want to get the row indices for values that meet my search criteria.

Consider this data:

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'A'],
                   'item': ['BIOS', 'BIOS', 'Memory', 'Disk', 'BIOS'],
                   'val': [1, 2, 1, 2, 3]})

I can get the value counts like so:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.value_counts(subset=['group', 'item'])

I can generate a new dataframe of results like so:

df.groupby('group')['item'].value_counts().loc[lambda x: x > 1].reset_index(name='count')

But how can I get the indices of the rows that return True when checking for my value_count condition?

# tacking on .index will yield a multi-index result
df.value_counts(subset=['group', 'item']).loc[lambda x: x > 2].index

Expected out:

[0, 1, 4]

>Solution :

You can use groupby in place of value_counts:

df.groupby(['group', 'item']).filter(lambda g: len(g)>1).index

variant:

mask = df.groupby(['group', 'item'])['val'].transform('size').gt(1)
df.index[mask]

output:

Int64Index([0, 1, 4], dtype='int64')

NB. add .to_list() to get a list

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading