I want to get the row indices for values that meet my search criteria.
Consider this data:
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'A'],
'item': ['BIOS', 'BIOS', 'Memory', 'Disk', 'BIOS'],
'val': [1, 2, 1, 2, 3]})
I can get the value counts like so:
df.value_counts(subset=['group', 'item'])
I can generate a new dataframe of results like so:
df.groupby('group')['item'].value_counts().loc[lambda x: x > 1].reset_index(name='count')
But how can I get the indices of the rows that return True when checking for my value_count condition?
# tacking on .index will yield a multi-index result
df.value_counts(subset=['group', 'item']).loc[lambda x: x > 2].index
Expected out:
[0, 1, 4]
>Solution :
You can use groupby in place of value_counts:
df.groupby(['group', 'item']).filter(lambda g: len(g)>1).index
variant:
mask = df.groupby(['group', 'item'])['val'].transform('size').gt(1)
df.index[mask]
output:
Int64Index([0, 1, 4], dtype='int64')
NB. add .to_list() to get a list