Home Get values in one column based on an AND condition in another column in python

Questions

Get values in one column based on an AND condition in another column in python

May 8, 2022

I have the below dataframe in python with column names "Order_number" and "item_id" which represents the order number and the items consisting in that order. I need to fetch order numbers that consist of item_ids A AND B so my result should be Order Number 12345 and 84573

Order_number	item_id
12345	A
12345	B
34235	B
43543	B
84573	A
84573	B
45846	A

I have the below code but I am getting an error:

df[df['item_id'] == 'A' & df['item_id'] == 'B'] 

TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

>Solution :

You can groupby.apply a lambda that checks if the unique "item_id"s include both "A" and "B" for each "Order_number"; then filter the ones that do:

out = df.groupby('Order_number')['item_id'].apply(lambda x: {'A','B'}.issubset(x.unique().tolist())).pipe(lambda x: x.index[x]).tolist()

Another option is to use groupby.any twice; once for "A" and again for "B" to create boolean Series objects that return True if an item_id exists for an Order_number; then since we want both to exist, we use & and filter the "Order_number"s:

out = (df['item_id'].eq('A').groupby(df['Order_number']).any() & df['item_id'].eq('B').groupby(df['Order_number']).any()).pipe(lambda x: x.index[x].tolist())

Output: