Pandas: Determine if columns are matched

I’m trying to eliminate all rows that match in col0 and col1, but don’t have a pair of -1, 1 between rows (for example in the dataframe below there isn’t a a2, b1, -1 row). I was trying to come up with someway to do this, but was groupby and getting multiindex and not getting anywhere…

#   no a2, b1, -1
df = pd.DataFrame([
    ['a1', 'b1', -1, 0/1],
    ['a1', 'b1',  1, 1/1],
    ['a1', 'b2', -1, 2/1],
    ['a1', 'b2',  1, 1/2],
    ['a2', 'b1',  1, 1/3],
    ['a2', 'b2', -1, 2/3],
    ['a2', 'b2',  1, 4/1]
], columns=['col0', 'col1', 'col2', 'val'])

# desired output
# a1, b1, -1, 0.0
# a1, b1,  1, 1.0
# a1, b2, -1, 2.0
# a1, b2,  1, 0.5
# a2, b2, -1, 0.66667
# a2, b2,  1, 4.0

>Solution :

We can use groupby filter to test if there are at least 1 (any) of each value (-1 and 1) per group with Series.any:

result_df = df.groupby(['col0', 'col1']).filter(
    lambda x: x['col2'].eq(-1).any() and x['col2'].eq(1).any()
)

result_df:

  col0 col1  col2       val
0   a1   b1    -1  0.000000
1   a1   b1     1  1.000000
2   a1   b2    -1  2.000000
3   a1   b2     1  0.500000
5   a2   b2    -1  0.666667
6   a2   b2     1  4.000000

Leave a Reply