Pandas remove rows with only one type of value, repeated or not

March 11, 2024

I am trying to remove rows whose all values are the same, or a combination of the same values.

I have, for example a dataframe like

data = {'A': ['1, 1, 1', '1', '2', '3', '1'], 'B': ['1', '1,1,1,1', '2', '4', '1'], 'C': ['1, 1', '2', '3', '5', '1']}

I want to remove rows whose values in all columns are ‘1’ or any combination of ‘1’.
The final result should be something like:

data = {'A': ['1', '2', '3'], 'B': ['1,1,1,1', '2', '4'], 'C': ['2', '3', '5']}

I’ve tried

def remove_rows_with_ones(value):
    return all(x == '1' for x in value.split(','))

mask = df.apply(lambda row: any(remove_rows_with_ones(val) for val in row), axis=1)
df_filtered = df[~mask]

But it does not seem to work. Any help is apreciated.

>Solution :

You could convert to string and check if each cell contains a character other than 1 (or space/comma), if at least one True, keep the row:

out = df[df.apply(lambda s: s.astype(str).str.contains('[^1 ,]')).any(axis=1)]

Or, with your original idea of splitting the strings on ', ':

import re

out = df[~df.applymap(lambda c: all(x=='1' for x in re.split(', *', c))).all(axis=1)]

# pandas ≥ 2.1
out = df[~df.map(lambda c: all(x=='1' for x in re.split(', *', c))).all(axis=1)]

Output:

   A        B  C
1  1  1,1,1,1  2
2  2        2  3
3  3        4  5

Intermediates:

# df.apply(lambda s: s.astype(str).str.contains('[^1 ,]'))
       A      B      C
0  False  False  False
1  False  False   True
2   True   True   True
3   True   True   True
4  False  False  False

# df.map(lambda c: all(x=='1' for x in re.split(', *', c)))
       A      B      C
0   True   True   True
1   True   True  False
2  False  False  False
3  False  False  False
4   True   True   True