I am trying to remove rows whose all values are the same, or a combination of the same values.
I have, for example a dataframe like
data = {'A': ['1, 1, 1', '1', '2', '3', '1'], 'B': ['1', '1,1,1,1', '2', '4', '1'], 'C': ['1, 1', '2', '3', '5', '1']}
I want to remove rows whose values in all columns are ‘1’ or any combination of ‘1’.
The final result should be something like:
data = {'A': ['1', '2', '3'], 'B': ['1,1,1,1', '2', '4'], 'C': ['2', '3', '5']}
I’ve tried
def remove_rows_with_ones(value):
return all(x == '1' for x in value.split(','))
mask = df.apply(lambda row: any(remove_rows_with_ones(val) for val in row), axis=1)
df_filtered = df[~mask]
But it does not seem to work. Any help is apreciated.
>Solution :
You could convert to string and check if each cell contains a character other than 1 (or space/comma), if at least one True, keep the row:
out = df[df.apply(lambda s: s.astype(str).str.contains('[^1 ,]')).any(axis=1)]
Or, with your original idea of splitting the strings on ', ':
import re
out = df[~df.applymap(lambda c: all(x=='1' for x in re.split(', *', c))).all(axis=1)]
# pandas ≥ 2.1
out = df[~df.map(lambda c: all(x=='1' for x in re.split(', *', c))).all(axis=1)]
Output:
A B C
1 1 1,1,1,1 2
2 2 2 3
3 3 4 5
Intermediates:
# df.apply(lambda s: s.astype(str).str.contains('[^1 ,]'))
A B C
0 False False False
1 False False True
2 True True True
3 True True True
4 False False False
# df.map(lambda c: all(x=='1' for x in re.split(', *', c)))
A B C
0 True True True
1 True True False
2 False False False
3 False False False
4 True True True