I would like to keep columns that contains word "FAIL".
Input data:
| Values1 | Values2 | Values3 | Status1 | Status2 | Status3 |
|---|---|---|---|---|---|
| 1 | 1 | 1 | PASS | PASS | FAIL |
| 2 | 2 | 2 | PASS | PASS | PASS |
| 3 | 3 | 3 | PASS | PASS | PASS |
| 4 | 4 | 4 | PASS | FAIL | PASS |
Expected output:
| Status2 | Status3 |
|---|---|
| PASS | FAIL |
| FAIL | PASS |
Current Output:
| Status1 | Status2 | Status3 |
|---|---|---|
| PASS | PASS | FAIL |
| PASS | FAIL | PASS |
My code:
import pandas as pd
values = range(1,5)
status_pass = ["PASS"]*len(values)
status1 = status_pass[1:]+["FAIL"]
status2 = status1[::-1]
df = pd.DataFrame({"Values1":values,"Values2":values,"Values3":values,"Status1":status_pass,"Status2":status1,"Status3":status2})
# drop unwanted rows
words_to_keep = ["FAIL"]
df = df[df.stack().groupby(level=0).apply(
lambda x: all(x.str.contains(w, case=False).any() for w in words_to_keep))]
# Filter by column name
df = df.filter(like='Status', axis=1)
>Solution :
Use DataFrame.loc for filter rows and columns by mask, if need filter by list use DataFrame.isin, if need filter scalar use DataFrame.eq with DataFrame.any for test for at least one match:
words_to_keep = ["FAIL"]
m = df.isin(words_to_keep)
#m = df.eq("FAIL")
out = df.loc[m.any(axis=1), m.any()]
print (out)
Status2 Status3
0 PASS FAIL
3 FAIL PASS
Details:
print (df.isin(words_to_keep))
Values1 Values2 Values3 Status1 Status2 Status3
0 False False False False False True
1 False False False False False False
2 False False False False False False
3 False False False False True False
print (m.any(axis=1))
0 True
1 False
2 False
3 True
print (m.any())
Values1 False
Values2 False
Values3 False
Status1 False
Status2 True
Status3 True
dtype: bool