My dataframe contains the index column (Date) and hundreds of columns.
| Date (index) | Menu1 | Menu2 | Menu3 | Menu4 |
|---|---|---|---|---|
| 2023_1_1_x | NaN | 5 | 10 | NaN |
| 2023_1_2_x | NaN | 8 | 9 | NaN |
| 2023_1_3_x | NaN | 10 | 8 | NaN |
| 2023_1_1_y | 10 | 5 | NaN | NaN |
| 2023_1_2_y | 11 | 8 | NaN | NaN |
| 2023_1_3_y | 12 | 10 | NaN | NaN |
The Date index contains duplicates. That’s why they end with x or y.
The dataframe has hundreds of columns. I need to select the columns like Menu3 whose index contains x and the values exist for only those x indexes.
>Solution :
You can group by columns ending in _x or not, then check the NA status and ensure only the _x ones have non-NA:
out = (df.notna()
.groupby(df.index.str.endswith('_x'))
.any()
.pipe(lambda d: d.columns[d.eq(d.index, axis=0).all()])
.tolist()
)
Output: ['Menu3']