Select columns containing a certain number of values

February 12, 2023

My dataframe contains the index column (Date) and hundreds of columns.

Date (index)	Menu1	Menu2	Menu3	Menu4
2023_1_1_x	NaN	5	10	NaN
2023_1_2_x	NaN	8	9	NaN
2023_1_3_x	NaN	10	8	NaN
2023_1_1_y	10	5	NaN	NaN
2023_1_2_y	11	8	NaN	NaN
2023_1_3_y	12	10	NaN	NaN

The Date index contains duplicates. That’s why they end with x or y.

The dataframe has hundreds of columns. I need to select the columns like Menu3 whose index contains x and the values exist for only those x indexes.

>Solution :

You can group by columns ending in _x or not, then check the NA status and ensure only the _x ones have non-NA:

out = (df.notna()
         .groupby(df.index.str.endswith('_x'))
         .any()
         .pipe(lambda d: d.columns[d.eq(d.index, axis=0).all()])
         .tolist()
       )

Output: ['Menu3']