Selecting indices not in a subset from Pandas dataframe

July 12, 2022

I have a Python Pandas dataframe with many rows. The first column is "test_idx". For example:

df = 
test_idx sample value
1        1      1.2
1        2      -3.0
1        3      4.7
2        1      1.5
2        2      2.8

etc…
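For anyone who wants to reproduce this, a minimal sketch that builds just the five rows shown above (the real dataframe has many more rows, and the dtypes here are my assumption):

```python
import pandas as pd

# Only the five example rows from the listing above; dtypes are assumed.
df = pd.DataFrame({
    "test_idx": [1, 1, 1, 2, 2],
    "sample": [1, 2, 3, 1, 2],
    "value": [1.2, -3.0, 4.7, 1.5, 2.8],
})
print(df)
```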

Assume I know that the tests listed in invalid_tests = [2, 3, 7] are invalid. I would like to create a new Pandas dataframe, cdf, which contains only the rows belonging to valid tests.

There is a straightforward way to do it, which is what I have done here:

valid_tests_idx = []  # positions of rows with valid tests
for i in range(len(df)):
    if df["test_idx"].iloc[i] not in invalid_tests:
        valid_tests_idx.append(i)
cdf = df.iloc[valid_tests_idx]

It works fine, but is there a more elegant way to do it, ideally a one-liner?
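One candidate that seems to fit is a boolean mask built with `Series.isin` and negated with `~`. This is only a sketch on the five example rows, not a tested solution for the full data:

```python
import pandas as pd

# Example rows only; the real dataframe is assumed to have the same columns.
df = pd.DataFrame({
    "test_idx": [1, 1, 1, 2, 2],
    "sample": [1, 2, 3, 1, 2],
    "value": [1.2, -3.0, 4.7, 1.5, 2.8],
})
invalid_tests = [2, 3, 7]

# isin() marks rows whose test_idx is in invalid_tests; ~ inverts the mask,
# so only rows from valid tests are kept.
cdf = df[~df["test_idx"].isin(invalid_tests)]
print(cdf)
```

Like the loop version, this keeps the original row labels; appending `.reset_index(drop=True)` would renumber them from 0 if that is preferred.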