My data has some columns named in the formats ABC_number_AX and ABC_number_AX_MED. I would like to exclude the ABC_number_AX columns. To do that, I defined the following patterns to filter out columns whose names contain them.
import re
# Define the substrings that mark columns to exclude
patterns = ['_AX','_B2']
# Create a regular expression pattern to match any of the defined patterns
pattern = '|'.join(map(re.escape, patterns))
# List comprehension to filter out columns based on the pattern
filtered_columns = [col for col in df.columns if not re.search(pattern, col)]
# Create a new DataFrame with filtered columns
df = df[filtered_columns]
The problem with the above code is that it also omits the ABC_number_AX_MED columns. I tried adding $ to the pattern (_AX$), but the desired column is still not selected. How can I fix this?
>Solution :
Assuming that the patterns to exclude must be at the end of the column names, here is a way to filter out columns that end with any of your patterns.
# List comprehension to filter out columns based on the pattern
fc = [c for c in df.columns if not any(c.endswith(p) for p in patterns)]
# Create a new DataFrame with filtered columns
df = df[fc]
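If you would rather keep the regex approach, a likely reason the `_AX$` attempt failed is that `re.escape` also escapes the `$`, so it is matched as a literal dollar sign rather than as an end-of-string anchor. Below is a minimal sketch of both variants. It assumes every pattern is meant to be a suffix, and it uses a plain list of made-up column names (`columns`) in place of `df.columns` so that it runs without pandas.

```python
import re

patterns = ['_AX', '_B2']

# Hypothetical column names standing in for df.columns
columns = ['ABC_1_AX', 'ABC_1_AX_MED', 'ABC_2_B2', 'ABC_2_B2_MED', 'ID']

# re.escape treats '$' as a literal character, so '_AX$' loses its anchor
escaped_with_dollar = re.escape('_AX$')  # the '$' is now preceded by a backslash

# Escape the suffixes first, then add the anchor outside the escaped part
suffix_re = re.compile('(?:' + '|'.join(map(re.escape, patterns)) + ')$')
kept_regex = [c for c in columns if not suffix_re.search(c)]

# Equivalent without regex: str.endswith accepts a tuple of suffixes
kept_endswith = [c for c in columns if not c.endswith(tuple(patterns))]

print(kept_regex)
print(kept_endswith)
```

With a real DataFrame, either list can be used the same way as `fc` above, for example `df = df[kept_endswith]` after building it from `df.columns`.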