I have a pandas DataFrame and I want to select columns based on the elements of an input list. For example, I have something like:
import pandas as pd

filters = ['category', 'name']
# I am just trying to select the columns which would match as follows:
data = {'category_name_courses': ["Spark","PySpark","Python","pandas"], 'category_name_area': ["cloud", "cloud", "prog", "ds"], 'some_other_column': [0, 0, 0, 0]}
x = pd.DataFrame(data)
selections = list()
for col in x.columns:
    if ('name' in col) and ('category' in col):
        selections.append(col)
In my case, this if condition (or some other way of selecting) should be built by ANDing together every element of the input list.
>Solution :
Your edit shows that you want to filter columns based on their name.
Simply use:
filters = ['category', 'name']
for col in x.columns:
    # f is the loop variable; it must not reuse the DataFrame's name x
    if all(f in col for f in filters):
        print(col)
Output:
category_name_courses
category_name_area
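If you want the matching columns collected rather than printed, the same `all(...)` test fits into a list comprehension. This is a minimal sketch reusing the data from the question; the names `df` and `subset` are mine, not from the original post.

```python
import pandas as pd

data = {
    'category_name_courses': ["Spark", "PySpark", "Python", "pandas"],
    'category_name_area': ["cloud", "cloud", "prog", "ds"],
    'some_other_column': [0, 0, 0, 0],
}
df = pd.DataFrame(data)
filters = ['category', 'name']

# Keep the columns whose name contains every substring in `filters`.
selections = [col for col in df.columns if all(f in col for f in filters)]

# Select the matching columns as a new DataFrame.
subset = df[selections]
print(selections)
```

Note that an empty `filters` list makes `all(...)` true for every column, so everything is selected in that case.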
older answer: filtering values
You can do almost what you suggested:
x = pd.DataFrame([['flow', 'x', 'category'],['x','x','flow']])
for col in x.columns:
    if ('flow' in x[col].values) and ('category' in x[col].values):
        # Do something with this column...
        print(f'column "{col}" matches')
Using a list of matches:
filters = ['category', 'flow']
for col in x.columns:
    # use f, not x, as the loop variable: reusing x would shadow the DataFrame inside all()
    if all(f in x[col].values for f in filters):
        # Do something with this column...
        print(f'column "{col}" matches')
Or, more efficiently, using a set:
filters = {'category', 'flow'}
for col in x.columns:
    if set(x[col]) >= filters:
        # Do something with this column...
        print(f'column "{col}" matches')
Output for the example DataFrame above:
column "2" matches
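The set-based check can also be wrapped in a small helper that returns the matching column labels instead of printing them. This is only a sketch; the function name `columns_containing_all` is hypothetical and not part of pandas.

```python
import pandas as pd

def columns_containing_all(df, required):
    """Return the labels of columns whose values include every item in `required`."""
    required = set(required)
    # A column matches when `required` is a subset of its set of values.
    return [col for col in df.columns if required <= set(df[col])]

df = pd.DataFrame([['flow', 'x', 'category'], ['x', 'x', 'flow']])
matches = columns_containing_all(df, ['category', 'flow'])
print(matches)
```

For the example frame only column 2 holds both values, so `matches` contains that single label.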