Best way to move an unexpected column in a Pandas DF to a new DF?

January 10, 2022

Wondering what the best way to tackle this issue is. If I have a DF with the following columns

df1()
type_of_fruit   name_of_fruit    price
.....           .....            .....

and a list called

expected_cols = ['name_of_fruit','price']

What's the best way to automate the check of df1 against the expected_cols list? I was trying something like

df_cols=df1.columns.values.tolist()
if df_cols != expected_cols:

and then moving any columns not in expected_cols to another DF, but this doesn't seem like a great idea to me. Is there a way to save the "dropped" columns?

df2 = df1.drop(columns=expected_cols)

But this seems problematic depending on column ordering, and also in cases where df1 could have either more columns than expected or fewer. When there are fewer columns than expected (i.e. df1 only contains the column name_of_fruit), I'm planning on using

df1.reindex(columns=expected_cols)

But I'm a bit iffy on how to do this programmatically, and on how to handle the case where there are more columns than expected.

>Solution :

You can take a set difference with the - operator.

Assuming df1 has these columns:

df1_cols = df1.columns  # ['type_of_fruit', 'name_of_fruit', 'price']
expected_cols = ['name_of_fruit', 'price']

# Columns present in df1 but not expected (a set does not preserve column order)
unwanted_cols = list(set(df1_cols) - set(expected_cols))

# Save the unexpected columns to a new DataFrame, then drop them from df1
df2 = df1[unwanted_cols]
df1.drop(columns=unwanted_cols, inplace=True)
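
If column order matters, or df1 might be missing some expected columns, a variation along these lines may help. This is only a sketch: the helper name split_unexpected and the sample data are made up for illustration and are not from the question. It uses a list comprehension instead of a set so the extra columns keep their original order, and reindex so that any missing expected column is added and filled with NaN.

```python
import pandas as pd


def split_unexpected(df, expected_cols):
    """Hypothetical helper: return (cleaned, extras).

    cleaned has exactly expected_cols, in that order;
    extras holds every other column from df.
    """
    # A list comprehension keeps the original column order, unlike a set difference.
    unexpected = [c for c in df.columns if c not in expected_cols]
    extras = df[unexpected].copy()
    # reindex picks the expected columns in the given order and
    # adds any that are missing, filled with NaN.
    cleaned = df.reindex(columns=expected_cols)
    return cleaned, extras


# Illustrative data: one unexpected column, and 'price' is missing.
df1 = pd.DataFrame({
    "type_of_fruit": ["citrus", "berry"],
    "name_of_fruit": ["orange", "strawberry"],
})
expected_cols = ["name_of_fruit", "price"]

df1_clean, df2 = split_unexpected(df1, expected_cols)
print(df1_clean.columns.tolist())  # ['name_of_fruit', 'price']
print(df2.columns.tolist())        # ['type_of_fruit']
```

Returning new DataFrames rather than using inplace=True leaves the original df1 untouched, which makes it easier to check what was moved.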