Delete columns in R that do not match another dataframe

I have two dataframes that look like this:

> dput(df)
structure(list(first_column = c("value_1", "value_2"), second_column = c("value_1", 
"value_2")), class = "data.frame", row.names = c(NA, -2L))

> dput(df_new)
structure(list(first_column = c("value_1", "value_2"), second_column = c("value_1", 
"value_2"), third_column = c("value_1", "value_2")), class = "data.frame", row.names = c(NA, 
-2L))

I would like df_new to have the same columns as df (so essentially, just deleting third_column).

But since I am working with many different dataframes, hard-coding the column position like this won't work:

df_new <- df_new[, -3]

Is it possible to keep only the columns whose names match between the two datasets, without indexing column 3?

Solution:

I think we just need to intersect the names of df with the names of df_new. Using intersect() means we never try to select a column that df_new does not have, which would otherwise fail with an "undefined columns selected" error.

df_new[, intersect(names(df), names(df_new)), drop=FALSE]
#   first_column second_column
# 1      value_1       value_1
# 2      value_2       value_2

The drop=FALSE is there because, in base R, when the column selection reduces to a single column, [ returns a vector instead of a data.frame. That doesn't happen with this sample data, but if only one column name were in common, the result would not be a frame. We can simulate that case by adding [1] to the selection above; compare:

df_new[, intersect(names(df), names(df_new))[1]]
# [1] "value_1" "value_2"
df_new[, intersect(names(df), names(df_new))[1], drop=FALSE]
#   first_column
# 1      value_1
# 2      value_2

This is just being defensive.
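Since the question mentions many different dataframes, the same idea can be wrapped in a small function and applied to a list of frames with lapply(). This is only a sketch: keep_matching_columns and df_other are hypothetical names made up for illustration, df and df_new are rebuilt from the dput output above, and df_other is invented to show that a column missing from the target frame is simply skipped rather than causing an error.

```r
# Hypothetical helper: keep only the columns of `x` whose names also appear in
# `template`, in the order they appear in `template`. drop = FALSE keeps the
# result a data.frame even when only one column matches.
keep_matching_columns <- function(x, template) {
  x[, intersect(names(template), names(x)), drop = FALSE]
}

# Rebuilt from the dput() output in the question.
df <- data.frame(first_column  = c("value_1", "value_2"),
                 second_column = c("value_1", "value_2"))
df_new <- data.frame(first_column  = c("value_1", "value_2"),
                     second_column = c("value_1", "value_2"),
                     third_column  = c("value_1", "value_2"))

# Invented example frame: lacks first_column and has an extra column.
df_other <- data.frame(second_column = c("a", "b"),
                       extra         = 1:2)

# Apply the same template (df) to several data frames at once;
# lapply passes each list element as `x` and forwards template = df.
trimmed <- lapply(list(df_new = df_new, df_other = df_other),
                  keep_matching_columns, template = df)

names(trimmed$df_new)    # returns c("first_column", "second_column")
names(trimmed$df_other)  # returns "second_column"
```

The list returned by lapply() keeps the names you give it, so each trimmed frame can be pulled back out by name.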
