I have two dataframes that look like this:
> dput(df)
structure(list(first_column = c("value_1", "value_2"), second_column = c("value_1",
"value_2")), class = "data.frame", row.names = c(NA, -2L))
> dput(df_new)
structure(list(first_column = c("value_1", "value_2"), second_column = c("value_1",
"value_2"), third_column = c("value_1", "value_2")), class = "data.frame", row.names = c(NA,
-2L))
I would like df_new to have the same columns as df (so essentially, just deleting ‘third_column’).
But since I am working with multiple different dataframes, hard-coding the column position like this won’t work:
df_new <- df_new[,c(-3)]
Is it possible to match the column names from the two datasets without indexing column 3?
>Solution :
I think we just need to intersect the names from df with the names of df_new; using intersect means we won’t accidentally try to retrieve non-existing names.
df_new[, intersect(names(df), names(df_new)), drop=FALSE]
#   first_column second_column
# 1      value_1       value_1
# 2      value_2       value_2
The drop=FALSE is there because base R’s behavior, when the column selection reduces to a single column, is to return a vector instead of a data.frame. While not applicable with this sample data, if there were only one column name in common, it would not return a frame. We can fake it by adding [1] to the above, so compare:
df_new[, intersect(names(df), names(df_new))[1]]
# [1] "value_1" "value_2"
df_new[, intersect(names(df), names(df_new))[1], drop=FALSE]
#   first_column
# 1      value_1
# 2      value_2
This is just being defensive.
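Since the question mentions working with multiple dataframes, here is a minimal sketch of how the same intersect idea might be applied across a list of frames. The list `frames`, its contents, and the helper `keep_common` are hypothetical names made up for illustration; only `df` comes from the question.

```r
# Reference frame whose columns define what to keep (from the question).
df <- data.frame(first_column = c("value_1", "value_2"),
                 second_column = c("value_1", "value_2"))

# Hypothetical list of frames to align; names and contents are made up.
frames <- list(
  a = data.frame(first_column = c("value_1", "value_2"),
                 second_column = c("value_1", "value_2"),
                 third_column = c("value_1", "value_2")),
  b = data.frame(first_column = c("x", "y"),
                 other_column = c("p", "q"))
)

# Hypothetical helper: keep only the columns of `x` that also exist in `ref`.
# drop = FALSE keeps the result a data.frame even if one column remains.
keep_common <- function(x, ref) {
  x[, intersect(names(ref), names(x)), drop = FALSE]
}

# Apply the helper to every frame in the list.
aligned <- lapply(frames, keep_common, ref = df)
```

Here `aligned$a` keeps first_column and second_column, while `aligned$b` keeps only first_column and, thanks to drop=FALSE, is still a data.frame.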