I have two dataframes, and I am trying to drop the columns of one dataframe whose names are not found in the other dataframe.
For example:
DF1
| ahmed | emad | ali |
|---|---|---|
| ... | ... | ... |
| ... | ... | ... |
| ... | ... | ... |
DF2
| names | | |
|---|---|---|
| emad | ... | ... |
| ahmed | ... | ... |
| ibrahim | ... | ... |
| saad | ... | ... |
| hassan | ... | ... |
I am trying to drop the DF1 columns whose names aren't among the names listed in DF2.
My code so far:

```r
library(dplyr)

`%notin%` <- Negate(`%in%`)

for (i in seq_along(colnames(DF1))) {
  # drop column i if its name is not among DF2's row names
  if (colnames(DF1)[i] %notin% rownames(DF2)) {
    DF1 = select(DF1, -i)
  }
}
```
It gets the job done, but it raises this error: `Error: Can't subset columns that don't exist.`

And if I run the code again, it drops the "ahmed" and "emad" columns even though they exist in DF2!
Solution:
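The issue with your loop is that `seq_along(colnames(DF1))` is evaluated only once, when the loop starts, so `i` runs over the *original* column positions while you delete columns as you go. Every deletion shifts the remaining columns one place to the left, so the column that moves into position `i` is never checked. Worse, once `i` is larger than the number of columns left, `colnames(DF1)[i]` is `NA` (which counts as "not in" DF2) and `select(DF1, -i)` asks for a column that no longer exists: by the time you get to column 20, you have deleted several columns and there is no longer a column 20!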
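If you do want to keep a loop, one way to avoid the shifting positions is to loop over the column *names* instead of their indices. The sketch below assumes toy versions of DF1 and DF2 built from the question's names (the values are placeholders) and, like your code, assumes the names are DF2's row names.

```r
# Toy data modelled on the question; the values are placeholders
DF1 <- data.frame(ahmed = 1:3, emad = 4:6, ali = 7:9)
DF2 <- data.frame(x = 1:5,
                  row.names = c("emad", "ahmed", "ibrahim", "saad", "hassan"))

# colnames(DF1) is evaluated once, giving a fixed vector of names;
# removing a column never changes which name is checked next
for (nm in colnames(DF1)) {
  if (!(nm %in% rownames(DF2))) {
    DF1[[nm]] <- NULL   # drop the column by name, not by position
  }
}

print(colnames(DF1))
```

Looping over positions from last to first, `for (i in rev(seq_along(colnames(DF1))))`, would also work, because deleting column `i` then only shifts columns that have already been checked.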
But you don’t need a loop at all for this.
```r
# names that appear both as DF1 columns and as DF2 row names (in DF1's order)
cols_to_keep = intersect(colnames(DF1), rownames(DF2))
DF1 = DF1[cols_to_keep]
```
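Here is a self-contained version of the same idea, again with placeholder values. It also covers the case where the names are stored in an ordinary column called `names`, as the DF2 table suggests, rather than in the row names. With default row names, `rownames(DF2)` returns `"1"`, `"2"`, ..., so matching against it would keep no columns at all. Which case applies depends on how your DF2 is actually built, so treat both as assumptions.

```r
# Toy DF1 with placeholder values
DF1 <- data.frame(ahmed = 1:3, emad = 4:6, ali = 7:9)

# Case 1: the names are DF2's row names (what the answer assumes)
DF2_rn <- data.frame(x = 1:5,
                     row.names = c("emad", "ahmed", "ibrahim", "saad", "hassan"))
res_rn <- DF1[intersect(colnames(DF1), rownames(DF2_rn))]

# Case 2: the names are stored in an ordinary column called `names`
DF2_col <- data.frame(names = c("emad", "ahmed", "ibrahim", "saad", "hassan"),
                      x = 1:5, stringsAsFactors = FALSE)
res_col <- DF1[intersect(colnames(DF1), DF2_col$names)]

# With default row names, rownames(DF2_col) is "1", "2", ..., so nothing matches
no_match <- intersect(colnames(DF1), rownames(DF2_col))

print(colnames(res_rn))
print(colnames(res_col))
print(no_match)
```

If you prefer to stay within dplyr, `select(DF1, all_of(cols_to_keep))` selects the same columns as `DF1[cols_to_keep]`.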