How to remove duplicate column names in R?

April 30, 2022

I have a large data frame and would like to remove the columns with duplicate names.

For simplicity, let’s pretend this is my data:

df <- data.frame(id1 = c("Aa","Aa","Ba","Ca","Da"), id2 = c(2,1,4,5,10), location=c(351,261,101,91,51), comment=c(35,26,10,9,5), comment=c(5,16,25,14,11), hight=c(15,21,5,19,18), check.names = FALSE)

I can remove the duplicate column name "comment" using:

df <- df[!duplicated(colnames(df))]
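To spell out what that line does on the sample data: `duplicated()` flags the second and later occurrences of a name, and the negated logical vector selects columns when it is passed as the only index to a plain data.frame. A minimal sketch, where `keep` and `df_unique` are names introduced here only for illustration:

```r
# Sample data from the question; check.names = FALSE keeps the duplicate name
df <- data.frame(id1 = c("Aa","Aa","Ba","Ca","Da"), id2 = c(2,1,4,5,10),
                 location = c(351,261,101,91,51), comment = c(35,26,10,9,5),
                 comment = c(5,16,25,14,11), hight = c(15,21,5,19,18),
                 check.names = FALSE)

# TRUE for the first occurrence of each name, FALSE for later repeats
keep <- !duplicated(colnames(df))
keep
# [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

# On a plain data.frame, a single index inside [ ] selects columns
df_unique <- df[keep]
colnames(df_unique)
# [1] "id1"      "id2"      "location" "comment"  "hight"
```

Note that this keeps the first "comment" column and drops the second, regardless of what the values are.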

However, when I apply the same code to my real data frame, it returns an error:

Error in `[.data.table`(SNV_wild, !duplicated(colnames(SNV_wild))) : 
  i evaluates to a logical vector length 1883 but there are 60483 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.

Sorry, I can’t post the real data since it is quite large, as you can see from the error.

How can I troubleshoot this? I have gone through all the column names, and there are duplicate column names.

Thank you in advance
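
One thing the error message suggests: the call is dispatched to `[.data.table`, so `SNV_wild` appears to be a data.table rather than a plain data.frame. In a data.table, a single argument inside `[ ]` is treated as a row index, which would explain why a logical vector of length 1883 (the number of columns) is compared against 60483 rows. The following is a sketch under that assumption, using the sample data in place of the real object; `dt`, `keep`, and `dt_unique` are illustrative names, and the data.table package is assumed to be installed:

```r
library(data.table)

# Sample data from the question, converted to a data.table to mimic the real object
df <- data.frame(id1 = c("Aa","Aa","Ba","Ca","Da"), id2 = c(2,1,4,5,10),
                 location = c(351,261,101,91,51), comment = c(35,26,10,9,5),
                 comment = c(5,16,25,14,11), hight = c(15,21,5,19,18),
                 check.names = FALSE)
dt <- as.data.table(df)

# First occurrence of each column name is kept, later repeats are dropped
keep <- !duplicated(names(dt))

# Select columns by position in j; with = FALSE tells data.table
# to treat j as column positions rather than an expression to evaluate
dt_unique <- dt[, which(keep), with = FALSE]
names(dt_unique)
```

If the rest of the workflow does not rely on data.table features, another option is to convert first with `as.data.frame(dt)` and then use the original `df[!duplicated(colnames(df))]` line unchanged.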