I want to remove duplicate dyads regardless of the order of their members, choosing which copy to keep based on one variable. Specifically, I want to keep the row in which the unit on the left (dyad.1) has the lower longitude.
Here is an example of the data I have:
data <- data.frame(dyad.1 = c("A", "B", "A", "C", "B", "C"), dyad.2 = c("B", "A", "C", "A", "C", "B"), long_dyad.1 = c(1.3, 2, 1.3, 0.3, 2, 1.3), long_dyad.2 = c(2, 1.3, 0.3, 1.3, 1.3, 2))
And here is what I would like to get:
expected <- data.frame(dyad.1 = c("A", "C", "C"), dyad.2 = c("B", "A", "B"), long_dyad.1 = c(1.3, 0.3, 1.3), long_dyad.2 = c(2, 1.3, 2))
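One approach that works is:
library(dplyr)
# Sort so that, within each pair, the row with the lowest long_dyad.1 comes first
data <- data %>% arrange(long_dyad.1)
# Build an order-independent key per row ("A_B" for both A-B and B-A) and keep the first occurrence
udata <- data[!duplicated(apply(data[, 1:2], 1, function(row) paste(sort(row), collapse = "_"))), ]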
However, I wonder whether there is a more efficient way to do this.
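The `apply()` call above builds the order-independent key one row at a time. A minimal base-R sketch of a vectorized variant (not benchmarked here) builds the same key with `pmin()`/`pmax()`, which compare the two ID columns element-wise, and otherwise keeps the same logic: sort by `long_dyad.1`, then keep the first row of each pair. It assumes the ID columns are character vectors (hence `stringsAsFactors = FALSE`), since `pmin()` does not accept unordered factors; the names `sorted` and `key` and the `"_"` separator are illustrative choices.

```r
data <- data.frame(dyad.1 = c("A", "B", "A", "C", "B", "C"),
                   dyad.2 = c("B", "A", "C", "A", "C", "B"),
                   long_dyad.1 = c(1.3, 2, 1.3, 0.3, 2, 1.3),
                   long_dyad.2 = c(2, 1.3, 0.3, 1.3, 1.3, 2),
                   stringsAsFactors = FALSE)

# Base-R equivalent of arrange(long_dyad.1); order() keeps ties in their original order
sorted <- data[order(data$long_dyad.1), ]

# Order-independent key: pmin()/pmax() compare the IDs element-wise,
# so both "A"/"B" and "B"/"A" give "A_B"
key <- paste(pmin(sorted$dyad.1, sorted$dyad.2),
             pmax(sorted$dyad.1, sorted$dyad.2), sep = "_")

# Keep the first (lowest long_dyad.1) row of each pair
udata <- sorted[!duplicated(key), ]
udata
```

This returns the same rows as the `apply()` version, in the order produced by the sort.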
Solution:
data |> dplyr::filter(long_dyad.1 < long_dyad.2)
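This works because, in the example, every pair appears twice (once in each order), so keeping only the rows in which the first unit has the lower longitude leaves exactly one row per pair. It assumes that each pair does appear in both orders and that the two longitudes in a row are never equal.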
Or with base R:
data[data$long_dyad.1 < data$long_dyad.2,]
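The filter above relies on every pair appearing in both orders: a pair that appears only once, with the lower longitude on the right, would be dropped entirely. A base-R sketch that avoids this assumption first swaps the two units (and their longitudes) in every row where the right-hand one has the lower longitude, then removes exact duplicates with `unique()`. The names `swap` and `out` are illustrative; rows with equal longitudes are left unswapped, so a tied pair written in both orders would still appear twice.

```r
data <- data.frame(dyad.1 = c("A", "B", "A", "C", "B", "C"),
                   dyad.2 = c("B", "A", "C", "A", "C", "B"),
                   long_dyad.1 = c(1.3, 2, 1.3, 0.3, 2, 1.3),
                   long_dyad.2 = c(2, 1.3, 0.3, 1.3, 1.3, 2),
                   stringsAsFactors = FALSE)

# TRUE where the right-hand unit has the lower longitude
swap <- data$long_dyad.1 > data$long_dyad.2

out <- data
# Swap IDs and longitudes in those rows so the lower longitude is always on the left
out[swap, c("dyad.1", "dyad.2")] <- data[swap, c("dyad.2", "dyad.1")]
out[swap, c("long_dyad.1", "long_dyad.2")] <- data[swap, c("long_dyad.2", "long_dyad.1")]

# Every pair is now written the same way, whatever order it arrived in; drop repeats
out <- unique(out)
out
```

On the example data this gives the same three rows as the expected output, in the same order.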