
Drop duplicates and keep first in R data.table

I'm not familiar with R, so apologies if this question has already been answered somewhere I couldn't find.

Suppose I have network data of IP pairs of this type:

library(data.table)
toy_data = data.table(from=c("A","B","A","C","D","C"), to=c("B","A","C","B","A","A"))


from to
A B
B A
A C
C B
D A
C A

I cannot load the whole network into igraph, so I am trying to compute statistics in chunks. Since the network is undirected, I would like to drop every row whose from-to pair is the reverse of an earlier row (here row 2, the reverse of row 1, and row 6, the reverse of row 3), keeping the first occurrence.

I originally thought that something like this would work:
unique(toy_data[,.(c(from,to)|c(to,from))])
but unfortunately it does not: | is a logical OR, and R does not define it for character vectors.

I then thought of using two auxiliary columns:

toy_data[,orig:=paste(from,to,sep="")]
toy_data[,reverse:=paste(to,from,sep="")]
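One possible way to finish the auxiliary-column idea (a sketch, not the method used in the answer) is to look up each row's reverse key among the orig keys with match() and drop the row when that reverse already appeared in an earlier row, or when the same directed pair appeared earlier. The names first_rev, is_dup and result are illustrative. The sketch uses a space separator on purpose: with sep = "", two different IP pairs can collapse to the same key.

```r
library(data.table)

toy_data <- data.table(from = c("A", "B", "A", "C", "D", "C"),
                       to   = c("B", "A", "C", "B", "A", "A"))

# Auxiliary keys; a space separator avoids collisions such as
# paste0("1.1.1.1", "11.1.1.1") == paste0("1.1.1.11", "1.1.1.1")
toy_data[, orig    := paste(from, to, sep = " ")]
toy_data[, reverse := paste(to, from, sep = " ")]

# Index of the first row whose orig equals this row's reverse (NA if none)
first_rev <- match(toy_data$reverse, toy_data$orig)

# A row is a duplicate if the same directed pair appeared earlier,
# or if its reversed pair appeared in an earlier row
is_dup <- duplicated(toy_data$orig) |
  (!is.na(first_rev) & first_rev < seq_len(nrow(toy_data)))

result <- toy_data[!is_dup, .(from, to)]
print(result)
```

On the toy data this drops rows 2 and 6, the same rows the answer removes, but it needs two extra columns and a lookup, which is why the pmin/pmax key is the simpler route.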

Then I would work with something like:
unique(toy_data[,.(?)])

But I suspect this is much easier than what I am doing.

Solution:

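Instead of creating temporary columns, paste the row-wise minimum (pmin) together with the row-wise maximum (pmax), so that a pair and its reverse (for example A B and B A) produce the same key. Then flag repeated keys with duplicated and negate the result with ! to keep only the first occurrence of each pair.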

toy_data[!duplicated(paste(pmin(from, to), pmax(from, to)))]

Output:

    from     to
   <char> <char>
1:      A      B
2:      A      C
3:      C      B
4:      D      A
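Because the question mentions processing the network in chunks, note that applying the one-liner to each chunk separately only removes reversed pairs within that chunk. A minimal sketch, assuming the chunks are supplied as a list of data.tables with from and to columns, is to carry the already-kept keys from one chunk to the next. dedup_chunks and seen are hypothetical names, not part of data.table.

```r
library(data.table)

toy_data <- data.table(from = c("A", "B", "A", "C", "D", "C"),
                       to   = c("B", "A", "C", "B", "A", "A"))

# Hypothetical helper: deduplicate undirected edges across a list of chunks,
# keeping the first occurrence of each unordered pair over all chunks
dedup_chunks <- function(chunks) {
  seen <- character(0)                    # keys of edges kept so far
  out  <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    ch  <- chunks[[i]]
    # Same order-independent key as the answer: "min max"
    key <- paste(pmin(ch$from, ch$to), pmax(ch$from, ch$to))
    # Keep a row only if its key is new within this chunk and across chunks
    keep <- !duplicated(key) & !(key %in% seen)
    seen <- c(seen, key[keep])
    out[[i]] <- ch[keep]
  }
  rbindlist(out)
}

# Split the toy data into two chunks: rows 1-3 and rows 4-6
chunks <- list(toy_data[1:3], toy_data[4:6])
result <- dedup_chunks(chunks)
print(result)
```

Here row 6 (C A) is dropped even though its reverse (A C) sits in the first chunk. Growing seen with c() copies the vector on every chunk, so for very large networks a different key store may be preferable. Within a single table, the string key can also be avoided by letting duplicated() compare the two normalised columns directly: toy_data[!duplicated(data.table(pmin(from, to), pmax(from, to)))].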