I am not very familiar with R, so apologies if this has been asked before; I could not find an existing answer.
Suppose I have a network of IPs with data of this form:
library(data.table)
toy_data = data.table(from=c("A","B","A","C","D","C"), to=c("B","A","C","B","A","A"))
| from | to |
|---|---|
| A | B |
| B | A |
| A | C |
| C | B |
| D | A |
| C | A |
I cannot load the whole network into igraph, so I am trying to compute statistics on chunks. Since the network is undirected, I would like to drop every row that repeats an earlier edge with from and to swapped (here row 2, which reverses row 1, and row 6, which reverses row 3).
I originally thought that something like this would work:
unique(toy_data[,.(c(from,to)|c(to,from))])
but unfortunately it does not.
I then thought of using two auxiliary columns:
toy_data[,orig:=paste(from,to,sep="")]
toy_data[,reverse:=paste(to,from,sep="")]
then work with something like:
unique(toy_data[,.(?)])
but my guess is that there is a much easier way than what I am doing.
>Solution :
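Instead of creating temporary columns, paste the row-wise minimum (pmin) together with the row-wise maximum (pmax), so that an edge and its reverse produce the same key. Then flag the repeated keys with duplicated and negate (!) the result to keep only the first occurrence of each edge.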
toy_data[!duplicated(paste(pmin(from, to), pmax(from, to)))]
-output
     from     to
   <char> <char>
1:      A      B
2:      A      C
3:      C      B
4:      D      A
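If the node labels could ever contain spaces, a pasted key might in principle collide (for example "A B" with "C" versus "A" with "B C"). A minimal sketch of a variant that avoids this, assuming the same toy_data as in the question: keep the two endpoints in separate columns and let duplicated compare them row by row. The names edge_key, lo, hi and deduped are only illustrative.

```r
library(data.table)

toy_data <- data.table(from = c("A", "B", "A", "C", "D", "C"),
                       to   = c("B", "A", "C", "B", "A", "A"))

# Direction-independent key: smaller endpoint in `lo`, larger in `hi`.
# (`edge_key`, `lo` and `hi` are illustrative names, not from the question.)
edge_key <- toy_data[, .(lo = pmin(from, to), hi = pmax(from, to))]

# duplicated() on a data.table compares whole rows, so no pasting is needed.
deduped <- toy_data[!duplicated(edge_key)]
print(deduped)
```

On the toy data this keeps the same four rows as the one-liner above (A-B, A-C, C-B, D-A), and the original from/to orientation of the first occurrence is preserved.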