Create ID variable per chain of values

I have a dataset that looks like this:

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))

I would like to add an ID column that lets me trace groups of names, i.e. who references whom. With the example data, the groups would be:

  Name1 Name2 GroupID
      A     B       1
      B     C       1
      D     E       2
      E     G       2
      H     I       3

Please note that my original data is not ordered as this example is. Thanks in advance for any help!

Solution:

You can use the igraph package to build an undirected graph from your data set and find its connected components (clusters). Each component is one chain of names:

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))


library(igraph)
graph <- graph_from_data_frame(data, directed = FALSE)
clusters <- components(graph)

# Look up each row's component by vertex name.
# as.character() guards against factor columns (the data.frame default
# before R 4.0), which would index by integer code instead of by name.
data$GroupId <- clusters$membership[as.character(data$Name1)]

That gives:

> data
  Name1 Name2 GroupId
1     A     B       1
2     B     C       1
3     D     E       2
4     E     G       2
5     H     I       3
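
Since the question notes that the real data is not ordered, here is a minimal sketch of the same approach on a shuffled copy of the example. The `shuffled` data frame and its row order are made up for illustration. The ID numbers depend on the order in which igraph encounters the names, so they may differ from the table above; what matters is that linked rows share an ID.

```r
library(igraph)

# Hypothetical shuffled data: the same pairs as above, in a different row order
shuffled <- data.frame(Name1 = c("H", "E", "A", "D", "B"),
                       Name2 = c("I", "G", "B", "E", "C"),
                       stringsAsFactors = FALSE)

g <- graph_from_data_frame(shuffled, directed = FALSE)
membership <- components(g)$membership

# Index the named membership vector by name; unname() drops the
# vertex names so the new column is a plain numeric vector
shuffled$GroupId <- unname(membership[as.character(shuffled$Name1)])

# Sort by group to see each chain together
shuffled[order(shuffled$GroupId), ]
```

Row order does not affect which rows are grouped together, because the grouping comes from the graph's connectivity rather than from adjacency in the data frame.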