Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

concatenate unique strings of variable in data.table by repeated values of another variable in data.table

I have a data.table that has one column, we’ll call it country with repeated values, and another column (survey) that has unique strings for each observation. I want to create a new variable that has the pasted value of all the survey strings for an individual country. I’ve done it with an index and a for loop but I’m curious if there’s a quicker way to do this with some data.table syntax. MWE:

library(data.table)

dt <- data.table(country = c("Belgium","Belgium","Bolivia","Brazil","Brazil","Brazil"),
                      survey = c("BE01, BE03, BE04","BE05, BE07, BE11",
                            "BO11, BO13", "BR01, BR02, BR03", "BR05","BR12, BR13"))
dt[, index := seq(1, .N), by = "country"]
for(cntry in unique(dt$country)){
  tmp_ind <- max(dt[country == eval(cntry)]$index)
  dt[country == eval(cntry) & index ==1, all_surveys := survey]
  if(tmp_ind > 1) {
    for(i in 2:tmp_ind){
      dt[country == eval(cntry) & index ==i,
         all_surveys := paste0(dt[country == eval(cntry) & index == i-1, all_surveys], survey)]
    }
  }
}

Which gives the desired

> dt
   country           survey index                      all_surveys
1: Belgium BE01, BE03, BE04     1                 BE01, BE03, BE04
2: Belgium BE05, BE07, BE11     2 BE01, BE03, BE04BE05, BE07, BE11
3: Bolivia       BO11, BO13     1                       BO11, BO13
4:  Brazil BR01, BR02, BR03     1                 BR01, BR02, BR03
5:  Brazil             BR05     2             BR01, BR02, BR03BR05
6:  Brazil       BR12, BR13     3   BR01, BR02, BR03BR05BR12, BR13

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Solved without looping

library(data.table)

# Create the data.table
dt <- data.table(country = c("Belgium", "Belgium", "Bolivia", "Brazil", "Brazil", "Brazil"),
                 survey = c("BE01, BE03, BE04", "BE05, BE07, BE11", "BO11, BO13", 
                            "BR01, BR02, BR03", "BR05", "BR12, BR13"))

# Create index
dt[, index := seq_len(.N), by = "country"]

# Create all_surveys
dt[, all_surveys := Reduce(function(prev, curr) paste0(prev, curr), 
                           x = survey, 
                           accumulate = TRUE), 
   by = "country"]

print(dt)

> print(dt)
   country           survey index                      all_surveys
1: Belgium BE01, BE03, BE04     1                 BE01, BE03, BE04
2: Belgium BE05, BE07, BE11     2 BE01, BE03, BE04BE05, BE07, BE11
3: Bolivia       BO11, BO13     1                       BO11, BO13
4:  Brazil BR01, BR02, BR03     1                 BR01, BR02, BR03
5:  Brazil             BR05     2             BR01, BR02, BR03BR05
6:  Brazil       BR12, BR13     3   BR01, BR02, BR03BR05BR12, BR13
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading