Suppose I have a data table:
library(data.table)
dt <- data.table(x = c("a,b,b,c,NA", "b,b,NA,c", "a,b,c,c,c,d"))
dt
x
1: a,b,b,c,NA
2: b,b,NA,c
3: a,b,c,c,c,d
Now, for each row, I would like to split the x column on commas, keep only the unique values, and paste them back together, so that I get:
x
1: a,b,c,NA
2: b,NA,c
3: a,b,c,d
I have tried the following, but I am stuck after this step. Performance also matters, because my datasets have over two million observations.
dt[, tstrsplit(x, ",")]
V1 V2 V3 V4 V5 V6
1: a b b c NA <NA>
2: b b NA c <NA> <NA>
3: a b c c c d
Solution:
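# Split each string on commas, drop duplicate values, and re-join them.
# vapply() keeps x2 a plain character column; lapply() would create a list column.
dt[, x2 := vapply(strsplit(x, ",", fixed = TRUE), function(y) paste0(unique(y), collapse = ","), character(1))]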
# x x2
# 1: a,b,b,c,NA a,b,c,NA
# 2: b,b,NA,c b,NA,c
# 3: a,b,c,c,c,d a,b,c,d
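For the two-million-row case mentioned in the question, one possible variation is to do the split-and-rejoin only once per distinct string and then map the results back to the rows. This is a sketch, not a benchmarked recommendation: it assumes that many rows repeat the same string, and the helper name dedupe_tokens is made up for illustration.

```r
library(data.table)

dt <- data.table(x = c("a,b,b,c,NA", "b,b,NA,c", "a,b,c,c,c,d"))

# Hypothetical helper: collapse each comma-separated string to its unique
# values, keeping the order in which they first appear.
dedupe_tokens <- function(s) {
  vapply(
    strsplit(s, ",", fixed = TRUE),           # fixed = TRUE: literal comma, no regex
    function(y) paste(unique(y), collapse = ","),
    character(1)
  )
}

# Process each distinct string once, then look up the result for every row.
ux <- unique(dt$x)
dt[, x2 := dedupe_tokens(ux)[match(x, ux)]]
dt
```

Note that the "NA" values in this example are ordinary strings, not missing values, so they are deduplicated like any other value. If every row is distinct, the lookup step adds no benefit over calling the helper on x directly.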