I have a dataframe that contains a column of strings:
df1$V1 = c("5325_214424", "63325_685_2436", "573_636", "5754_23523_214235")
I want to run a command that checks the column for values that contain two delimiters and removes the second one, while leaving the other values unchanged. Ideally the result will look like this:
df1$V1 = c("5325_214424", "63325_6852436", "573_636", "5754_23523214235")
I have tried using strsplit, but it returns single characters:
df$V1 = strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', df$V1), '')
>Solution :
sub is a good start:
vec <- c("5325_214424", "63325_685_2436", "573_636", "5754_23523_214235")
sub("(_[^_]*)_", "\\1", vec)
# [1] "5325_214424" "63325_6852436" "573_636" "5754_23523214235"
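As a minimal sketch of how this could be applied to the column from the question, the same call can be assigned back to df1$V1. The data frame is rebuilt here only so the example runs on its own; the comments describe how the pattern is read.

```r
# Rebuild the data frame from the question (df1 and V1 are the asker's names).
df1 <- data.frame(
  V1 = c("5325_214424", "63325_685_2436", "573_636", "5754_23523_214235"),
  stringsAsFactors = FALSE
)

# (_[^_]*) captures the first underscore plus the run of non-underscore
# characters that follows it. The trailing _ is the second underscore: it is
# part of the match but outside the capture group, so replacing the match
# with \\1 drops it. Values with only one underscore do not match and are
# returned unchanged.
df1$V1 <- sub("(_[^_]*)_", "\\1", df1$V1)

df1$V1
# values: "5325_214424" "63325_6852436" "573_636" "5754_23523214235"
```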
This can be done with strsplit, though it's a little more complicated:
sapply(strsplit(vec, "_"), function(z) paste(z[1], paste(z[-1], collapse = ""), sep = "_"))
# [1] "5325_214424" "63325_6852436" "573_636" "5754_23523214235"
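The two approaches agree on the sample data but not on every input. The sketch below uses hypothetical values that are not in the question (one with no delimiter, one with three) to show where they differ, which may matter if the real column is less regular than the sample.

```r
# Hypothetical inputs, not from the question: no delimiter, and three delimiters.
edge <- c("12345", "1_2_3_4")

# sub() replaces only the first match, so only the second underscore is
# removed and a value without underscores is left alone.
by_sub <- sub("(_[^_]*)_", "\\1", edge)
by_sub
# values: "12345" "1_23_4"

# The strsplit() version keeps the first underscore and drops all later ones.
# It also appends an underscore when the input had none, because z[-1] is
# empty and paste() still inserts the separator.
by_split <- sapply(
  strsplit(edge, "_"),
  function(z) paste(z[1], paste(z[-1], collapse = ""), sep = "_")
)
by_split
# values: "12345_" "1_234"
```

If every value is guaranteed to have one or two underscores, as in the question, either version gives the same result.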