How to remove second underscore from string in R dataframe

January 24, 2022

I have a dataframe that contains a column of strings:

df1$V1 = c("5325_214424", "63325_685_2436", "573_636", "5754_23523_214235")

I want to run a command that checks the column for values that contain two delimiters and removes the second one, while leaving the other values unchanged. Ideally the result will look like this:

df1$V1 = c("5325_214424", "63325_6852436", "573_636", "5754_23523214235")

I have tried using strsplit, but it returns single characters:

df1$V1 = strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', df1$V1), '')

Solution:

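sub is a good start, since it replaces only the first match. Capture the first underscore together with the characters that follow it, then drop the underscore that comes next: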

vec <- c("5325_214424", "63325_685_2436", "573_636", "5754_23523_214235")
sub("(_[^_]*)_", "\\1", vec)
# [1] "5325_214424"      "63325_6852436"    "573_636"          "5754_23523214235"
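
To write the result back to the column from the question, the same call can be assigned to df1$V1. A minimal sketch, assuming df1 is an ordinary data frame with a character column V1 (the data.frame() construction here is illustrative and not part of the question):

```r
# Illustrative data frame; only the V1 values come from the question.
df1 <- data.frame(
  V1 = c("5325_214424", "63325_685_2436", "573_636", "5754_23523_214235"),
  stringsAsFactors = FALSE
)

# sub() is vectorised, so it can be applied to the whole column at once.
# Values with a single underscore do not match the pattern and are left as-is.
df1$V1 <- sub("(_[^_]*)_", "\\1", df1$V1)
df1$V1
# [1] "5325_214424"      "63325_6852436"    "573_636"          "5754_23523214235"
```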

This can also be done with strsplit, though it’s a little more complicated and removes every underscore after the first rather than only the second:

sapply(strsplit(vec, "_"), function(z) paste(z[1], paste(z[-1], collapse = ""), sep = "_"))
# [1] "5325_214424"      "63325_6852436"    "573_636"          "5754_23523214235"
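
Both versions give the same result on the sample data. A small sketch with made-up inputs (the vector extra is hypothetical, not from the question) shows how they behave on a string with three underscores and on one with none:

```r
# Hypothetical inputs that fall outside the question's sample data.
extra <- c("1_2_3_4", "abc")

# sub() drops only the second underscore and leaves non-matching strings alone.
res_sub <- sub("(_[^_]*)_", "\\1", extra)
res_sub
# [1] "1_23_4" "abc"

# The strsplit() version keeps the first underscore and pastes the rest together.
# With no underscore in the input, z[-1] is empty and a trailing "_" is appended.
res_split <- sapply(
  strsplit(extra, "_"),
  function(z) paste(z[1], paste(z[-1], collapse = ""), sep = "_")
)
res_split
# [1] "1_234" "abc_"
```

If the column may hold values like these, it is worth checking which of the two behaviours is wanted before picking an approach.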