Hi, I have a data frame like this, with two columns (A and B):
A B
x_1234 rs4566
x_1567 rs3566
z_1444 rs78654
r_1234 rs34567
In column A, I would like to move the letter in front of the number so that it comes after the number, keeping the underscore between them.
Expected output:
A B
1234_x rs4566
1567_x rs3566
1444_z rs78654
1234_r rs34567
I tried something like this, but it doesn't work:
DF$A <- gsub(".*_", "_*.", DF$A)
Solution:
We need to capture the two parts as groups and then swap them. The first capture group (.*) captures the characters before the _, and the second capture group (\\d+) captures one or more digits. In the replacement, the backreferences are switched: \\2 followed by \\1, separated by a _.
DF$A <- sub("(.*)_(\\d+)", "\\2_\\1", DF$A)
Output:
> DF
A B
1 1234_x rs4566
2 1567_x rs3566
3 1444_z rs78654
4 1234_r rs34567
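The pattern above is not anchored, so it also rewrites values where the digits are followed by other text. A possible stricter variant (a sketch, assuming every valid value has the exact shape "<prefix>_<digits>") anchors the pattern with ^ and $ and restricts the first group to non-underscore characters. Values that do not fit that shape are returned unchanged by sub(). The extra test values below are hypothetical and only illustrate the difference.

```r
# Hypothetical inputs; "x_1234abc" does not have the "<prefix>_<digits>" shape.
x <- c("x_1234", "z_1444", "ab_99", "x_1234abc")

# Unanchored pattern from the answer: matches "x_1234" inside "x_1234abc"
# and rewrites only that part, leaving "abc" attached at the end.
loose <- sub("(.*)_(\\d+)", "\\2_\\1", x)

# Anchored variant: ^ and $ force the whole string to match, and [^_]+
# stops the first group from swallowing extra underscores.
strict <- sub("^([^_]+)_(\\d+)$", "\\2_\\1", x)

print(loose)   # "x_1234abc" becomes "1234_xabc"
print(strict)  # "x_1234abc" is left as-is
```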
The OP's code matches any characters (.*) followed by the _ and replaces that whole match with the literal text _*. (in the replacement string, * and . have no special meaning). Instead, the replacement should be built from the capture group backreferences.
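To make the difference concrete, here is a small comparison on two of the example values: the OP's call puts the literal characters "_*." in place of the prefix, while the capture-group version swaps the two parts.

```r
a <- c("x_1234", "z_1444")

# OP's attempt: ".*_" matches "x_" and the whole match is replaced by the
# literal string "_*.", so the letter is simply lost.
broken <- gsub(".*_", "_*.", a)

# Capture groups plus backreferences: \\2 (digits), then "_", then \\1 (prefix).
fixed <- sub("(.*)_(\\d+)", "\\2_\\1", a)

print(broken)  # "_*.1234" "_*.1444"
print(fixed)   # "1234_x"  "1444_z"
```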
Data:
DF <- structure(list(A = c("x_1234", "x_1567", "z_1444", "r_1234"),
B = c("rs4566", "rs3566", "rs78654", "rs34567")),
class = "data.frame", row.names = c(NA,
-4L))
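If you prefer to avoid regex backreferences, a different technique (a sketch, assuming each value in A contains exactly one underscore) is to split on the underscore with strsplit() and paste the two pieces back together in reverse order.

```r
DF <- data.frame(A = c("x_1234", "x_1567", "z_1444", "r_1234"),
                 B = c("rs4566", "rs3566", "rs78654", "rs34567"),
                 stringsAsFactors = FALSE)  # strsplit() needs character, not factor

# Split "x_1234" into c("x", "1234"); fixed = TRUE treats "_" literally.
parts <- strsplit(DF$A, "_", fixed = TRUE)

# Reassemble each pair as "<digits>_<prefix>"; assumes exactly two pieces.
DF$A <- vapply(parts, function(p) paste(p[2], p[1], sep = "_"), character(1))

print(DF)
```

With more than one underscore in a value this approach silently drops the extra pieces, so the regex solution is the safer choice for messier data.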