
Change underscore behind word within column in R

Hi, I have a data frame like this, with two columns (A and B):

 A       B
x_1234 rs4566
x_1567 rs3566
z_1444 rs78654
r_1234 rs34567

I would like to move the letter in front of each number in column A so that it comes after the number, still separated by an underscore.

Expected output:

 A       B
1234_x rs4566
1567_x rs3566
1444_z rs78654
1234_r rs34567

I tried something like this, but it doesn't work:

DF$A <- gsub(".*_", "_*.", DF$A)

Solution:

We can capture the two parts as groups and swap them: (.*) captures the characters before the _, and (\\d+) captures the one or more digits after it. The replacement then uses the backreferences in reverse order: \\2 followed by \\1, separated by a _.

DF$A <- sub("(.*)_(\\d+)", "\\2_\\1", DF$A)

Output:

> DF
       A       B
1 1234_x  rs4566
2 1567_x  rs3566
3 1444_z rs78654
4 1234_r rs34567
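
As a hedged variation on the sub() call above, the pattern can be anchored so that only values made up entirely of a prefix, an underscore, and digits are rewritten. This is a minimal sketch: the ^ and $ anchors, the [^_]+ prefix class, and the extra value "no-match" are assumptions added for illustration and are not part of the original answer.

```r
# Sample values from the question, plus one hypothetical value
# ("no-match") that should be left unchanged
A <- c("x_1234", "x_1567", "z_1444", "r_1234", "no-match")

# ^ and $ anchor the match to the whole string;
# ([^_]+) captures the prefix, (\\d+) captures the digits
swapped <- sub("^([^_]+)_(\\d+)$", "\\2_\\1", A)
print(swapped)
```

Values that do not fit the pattern are returned as they are, because sub() only changes strings where the regular expression matches.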

The OP's code matches any characters (.*) followed by the _ and replaces the whole match with the literal string _*., because * and . have no special meaning in a replacement. Instead, the replacement should be built from the capture-group backreferences.
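
To make the difference concrete, here is a small sketch that runs the original attempt next to the backreference version on the question's values. The variable names attempt and fixed are hypothetical and used only for this illustration.

```r
# Column A values from the question
A <- c("x_1234", "x_1567", "z_1444", "r_1234")

# Original attempt: the matched prefix and underscore are replaced
# by the literal characters "_*."
attempt <- gsub(".*_", "_*.", A)
print(attempt)
# [1] "_*.1234" "_*.1567" "_*.1444" "_*.1234"

# Backreference version: the two captured groups are swapped
fixed <- sub("(.*)_(\\d+)", "\\2_\\1", A)
print(fixed)
# [1] "1234_x" "1567_x" "1444_z" "1234_r"
```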

Data:

DF <- structure(list(A = c("x_1234", "x_1567", "z_1444", "r_1234"),
    B = c("rs4566", "rs3566", "rs78654", "rs34567")),
    class = "data.frame", row.names = c(NA, -4L))