Replace DNA nucleotide at given position in DNA sequence using for loop

In an R data frame, I am trying to substitute the nucleotide from the mutation column into the sequence in the WT.seq column, at the position given in the position column.

This is my data frame:

    transcript  position    ref mutation    type    WT.seq
1   trx1    5   A   G   substitution    ATAAAA
2   trx2    3   C   A   substitution    CCCCCC
3   trx3    7   T   C   substitution    AAAAAATGG

Expected output:

    transcript  position    ref mutation    type    WT.seq
1   trx1    5   A   G   substitution    ATAAGA
2   trx2    3   C   A   substitution    CCACCC
3   trx3    7   T   C   substitution    AAAAAACGG

Explanation

For example, the WT.seq column contains DNA sequences, and the first row holds ATAAAA. I need to place the nucleotide from the mutation column (G in the first row) at the 5th position of ATAAAA, which gives ATAAGA. The position number comes from the position column of the same row. I need to do this for every row, and my data frame contains thousands of rows.

I have done this for the first row using the following code:

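# Read the tab-separated input file
DNA_seq <- read.table("sequences.txt", sep = "\t", header = TRUE)

df <- as.data.frame(DNA_seq)

# Row 1: overwrite the character of WT.seq (column 6) at position (column 2)
# with the nucleotide from mutation (column 4)
substring(df[1, 6], first = df[1, 2]) <- df[1, 4]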

I want to run a for loop over the remaining rows so that every sequence in the WT.seq column gets its mutation nucleotide at the position given in the position column.

>Solution :

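You could split each sequence into single characters with strsplit, use Map to replace the character at position with mutation, and paste the result back together. The native pipe |> used here requires R 4.1 or later.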

transform(dat, WT.mut=Map(replace, strsplit(WT.seq, ''), position, mutation) |>
  sapply(paste, collapse=''))
#   transcript position ref mutation         type    WT.seq    WT.mut
# 1       trx1        5   A        G substitution    ATAAAA    ATAAGA
# 2       trx2        3   C        A substitution    CCCCCC    CCACCC
# 3       trx3        7   T        C substitution AAAAAATGG AAAAAACGG

I used an extra column to demonstrate; replace WT.mut= with WT.seq= to overwrite the original column.
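
Since the question asks for a for loop, here is a minimal sketch of that approach next to a loop-free alternative. It is not part of the original answer. It rebuilds the example data inline instead of reading sequences.txt, and the names looped, vectorized and ref_ok are made up for illustration. It assumes every mutation is a single-nucleotide substitution and that WT.seq and mutation are character columns, not factors.

```r
# Example data copied from the question. stringsAsFactors = FALSE keeps
# the text columns as character vectors on R versions before 4.0.
df <- data.frame(
  transcript = c("trx1", "trx2", "trx3"),
  position   = c(5L, 3L, 7L),
  ref        = c("A", "C", "T"),
  mutation   = c("G", "A", "C"),
  type       = "substitution",
  WT.seq     = c("ATAAAA", "CCCCCC", "AAAAAATGG"),
  stringsAsFactors = FALSE
)

# Option 1: explicit for loop, one row at a time.
looped <- df$WT.seq
for (i in seq_len(nrow(df))) {
  substr(looped[i], df$position[i], df$position[i]) <- df$mutation[i]
}

# Option 2: `substr<-` is vectorized over the strings, the start/stop
# positions and the replacement values, so no loop is needed.
vectorized <- df$WT.seq
substr(vectorized, df$position, df$position) <- df$mutation

# Optional sanity check: does the wild-type base at each position
# match the ref column?
ref_ok <- substr(df$WT.seq, df$position, df$position) == df$ref
```

Assigning either result back with df$WT.seq <- vectorized overwrites the original column. With thousands of rows the vectorized form will usually be the faster of the two, and rows where ref_ok is FALSE are worth inspecting before trusting the mutated sequence.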
