I am trying to parse the following strings. I want to copy the accession number at the beginning of each string and insert it after the "];" and before the following "dxPhospho". The number before the "x" can be any number, which is why I call it "d" here. So the pattern I need to match is "]; dxPhospho".
```r
# Sample data frame
DT <- data.frame(Positions.in.Master.Proteins = c(
  "Q8R149 2xPhospho [T131(100); T/S]; 2xPhospho [T157(100); T/S]",
  "Q9UET0 3xPhospho [S23(90); T63(70); Y67(70)]; 3xPhospho]"
))
```
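Before doing any replacement, it can help to check the two pieces separately. A minimal base R sketch, assuming the accession number is always the text before the first space: `sub()` pulls out the accession, and `regmatches()` with `gregexpr()` lists every `]; dxPhospho` occurrence. Using `\\d+` instead of `\\d` lets counts with more than one digit (e.g. `12xPhospho`) match too.

```r
# Same strings as DT$Positions.in.Master.Proteins
x <- c(
  "Q8R149 2xPhospho [T131(100); T/S]; 2xPhospho [T157(100); T/S]",
  "Q9UET0 3xPhospho [S23(90); T63(70); Y67(70)]; 3xPhospho]"
)

# Accession = everything before the first space (assumed format)
acc <- sub(" .*", "", x)
acc
#> [1] "Q8R149" "Q9UET0"

# All "]; <number>xPhospho" occurrences per string; \\d+ allows multi-digit counts
hits <- regmatches(x, gregexpr("\\]; \\d+xPhospho", x))
unlist(hits)
#> [1] "]; 2xPhospho" "]; 3xPhospho"
```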
The desired output would look like this:
```
[1] "Q8R149 2xPhospho [T131(100); T/S]; Q8R149 2xPhospho [T157(100); T/S]"
[2] "Q9UET0 3xPhospho [S23(90); T63(70); Y67(70)]; Q9UET0 3xPhospho]"
```
Here each string's accession number has been copied in front of its later "dxPhospho" block. Thanks!
Solution:
With the gsubfn package, `gsubfn()` accepts a function as the replacement, so you can extract each string's accession number with `sub()` and use it directly in the replacement text.
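```r
library(gsubfn)  # provides gsubfn(); the `\(x)` lambda syntax needs R >= 4.1

unname(
  sapply(DT$Positions.in.Master.Proteins, \(i) {
    acc <- sub(" .*", "", i)  # accession = text before the first space
    # The pattern has one capture group, so gsubfn passes only the group
    # (e.g. "2xPhospho") to the replacement function; \\d+ allows any count
    gsubfn("; (\\d+xPhospho)", \(x) paste0("; ", acc, " ", x), i)
  })
)
```

```
[1] "Q8R149 2xPhospho [T131(100); T/S]; Q8R149 2xPhospho [T157(100); T/S]"
[2] "Q9UET0 3xPhospho [S23(90); T63(70); Y67(70)]; Q9UET0 3xPhospho]"
```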
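If installing gsubfn is not an option, a base R sketch can do the same thing. This swaps gsubfn's function replacement for a plain backreference (`\\1`) in `gsub()`, applied row by row with `mapply()` so each string gets its own accession. The helper name `add_accession` is hypothetical, and it assumes accession numbers never contain a backslash (which `gsub()` would read as part of a backreference).

```r
# Hypothetical helper: insert each string's accession before every later
# "; <number>xPhospho" block, keeping that block's own count
add_accession <- function(x) {
  acc <- sub(" .*", "", x)  # accession = text before the first space
  mapply(
    # \\1 re-inserts the captured "<number>xPhospho" unchanged
    function(s, a) gsub("; (\\d+xPhospho)", paste0("; ", a, " \\1"), s),
    x, acc,
    USE.NAMES = FALSE
  )
}

x <- c(
  "Q8R149 2xPhospho [T131(100); T/S]; 2xPhospho [T157(100); T/S]",
  "Q9UET0 3xPhospho [S23(90); T63(70); Y67(70)]; 3xPhospho]"
)
add_accession(x)
#> [1] "Q8R149 2xPhospho [T131(100); T/S]; Q8R149 2xPhospho [T157(100); T/S]"
#> [2] "Q9UET0 3xPhospho [S23(90); T63(70); Y67(70)]; Q9UET0 3xPhospho]"
```

Because the count is captured rather than copied from the first block, strings whose blocks differ (e.g. `2xPhospho` followed by `3xPhospho`) or use multi-digit counts keep their own numbers.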