I have data full of strings like this:
df<- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"
For each S in the string, I want to extract the 5 letters before it and the 5 letters after it,
so the output looks like this:
5 letters before S   5 letters after S
PF                   SQQRP
PFS                  QQRPH
RPHRH                MYVTR
KGLDG                LSIGQ
LDGSL                IGQGM
AARAN                LQLLS
SLQLL                PQPGE
>Solution :
Try this:
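fun <- function(S, bef=5, aft=bef) {
  # positions of every "S" in the string
  wh <- which(strsplit(S, "")[[1]] == "S")
  # up to bef characters before, and up to aft characters after, each "S"
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}
fun(df)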
#     bef   aft
# 1    PF SQQRP
# 2   PFS QQRPH
# 3 RPHRH MYVTR
# 4 KGLDG LSIGQ
# 5 LDGSL IGQGM
# 6 AARAN LQLLS
# 7 SLQLL PQPGE
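The first two rows are shorter than five characters because substring is vectorized over its start and stop positions and clamps positions that fall outside the string. A small sketch of that behaviour, using a made-up string x that is not part of the original data:

```r
x <- "abcdef"  # made-up example string, for illustration only

# first and last are paired element-wise: one substring per (first, last) pair
pairs <- substring(x, c(1, 3), c(2, 5))
pairs
# -> "ab" "cde"

# a start position below 1 is treated as the start of the string
clamped_start <- substring(x, -2, 2)
clamped_start
# -> "ab"

# a stop position past the end is treated as the end of the string
clamped_stop <- substring(x, 5, 9)
clamped_stop
# -> "ef"
```

This is why no special handling is needed for an S that sits fewer than 5 characters from either end of the string.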
Note that strings without any instance of "S" will return 0 rows. If you instead want such a string to return its last bef characters (here, the last 5) as bef and an empty string as aft, we need a simple conditional:
fun <- function(S, bef=5, aft=bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  # no "S" found: pretend there is one just past the end of the string
  if (!length(wh)) wh <- nchar(S) + 1
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}
fun("hello world")
#     bef aft
# 1 world
Edit: thanks to @DarrenTsai’s comment, the function now uses substring in a vectorized fashion, removing the need for mapply.
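Since the question mentions data full of such strings, here is a minimal sketch of one way to run the same logic over a whole character vector and keep track of which string each row came from. The names split_around_s, seqs, and the id column are hypothetical, the input strings are made up, and the explicit early return for strings with no S is an assumption added here so the block does not depend on how substring treats zero-length positions.

```r
# Hypothetical helper: same logic as fun() above, repeated so this block runs on its own.
split_around_s <- function(S, bef = 5, aft = bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  # Assumption: return an explicit empty result when the string has no "S".
  if (!length(wh)) {
    return(data.frame(bef = character(0), aft = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(bef = substring(S, wh - bef, wh - 1),
             aft = substring(S, wh + 1, wh + aft),
             stringsAsFactors = FALSE)
}

# Made-up input; replace with your own character vector.
seqs <- c("PFSSQQRPHRHS", "AAAA", "MSKLV")

# One data frame per string, tagged with the index of the string it came from.
pieces <- lapply(seq_along(seqs), function(i) {
  res <- split_around_s(seqs[i])
  res$id <- rep(i, nrow(res))
  res
})

# Stack the per-string results; zero-row pieces contribute nothing.
out <- do.call(rbind, pieces)
out
```

With this input, the second string contributes no rows because it contains no S, so every id in the result is either 1 or 3.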