How to split data into as many files as possible

by MR

January 12, 2023

I have data made up of strings like this:

df<- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"

For each S in the string, I want to extract up to 5 letters before it and the 5 letters after it,

so that the output looks like this:

5 letters before S  5 letters after S
   PF               SQQRP
  PFS               QQRPH
RPHRH               MYVTR
KGLDG               LSIGQ
LDGSL               IGQGM
AARAN               LQLLS
SLQLL               PQPGE

>Solution :

Try this:

fun <- function(S, bef=5, aft=bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}
fun(df)
#     bef   aft
# 1    PF SQQRP
# 2   PFS QQRPH
# 3 RPHRH MYVTR
# 4 KGLDG LSIGQ
# 5 LDGSL IGQGM
# 6 AARAN LQLLS
# 7 SLQLL PQPGE
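Since the question mentions data full of such strings, while fun() handles one string at a time, here is a minimal sketch of applying it to several strings and stacking the results. The vector seqs and the id column are made up for illustration and are not part of the original question or answer.

```r
# Same helper as above, repeated so this block runs on its own.
fun <- function(S, bef = 5, aft = bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  data.frame(bef = substring(S, wh - bef, wh - 1),
             aft = substring(S, wh + 1, wh + aft))
}

# Hypothetical input: a character vector of sequences (illustrative only).
seqs <- c("PFSSQQRPHRH", "AAAASBBBB")

# One data frame per string, tagged with the index of the string it came from,
# then stacked into a single data frame.
res <- do.call(rbind, lapply(seq_along(seqs), function(i) {
  out <- fun(seqs[i])
  cbind(id = rep(i, nrow(out)), out)
}))
res
# three rows: two from the first string, one from the second
```

Keeping an id column is one possible design choice; it lets you trace each row back to its source string after the results are combined.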

Note that strings without any instance of "S" will return 0 rows. If you instead want such a string to return one row holding its last bef characters in bef (and an empty string in aft), we need a simple conditional:

fun <- function(S, bef=5, aft=bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  if (!length(wh)) wh <- nchar(S) + 1
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}

fun("hello world")
#     bef aft
# 1 world
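The fallback above yields only the last bef characters ("world" for "hello world"). If the entire string is wanted in bef when there is no "S", one possible variant returns early. The name fun_whole is hypothetical and not from the original answer; treat this as a sketch.

```r
# Hypothetical variant (fun_whole is an illustrative name, not from the answer).
fun_whole <- function(S, bef = 5, aft = bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  # No "S" found: return the entire string in bef and an empty string in aft.
  if (!length(wh)) return(data.frame(bef = S, aft = ""))
  data.frame(bef = substring(S, wh - bef, wh - 1),
             aft = substring(S, wh + 1, wh + aft))
}

r <- fun_whole("hello world")
r
# one row: bef is "hello world", aft is ""
```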

Edit: thanks to @DarrenTsai's comment, we can use substring in a vectorized fashion, which removes the need for mapply.
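To illustrate the vectorization point, here is a small sketch comparing a single substring() call with an element-wise mapply() over substr(). The mapply form is only a guess at what a non-vectorized version might look like; it is not taken from the answer.

```r
s  <- "PFSSQQRPHRH"
wh <- which(strsplit(s, "")[[1]] == "S")   # positions of "S": 3 and 4

# substring() recycles over vectors of start and stop positions,
# and start positions below 1 are treated as 1.
before <- substring(s, wh - 5, wh - 1)     # "PF"    "PFS"
after  <- substring(s, wh + 1, wh + 5)     # "SQQRP" "QQRPH"

# Assumed element-wise equivalent, shown for comparison only.
via_mapply <- mapply(function(a, b) substr(s, a, b),
                     wh - 5, wh - 1, USE.NAMES = FALSE)

identical(before, via_mapply)              # TRUE
```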