Merge the rows that follow a row whose string starts with a particular character

I have a data frame that looks like this:

df <- as.data.frame(rbind(">A1", "aaaa", "bbb", "cccc",
                          ">B2", "dddd", "eeeee", "ff",
                          ">C3", "ggggggg", "hhhhh", "iiiii", "jjjjj"))

This is what I want to get:

df1 <- as.data.frame(rbind(">A1", "aaaabbbcccc",
                           ">B2", "ddddeeeeeff",
                           ">C3", "ggggggghhhhhiiiiijjjjj"))

As you can see, I want to merge all rows that lie between two consecutive rows starting with the ">" sign into a single row, keeping the ">" rows themselves as they are.
Frankly, I don’t know where to start with this.
Please advise.


Solution:

We can use cumsum(grepl(.)) for this.

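data.frame(
  V1 = unlist(
    # group id that increases by 1 at every row starting with ">"
    by(df$V1, cumsum(grepl("^>", df$V1)),
       # keep the header, paste the remaining rows of the group together
       function(z) c(z[1], paste(z[-1], collapse = "")))
  )
)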
#                        V1
# 11                    >A1
# 12            aaaabbbcccc
# 21                    >B2
# 22            ddddeeeeeff
# 31                    >C3
# 32 ggggggghhhhhiiiiijjjjj
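If you would rather not use by(), here is a minimal sketch of the same idea with split() and vapply(), assuming R >= 4.0 (so the character column is not turned into a factor). The names grp, m and res are only illustrative. Each group becomes one column of a 2 x k matrix, and flattening that matrix interleaves headers and merged bodies.

```r
# Assumes R >= 4.0, where character columns stay character (not factor)
df <- as.data.frame(rbind(">A1", "aaaa", "bbb", "cccc",
                          ">B2", "dddd", "eeeee", "ff",
                          ">C3", "ggggggg", "hhhhh", "iiiii", "jjjjj"))

# Group id that increases by 1 at every row starting with ">"
grp <- cumsum(grepl("^>", df$V1))

# One column per group: first element = header, second = concatenated body
m <- vapply(split(df$V1, grp),
            function(z) c(z[1], paste(z[-1], collapse = "")),
            character(2))

# as.vector() reads the 2 x k matrix column by column, interleaving
# header and body, and drops the group labels
res <- data.frame(V1 = as.vector(m))
res
```

Because as.vector() drops the matrix's dimnames, res gets the plain row names 1 to 6, like df1.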

Brief explanation:

  • grepl("^>", .) returns TRUE for each cell that starts with ">"; then

  • cumsum gives that row, and every following row up to the next such row, the same group number:

    grepl("^>", df$V1)
    #  [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
    cumsum(grepl("^>", df$V1))
    #  [1] 1 1 1 1 2 2 2 2 3 3 3 3 3
    
  • by(.) applies a function to each of those groups; here the function returns a length-2 vector: the >-string first, followed by all other strings of the group concatenated.
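To see what happens inside a single group, this small sketch applies the function to the first group by hand. merge_group is a hypothetical name given to the anonymous function passed to by(), and split() is used here only to expose the groups that by() builds internally.

```r
df <- as.data.frame(rbind(">A1", "aaaa", "bbb", "cccc",
                          ">B2", "dddd", "eeeee", "ff",
                          ">C3", "ggggggg", "hhhhh", "iiiii", "jjjjj"))

# Split the column into one character vector per group ("1", "2", "3")
groups <- split(df$V1, cumsum(grepl("^>", df$V1)))

# Hypothetical name for the function used in by():
# keep the header, glue the rest of the group together
merge_group <- function(z) c(z[1], paste(z[-1], collapse = ""))

groups[["1"]]               # ">A1" "aaaa" "bbb" "cccc"
merge_group(groups[["1"]])  # ">A1" "aaaabbbcccc"
```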

The result of the by() call has the same structure as your df1; only the row names differ:

df1
#                       V1
# 1                    >A1
# 2            aaaabbbcccc
# 3                    >B2
# 4            ddddeeeeeff
# 5                    >C3
# 6 ggggggghhhhhiiiiijjjjj
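If the "11", "12", ... row names matter, one option (a sketch, not the only way) is to pass use.names = FALSE to unlist(), so that data.frame() numbers the rows 1 to 6. This assumes R >= 4.0, so both columns are character and can be compared directly.

```r
# Assumes R >= 4.0 (character columns are not converted to factors)
df <- as.data.frame(rbind(">A1", "aaaa", "bbb", "cccc",
                          ">B2", "dddd", "eeeee", "ff",
                          ">C3", "ggggggg", "hhhhh", "iiiii", "jjjjj"))
df1 <- as.data.frame(rbind(">A1", "aaaabbbcccc",
                           ">B2", "ddddeeeeeff",
                           ">C3", "ggggggghhhhhiiiiijjjjj"))

# Same by() call as in the answer; use.names = FALSE drops the
# "11", "12", ... labels, so data.frame() numbers the rows 1..6
res <- data.frame(
  V1 = unlist(
    by(df$V1, cumsum(grepl("^>", df$V1)),
       function(z) c(z[1], paste(z[-1], collapse = ""))),
    use.names = FALSE
  )
)

identical(res$V1, df1$V1)  # TRUE
```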