Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge rows when same word appears in another column

I want to merge rows together that have the same word in another column. The solution should be in R Base. The table entries are all comma separated strings (characters), not lists. So as shown below the color shades for the same color should be added in a string in one row, instead of spanning over several rows. Also there should be no duplicates in the color shades column.

I have already tried with:

   aggregate(df["Color shades"], df["Color"], paste, collapse=", ")

As well as with:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   aggregate(Color shades ~ Color ,df ,toString)

But that didn’t lead to the desired result.

Dataframe:

    df <- data.frame(colorshades = c("turquoise, babyblue", "royal blue, true blue", 
                         "navy blue, true blue"), colors = c("blue", "blue", "blue"))

Currently:

Color shades Color
turquoise, babyblue blue
royal blue, true blue blue
navy blue, true blue blue

Desired Output:

Color shades Color
turquoise, babyblue, royal blue, true blue, navy blue blue

>Solution :

  1. Convert "Color shades" to a list-column:

    lapply(strsplit(df[["Color shades"]], ","), trimws)
    # [[1]]
    # [1] "turquoise" "babyblue" 
    # [[2]]
    # [1] "royal blue" "true blue" 
    # [[3]]
    # [1] "navy blue" "true blue"
    df[["Color shades"]] <- lapply(strsplit(df[["Color shades"]], ","), trimws)
    df
    #            Color shades Color
    # 1   turquoise, babyblue  blue
    # 2 royal blue, true blue  blue
    # 3  navy blue, true blue  blue
    
  2. Aggregate with unique:

    aggregate(df["Color shades"], df["Color"], function(z) paste(unique(unlist(z)), collapse=", "))
    #   Color                                          Color shades
    # 1  blue turquoise, babyblue, royal blue, true blue, navy blue
    

    or, keeping with the list-column method,

    aggregate(df["Color shades"], df["Color"], function(z) list(unique(unlist(z))))
    #   Color                                          Color shades
    # 1  blue turquoise, babyblue, royal blue, true blue, navy blue
    str(aggregate(df["Color shades"], df["Color"], function(z) list(unique(unlist(z)))))
    # 'data.frame': 1 obs. of  2 variables:
    #  $ Color       : chr "blue"
    #  $ Color shades:List of 1
    #   ..$ : chr  "turquoise" "babyblue" "royal blue" "true blue" ...
    

There are often (but not always) advantages to dealing with list-columns vice comma-separated values. If your use-case is such that you frequently want to look at individual elements within one of these fields, you’ll find yourself dealing deeply with regexes and/or repeatedly using strsplit to separate. With list-columns, one can use tools like unique and %in% with abandon (though admittedly one should really become more comfortable with lapply/sapply, and many base-R tools for aggregation do not always work consistently with list-columns).


Data

df <- structure(list(`Color shades` = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"), Color = c("blue", "blue", "blue")), class = "data.frame", row.names = c(NA, -3L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading