Making list of strings while summarizing with dplyr

I have a series of dataframes, each of which contains a name column and then a text column. I’d like to find duplicates in the text, and then generate a list of all the names that are associated with the duplicate. I can get as far as getting a list of the text duplicates and the number of times each duplicate occurs, but I’m struggling to find a way to get the list of associated names. Here is a reproducible example:

library(dplyr)

#two separate data frames with name/string
books1 <- data.frame(
  name=rep("Ellie", 4),
  book= c("Anne of Green Gables", "The Secret Garden", "Alice in Wonderland", "A Little Princess"))

books2 <- data.frame(
  name=rep('Jess', 6),
  book=c("Harry Potter", "Percy Jackson", "Anne of Green Gables", "Chronicles of Narnia", "Redwall", "A Little Princess"))

#combine into single data frame
books <- bind_rows(books1, books2)

#identify repeats
repeatbooks <- books %>% group_by(book) %>% summarize(n=n())

This gives me:

  book                     n
1 A Little Princess        2
2 Alice in Wonderland      1
3 Anne of Green Gables     2
4 Chronicles of Narnia     1
5 Harry Potter             1
6 Percy Jackson            1
7 Redwall                  1
8 The Secret Garden        1

What I’d like is something like:

  book                     n     name
1 A Little Princess        2     Ellie, Jess
2 Alice in Wonderland      1     Ellie
3 Anne of Green Gables     2     Ellie, Jess

I’d hoped to do something like this, but it creates multiple rows per book rather than collapsing the names into a single row:

#identify repeats while catching associated names - doesn't collapse names into a single row
repeatbooks <- books %>% group_by(book) %>% summarize(n=n(), names=c(paste0(name), ', '))

Solution:

Do you mean something like the following? (Note that reframe() and the .by argument need dplyr 1.1.0 or later.)

books %>%
  reframe(
    n = n(),
    name = toString(unique(name)),
    .by = book
  )

which gives

                  book n        name
1 Anne of Green Gables 2 Ellie, Jess
2    The Secret Garden 1       Ellie
3  Alice in Wonderland 1       Ellie
4    A Little Princess 2 Ellie, Jess
5         Harry Potter 1        Jess
6        Percy Jackson 1        Jess
7 Chronicles of Narnia 1        Jess
8              Redwall 1        Jess
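
The question asks specifically for the duplicated titles, so one possible follow-up is to keep only the books that appear more than once. The sketch below is an assumption about what is wanted, not part of the original answer: it rebuilds the example data, uses summarise() with paste(..., collapse = ", ") in place of reframe() and toString(), and adds a filter(n > 1) step.

```r
library(dplyr)

# Rebuild the example data from the question
books <- bind_rows(
  data.frame(
    name = "Ellie",
    book = c("Anne of Green Gables", "The Secret Garden",
             "Alice in Wonderland", "A Little Princess")
  ),
  data.frame(
    name = "Jess",
    book = c("Harry Potter", "Percy Jackson", "Anne of Green Gables",
             "Chronicles of Narnia", "Redwall", "A Little Princess")
  )
)

repeatbooks <- books %>%
  group_by(book) %>%
  summarise(
    n = n(),                                      # rows per book
    name = paste(unique(name), collapse = ", "),  # one string per book
    .groups = "drop"
  ) %>%
  filter(n > 1)                                   # keep only duplicated titles

repeatbooks
```

The original attempt produced several rows per book because c(paste0(name), ', ') returns a vector with more than one element; collapse = ", " joins the names into a single string instead. Here n counts rows, so a person who lists the same book twice would also be flagged; if only books shared between different people should count, n_distinct(name) could replace n().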