Extract part of a list column using tidyverse functions

Given the data frame ‘dat’, where ‘author’ is a list column of author names, how can I create a new column that contains only the first author’s last name using tidyverse functions?

dat <- structure(list(author = list(c("Pagsberg, Anne Katrine", "Uhre, Camilla", 
"Uhre, Valdemar and"), c("Franklin, Martin E", "Sapyta, Jeffrey", 
"Freeman, Jennifer B"), c("Selles, Robert R", "Belschner, Laura", 
"Negreiros, Juliana and")), pmid = c("35305587", "21934055", 
"29179016")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"))

In base R, the following code works:
dat$first_author <- sapply(strsplit(sapply(dat$author, "[[", 1), ","), "[", 1)

Solution:

One pure tidyverse approach is to process the tibble row by row with rowwise(), pluck the first element of the list column in each row, and then use str_remove to drop the first comma and everything after it. For completeness, you can ungroup at the end.

library(tidyverse)

dat %>% 
  rowwise() %>% 
  mutate(first_author = pluck(author, 1) %>% str_remove(',.*$')) %>%
  ungroup()
#> # A tibble: 3 x 3
#>   author    pmid     first_author
#>   <list>    <chr>    <chr>       
#> 1 <chr [3]> 35305587 Pagsberg    
#> 2 <chr [3]> 21934055 Franklin    
#> 3 <chr [3]> 29179016 Selles 
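
If you would rather avoid rowwise() altogether, a possible variant is to map over the list column directly. The sketch below is not part of the original answer: it uses purrr's map_chr with the position 1 as the extractor, and `dat_demo` is a shortened stand-in for `dat` (two authors per row) so the block runs on its own. It assumes every row has at least one author; an empty entry would make map_chr fail.

```r
library(dplyr)
library(purrr)
library(stringr)

# Shortened stand-in for `dat` from the question (assumption: same shape)
dat_demo <- tibble(
  author = list(
    c("Pagsberg, Anne Katrine", "Uhre, Camilla"),
    c("Franklin, Martin E", "Sapyta, Jeffrey"),
    c("Selles, Robert R", "Belschner, Laura")
  ),
  pmid = c("35305587", "21934055", "29179016")
)

result <- dat_demo %>%
  mutate(
    # map_chr(author, 1) extracts the first element of each list entry
    first_author = map_chr(author, 1) %>%
      # drop the first comma and everything after it
      str_remove(",.*$")
  )

result$first_author
```

Because map_chr is vectorised over the list column, no grouping is created and no ungroup() is needed afterwards.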

That said, I feel no compulsion to use tidyverse functions when a good base R one-liner exists (the \(x) lambda shorthand requires R 4.1 or later):

within(dat, first_author <- sapply(author, \(x) gsub(',.*$', '', x[[1]])))
#> # A tibble: 3 x 3
#>   author    pmid     first_author
#>   <list>    <chr>    <chr>       
#> 1 <chr [3]> 35305587 Pagsberg    
#> 2 <chr [3]> 21934055 Franklin    
#> 3 <chr [3]> 29179016 Selles 
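
As a side note on the base R route, sapply does not guarantee the type of its result. A slightly stricter sketch, offered as an assumption about what you may want rather than as part of the original answer, swaps in vapply with a declared character(1) result and uses sub, since only the first comma matters. `dat_demo` is a shortened stand-in for `dat`.

```r
# Shortened stand-in for `dat`, built as a plain data frame with a list column
dat_demo <- data.frame(pmid = c("35305587", "21934055"))
dat_demo$author <- list(
  c("Pagsberg, Anne Katrine", "Uhre, Camilla"),
  c("Franklin, Martin E", "Sapyta, Jeffrey")
)

# vapply errors if any row does not yield exactly one string
dat_demo$first_author <- vapply(
  dat_demo$author,
  function(x) sub(",.*$", "", x[[1]]),  # keep the text before the first comma
  character(1)
)

dat_demo$first_author
```

The function(x) form is used here instead of \(x) so the sketch also runs on R versions before 4.1.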