Given the data frame ‘dat’, where ‘author’ is a list column of author names, how can I create a new column that contains only the first author’s last name, using tidyverse functions?
dat <- structure(list(author = list(c("Pagsberg, Anne Katrine", "Uhre, Camilla",
"Uhre, Valdemar and"), c("Franklin, Martin E", "Sapyta, Jeffrey",
"Freeman, Jennifer B"), c("Selles, Robert R", "Belschner, Laura",
"Negreiros, Juliana and")), pmid = c("35305587", "21934055",
"29179016")), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
In base R, the following code works:
dat$first_author <- sapply(strsplit(sapply(dat$author, "[[", 1), ","), "[", 1)
>Solution :
One pure tidyverse approach would be to group the tibble rowwise and pluck out the first element of each row in the list column before using str_remove to get rid of the first comma plus anything after it. For completeness you can ungroup at the end.
library(tidyverse)
dat %>%
  rowwise() %>%
  mutate(first_author = pluck(author, 1) %>% str_remove(',.*$')) %>%
  ungroup()
#> # A tibble: 3 x 3
#> author pmid first_author
#> <list> <chr> <chr>
#> 1 <chr [3]> 35305587 Pagsberg
#> 2 <chr [3]> 21934055 Franklin
#> 3 <chr [3]> 29179016 Selles
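If the rowwise grouping feels heavy, a variation is to map over the list column directly. This is only a sketch, not part of the original answer: it assumes every element of author holds at least one name, and it rebuilds a shortened two-row copy of dat so that it runs on its own. Passing the position 1 to map_chr() extracts the first element of each list entry as a character vector, so no grouping or ungrouping is needed.

```r
library(tidyverse)

# Shortened two-row copy of the question's data, so the sketch is self-contained
dat <- tibble(
  author = list(
    c("Pagsberg, Anne Katrine", "Uhre, Camilla"),
    c("Franklin, Martin E", "Sapyta, Jeffrey")
  ),
  pmid = c("35305587", "21934055")
)

# map_chr(author, 1) pulls the first name out of each list element;
# str_remove() then drops the first comma and everything after it
res <- dat %>%
  mutate(first_author = map_chr(author, 1) %>% str_remove(",.*$"))

res$first_author
#> [1] "Pagsberg" "Franklin"
```

Because map_chr() is vectorised over the list column, the result is already an ordinary character column.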
However, in reality I feel no compulsion to use tidyverse functions when a good base R one-liner exists (the \(x) lambda shorthand requires R 4.1 or later):
within(dat, first_author <- sapply(author, \(x) gsub(',.*$', '', x[[1]])))
#> # A tibble: 3 x 3
#> author pmid first_author
#> <list> <chr> <chr>
#> 1 <chr [3]> 35305587 Pagsberg
#> 2 <chr [3]> 21934055 Franklin
#> 3 <chr [3]> 29179016 Selles