I’m having difficulty applying the solutions suggested in answers to many similar questions. See the sample df below.
structure(list(FirstName = c("Albus Percival Wulfric Brian Dumbledore",
"Harry James Potter", "Tom Marvollo Riddle", "Lord Voldemort"
), Email = c("albusD@hogwarts.com", "harryP@hogwarts.com", "tomR@hogwarts.com",
"LV@Wiz.com"), ClassSection = c("HeadMaster", "Student", "Dark Lord in training",
"Dark Lord")), row.names = c(NA, -4L), spec = structure(list(
cols = list(FirstName = structure(list(), class = c("collector_character",
"collector")), Email = structure(list(), class = c("collector_character",
"collector")), ClassSection = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
I want to create a new column that combines the first and last names. For this, I first tried separate(FirstName, sep = " ", into = c("First", "Middle", "Last")). However, names with more than three words have extra elements that get dropped, so I can’t reliably combine the first and last names.
Next, I tried df %>% mutate(First = str_split(FirstName, pattern = " ")). This gives a list column. I want a way to extract the first and the last element from each entry of this column.
# A tibble: 4 x 4
  FirstName                               Email               ClassSection          First
  <chr>                                   <chr>               <chr>                 <list>
1 Albus Percival Wulfric Brian Dumbledore albusD@hogwarts.com HeadMaster            <chr [5]>
2 Harry James Potter                      harryP@hogwarts.com Student               <chr [3]>
3 Tom Marvollo Riddle                     tomR@hogwarts.com   Dark Lord in training <chr [3]>
4 Lord Voldemort                          LV@Wiz.com          Dark Lord             <chr [2]>
I looked at various answers where tail(First, n = 1) and dplyr’s last(First) were suggested. However, these don’t give me the right answer. I also tried unnest_wider(First), but it has the same problem as separate(FirstName): I end up with multiple columns, which doesn’t work for names that have only two words or more than three.
I’d like to stay within the dplyr (tidyverse) workflow. Is there a way to combine the first and last element of each vector into a new column?
>Solution :
Do you mean something like this?
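df %>%
  mutate(
    # Split each name on spaces, then keep only its first and last word;
    # unique() avoids repeating a single-word name.
    FirstLast = sapply(str_split(FirstName, pattern = " "),
                       \(z) paste(z[unique(c(1, length(z)))], collapse = ""))
  )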
# # A tibble: 4 × 4
#   FirstName                               Email               ClassSection          FirstLast
#   <chr>                                   <chr>               <chr>                 <chr>
# 1 Albus Percival Wulfric Brian Dumbledore albusD@hogwarts.com HeadMaster            AlbusDumbledore
# 2 Harry James Potter                      harryP@hogwarts.com Student               HarryPotter
# 3 Tom Marvollo Riddle                     tomR@hogwarts.com   Dark Lord in training TomRiddle
# 4 Lord Voldemort                          LV@Wiz.com          Dark Lord             LordVoldemort
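Functions such as last() see the whole list column at once rather than one name at a time, so they have to be applied per element. For anyone who wants to stay in a tidyverse workflow, the same idea can be sketched with purrr::map_chr() in place of sapply(). This is only an illustration: the small df below is a cut-down stand-in for the sample data, and first_last() is a hypothetical helper name, not part of any package.

```r
library(dplyr)
library(stringr)
library(purrr)

# Cut-down stand-in for the sample data (only the column that matters here).
df <- tibble(FirstName = c("Albus Percival Wulfric Brian Dumbledore",
                           "Harry James Potter",
                           "Tom Marvollo Riddle",
                           "Lord Voldemort"))

# Hypothetical helper: keep the first and last word of one split name.
# unique() prevents a single-word name from being pasted twice.
first_last <- function(z) paste(z[unique(c(1, length(z)))], collapse = "")

result <- df %>%
  mutate(FirstLast = map_chr(str_split(FirstName, pattern = " "), first_last))

result$FirstLast
# "AlbusDumbledore" "HarryPotter" "TomRiddle" "LordVoldemort"
```

Unlike sapply(), map_chr() should raise an error if the helper ever returns anything other than a single string, which can make mistakes easier to spot.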
or, much more simply,
df %>%
  # Remove the first space and, if present, everything up to the last space.
  mutate(FirstLast = sub(" (.* )?", "", FirstName))
# # A tibble: 4 × 4
#   FirstName                               Email               ClassSection          FirstLast
#   <chr>                                   <chr>               <chr>                 <chr>
# 1 Albus Percival Wulfric Brian Dumbledore albusD@hogwarts.com HeadMaster            AlbusDumbledore
# 2 Harry James Potter                      harryP@hogwarts.com Student               HarryPotter
# 3 Tom Marvollo Riddle                     tomR@hogwarts.com   Dark Lord in training TomRiddle
# 4 Lord Voldemort                          LV@Wiz.com          Dark Lord             LordVoldemort
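A regex like this is easy to check against names of different lengths. The sketch below uses base R only and compares two patterns: " .* ", which needs at least two spaces to match, and " (.* )?", where the middle part is optional. The names_in vector, including the single-word "Hagrid", is made up for illustration.

```r
# Made-up test names with one, two, three and five words.
names_in <- c("Hagrid",
              "Lord Voldemort",
              "Harry James Potter",
              "Albus Percival Wulfric Brian Dumbledore")

# " .* " only matches when there are at least two spaces,
# so one- and two-word names are returned unchanged.
strict <- sub(" .* ", "", names_in)

# " (.* )?" matches the first space, then optionally everything
# up to and including the last space.
loose <- sub(" (.* )?", "", names_in)

data.frame(names_in, strict, loose)
```

With these inputs, strict leaves "Lord Voldemort" untouched while loose gives "LordVoldemort"; both leave the single-word name as it is.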