df_wide <- structure(list(name = c("Crushers GC", "4Aces GC"), first_name = c("Charles",
"Peter"), last_name = c("Howell III", "Uihlein"), first_name_1 = c("Paul",
"Pat"), last_name_1 = c("Casey", "Perez"), first_name_2 = c("Bryson",
"Dustin"), last_name_2 = c("DeChambeau", "Johnson"), first_name_3 = c("Anirban",
"Patrick"), last_name_3 = c("Lahiri", "Reed")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
name first_name last_name first_name_1 last_name_1 first_name_2 last_name_2 first_name_3 last_name_3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Crushers GC Charles Howell III Paul Casey Bryson DeChambeau Anirban Lahiri
2 4Aces GC Peter Uihlein Pat Perez Dustin Johnson Patrick Reed
Our goal is to pivot_longer into a dataframe with 3 columns name, first, last, the code seems like it should be simple however we are not sure what value to pass to names_pattern, we have:
zed %>%
tidyr::pivot_longer(cols = -name,
names_to = c("fname", "lname"),
names_pattern = "???")
>Solution :
Here’s the way to go:
df_wide %>%
tidyr::pivot_longer(cols = -name,
names_pattern = "(.*_name).*",
names_to = ".value")
# name first_name last_name
# 1 Crushers GC Charles Howell III
# 2 Crushers GC Paul Casey
# 3 Crushers GC Bryson DeChambeau
# 4 Crushers GC Anirban Lahiri
# 5 4Aces GC Peter Uihlein
# 6 4Aces GC Pat Perez
# 7 4Aces GC Dustin Johnson
# 8 4Aces GC Patrick Reed
Explanation:
You should define the pattern of existing columns in names_pattern. In this case, you only want one group that is either last_name or first_name. You should use regex to define those:
(.*_name).*
This will match every column that contains _name. The parenthesis are meant to capture groups. You only need the first part of the column names (so, not the _1, _2) in your expected output, so we use parenthesis only for one group. This is why there is only one names_to element.
In names_to, where you define new column names, .value refers to:
".value" indicates that the corresponding component of the column name
defines the name of the output column containing the cell values,
overriding values_to entirely.
Here, .value refers to whatever matches (.*_name), so both first_name and last_name. And that is why we end up with two columns. If there were middle_name, middle_name_1, … columns in the original data.frame, we would end up with a third middle_name column.