Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pivot_longer across multiple columns, how to use names_pattern parameter

df_wide <- structure(list(name = c("Crushers GC", "4Aces GC"), first_name = c("Charles", 
"Peter"), last_name = c("Howell III", "Uihlein"), first_name_1 = c("Paul", 
"Pat"), last_name_1 = c("Casey", "Perez"), first_name_2 = c("Bryson", 
"Dustin"), last_name_2 = c("DeChambeau", "Johnson"), first_name_3 = c("Anirban", 
"Patrick"), last_name_3 = c("Lahiri", "Reed")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))

name        first_name last_name  first_name_1 last_name_1 first_name_2 last_name_2 first_name_3 last_name_3
  <chr>       <chr>      <chr>      <chr>        <chr>       <chr>        <chr>       <chr>        <chr>      
1 Crushers GC Charles    Howell III Paul         Casey       Bryson       DeChambeau  Anirban      Lahiri     
2 4Aces GC    Peter      Uihlein    Pat          Perez       Dustin       Johnson     Patrick      Reed       

Our goal is to pivot_longer into a dataframe with 3 columns name, first, last, the code seems like it should be simple however we are not sure what value to pass to names_pattern, we have:

zed %>%
  tidyr::pivot_longer(cols = -name,
                      names_to = c("fname", "lname"),
                      names_pattern = "???")

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here’s the way to go:

df_wide %>%
  tidyr::pivot_longer(cols = -name,
                      names_pattern = "(.*_name).*",
                      names_to = ".value")

#   name        first_name last_name 
# 1 Crushers GC Charles    Howell III
# 2 Crushers GC Paul       Casey     
# 3 Crushers GC Bryson     DeChambeau
# 4 Crushers GC Anirban    Lahiri    
# 5 4Aces GC    Peter      Uihlein   
# 6 4Aces GC    Pat        Perez     
# 7 4Aces GC    Dustin     Johnson   
# 8 4Aces GC    Patrick    Reed      

Explanation:

You should define the pattern of existing columns in names_pattern. In this case, you only want one group that is either last_name or first_name. You should use regex to define those:

(.*_name).*

This will match every column that contains _name. The parenthesis are meant to capture groups. You only need the first part of the column names (so, not the _1, _2) in your expected output, so we use parenthesis only for one group. This is why there is only one names_to element.

In names_to, where you define new column names, .value refers to:

".value" indicates that the corresponding component of the column name
defines the name of the output column containing the cell values,
overriding values_to entirely.

Here, .value refers to whatever matches (.*_name), so both first_name and last_name. And that is why we end up with two columns. If there were middle_name, middle_name_1, … columns in the original data.frame, we would end up with a third middle_name column.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading