Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to pivot wide to long format for columns with different names?

I have a wide df that I would like to convert to long form based on multiple columns. For example, all columns with "Type" would be in a single column and all columns with "Finding" would be in a single column. I believe dplyr is the best approach but have not had luck with pivot_longer.

Initial df

structure(list(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400), 
                                class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
               Site = c("A", "B", "C", "D"), 
               `Finding?` = c("Yes", "Yes", "Yes", "Yes"), 
               `Topic Area` = c("A", "B", "C", "D"), 
               `Type` = c("1", "2", "3", "4"), 
               `Risk Ranking` = c("Medium", "Low", "Medium", "Medium"), 
               `Additional Finding?` = c("Yes", "Yes", "Yes", "Yes"), 
               `Topic Area2` = c("A", "B", "C", "D"), 
               `Type2` = c("1", "2", "2", "3"), 
               `Risk Ranking2` = c("Medium", "Medium", "Low", "Medium")), 
          row.names = c(NA, -4L), 
          class = c("tbl_df", "tbl", "data.frame")) 

Desired output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data.frame(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400, 1648512000, 1648598400, 1648166400, 1648166400), 
                                class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
               Site = c("A", "B", "C", "D", "A", "B", "C", "D"), 
               "Finding?" = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"), 
               "Topic Area" = c("A", "B", "C", "D", "A", "B", "C", "D"), 
               `Type` = c("1", "2", "3", "4", "1", "2", "2", "3"), 
               "Risk Ranking" = c("Medium", "Low", "Medium", "Medium", "Medium", "Medium", "Low", "Medium"))

>Solution :

One of the corresponding set of columns i.e Finding, ‘Additional Finding’ is slightly different than the other sets as those sets of column names have value 2 at the end. Thus, we change only the suffix part of the column ‘Finding’ columns and then use pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
df1 %>% 
  rename_with(~ str_c("Finding", seq_along(.x)), contains("Finding")) %>% 
  pivot_longer(cols = -c(Date, Site), names_to = c(".value"), 
     names_pattern = "(\\D+)\\d*$", values_drop_na = TRUE)

-output

# A tibble: 8 × 6
  Date                Site  Finding `Topic Area` Type  `Risk Ranking`
  <dttm>              <chr> <chr>   <chr>        <chr> <chr>         
1 2022-03-29 00:00:00 A     Yes     A            1     Medium        
2 2022-03-29 00:00:00 A     Yes     A            1     Medium        
3 2022-03-30 00:00:00 B     Yes     B            2     Low           
4 2022-03-30 00:00:00 B     Yes     B            2     Medium        
5 2022-03-25 00:00:00 C     Yes     C            3     Medium        
6 2022-03-25 00:00:00 C     Yes     C            2     Low           
7 2022-03-25 00:00:00 D     Yes     D            4     Medium        
8 2022-03-25 00:00:00 D     Yes     D            3     Medium        
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading