Home From wide to long in R, using the first 2 letters (Y/N) of the column as one id and last 4 (Year) as the second id

Questions

From wide to long in R, using the first 2 letters (Y/N) of the column as one id and last 4 (Year) as the second id

August 3, 2022

I am having trouble with rearranging data from wide to long format using two ids from a single column.

The column names look as follows:

Y_2022 or N_2022.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.
Visit Medevel

I would like to split the column name in two, and then use a) Yes/No parameter as the first id, and b) the year as the second id.

Here’s a reproducible code chunk:

            d = structure(list(SCRIPT = c("BC", "BC", "RC", "RC"), 
        INSTITUTE = c("BCR","BCR", "CC", "CC"), CLASS = c("BANK", "CORPORATION","RETAIL", "WHOLESALE"),
    Y_2022 = c(0.86, 3.32, 0.86, 3.35), Y_2023 = c(0.87, 0.86, 0.19, 0.25),
N_2022 = c(-0.86,0.7, 0.06, 0.06), N_2023 = c(0.86, 0.03, 0.86, 0.04)), 
    row.names = c(NA,-4L), class = c("tbl_df", "tbl", "data.frame"))

If it were without a Y/N variant, I would have done the following:

library(tidyr)

#names(d) = gsub(pattern = "Y_*", replacement = "", x = names(d))

d <- pivot_longer(d, cols=4:7, names_to = "Year", values_to = "EUR")

Currently, my dataset looks like this:

 Script| Institute  |  Class   |  Y_2022 |  Y_2023  |  N_2022  |  N_2023 

 BC    | BCR        |  Retail  |  0.86   |  0.86    |  0.86    |  0.86 
 
 RC    |  BCR       |  Retail  |  0.86   |  0.86    |  0.86    |  0.86 
  
 BC    |  CC        |  Retail  |  0.86   |  0.86    |  0.86    |  0.86  
 
 RC    |  CC        |  Retail  |  0.86   |  0.86    |  0.86    |  0.86

And I need it to look like this:

Script  | Institute |  Class |  Year  | Yes-No |  EUR 


BC      |  BCR      | Retail |  2022  |  Y_    |  0.86 


RC      |  BCR      | Retail |  2023  |  Y_    |  0.86 


BC      |  BCR      | Retail |  2022  |  N_    |  0.86 


RC      |  BCR      | Retail |  2023  |  N_    |  0.86

>Solution :

You could use pivot_longer() followed by separate()

pivot_longer(d,-(1:3),values_to = "EUR") %>% 
  separate(name,into=c("Yes-No", "Year"), sep="_")

Output:

   SCRIPT INSTITUTE CLASS       `Yes-No` Year    EUR
   <chr>  <chr>     <chr>       <chr>    <chr> <dbl>
 1 BC     BCR       BANK        Y        2022   0.86
 2 BC     BCR       BANK        Y        2023   0.87
 3 BC     BCR       BANK        N        2022  -0.86
 4 BC     BCR       BANK        N        2023   0.86
 5 BC     BCR       CORPORATION Y        2022   3.32
 6 BC     BCR       CORPORATION Y        2023   0.86
 7 BC     BCR       CORPORATION N        2022   0.7 
 8 BC     BCR       CORPORATION N        2023   0.03
 9 RC     CC        RETAIL      Y        2022   0.86
10 RC     CC        RETAIL      Y        2023   0.19
11 RC     CC        RETAIL      N        2022   0.06
12 RC     CC        RETAIL      N        2023   0.86
13 RC     CC        WHOLESALE   Y        2022   3.35
14 RC     CC        WHOLESALE   Y        2023   0.25
15 RC     CC        WHOLESALE   N        2022   0.06
16 RC     CC        WHOLESALE   N        2023   0.04