Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

From wide to long in R, using the first 2 letters (Y/N) of the column as one id and last 4 (Year) as the second id

I am having trouble with rearranging data from wide to long format using two ids from a single column.

The column names look as follows:

Y_2022 or N_2022.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I would like to split the column name in two, and then use a) Yes/No parameter as the first id, and b) the year as the second id.

Here’s a reproducible code chunk:

            d = structure(list(SCRIPT = c("BC", "BC", "RC", "RC"), 
        INSTITUTE = c("BCR","BCR", "CC", "CC"), CLASS = c("BANK", "CORPORATION","RETAIL", "WHOLESALE"),
    Y_2022 = c(0.86, 3.32, 0.86, 3.35), Y_2023 = c(0.87, 0.86, 0.19, 0.25),
N_2022 = c(-0.86,0.7, 0.06, 0.06), N_2023 = c(0.86, 0.03, 0.86, 0.04)), 
    row.names = c(NA,-4L), class = c("tbl_df", "tbl", "data.frame"))

If it were without a Y/N variant, I would have done the following:

library(tidyr)

#names(d) = gsub(pattern = "Y_*", replacement = "", x = names(d))

d <- pivot_longer(d, cols=4:7, names_to = "Year", values_to = "EUR")

Currently, my dataset looks like this:

 Script| Institute  |  Class   |  Y_2022 |  Y_2023  |  N_2022  |  N_2023 

 BC    | BCR        |  Retail  |  0.86   |  0.86    |  0.86    |  0.86 
 
 RC    |  BCR       |  Retail  |  0.86   |  0.86    |  0.86    |  0.86 
  
 BC    |  CC        |  Retail  |  0.86   |  0.86    |  0.86    |  0.86  
 
 RC    |  CC        |  Retail  |  0.86   |  0.86    |  0.86    |  0.86

And I need it to look like this:

Script  | Institute |  Class |  Year  | Yes-No |  EUR 


BC      |  BCR      | Retail |  2022  |  Y_    |  0.86 


RC      |  BCR      | Retail |  2023  |  Y_    |  0.86 


BC      |  BCR      | Retail |  2022  |  N_    |  0.86 


RC      |  BCR      | Retail |  2023  |  N_    |  0.86

>Solution :

You could use pivot_longer() followed by separate()

pivot_longer(d,-(1:3),values_to = "EUR") %>% 
  separate(name,into=c("Yes-No", "Year"), sep="_")

Output:

   SCRIPT INSTITUTE CLASS       `Yes-No` Year    EUR
   <chr>  <chr>     <chr>       <chr>    <chr> <dbl>
 1 BC     BCR       BANK        Y        2022   0.86
 2 BC     BCR       BANK        Y        2023   0.87
 3 BC     BCR       BANK        N        2022  -0.86
 4 BC     BCR       BANK        N        2023   0.86
 5 BC     BCR       CORPORATION Y        2022   3.32
 6 BC     BCR       CORPORATION Y        2023   0.86
 7 BC     BCR       CORPORATION N        2022   0.7 
 8 BC     BCR       CORPORATION N        2023   0.03
 9 RC     CC        RETAIL      Y        2022   0.86
10 RC     CC        RETAIL      Y        2023   0.19
11 RC     CC        RETAIL      N        2022   0.06
12 RC     CC        RETAIL      N        2023   0.86
13 RC     CC        WHOLESALE   Y        2022   3.35
14 RC     CC        WHOLESALE   Y        2023   0.25
15 RC     CC        WHOLESALE   N        2022   0.06
16 RC     CC        WHOLESALE   N        2023   0.04
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading