Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pivot_longer: pattern based on column names as input

I have a wide data frame with columns of the following pattern: a1_var1, a2_var1, a3_var1, a1_var2, a2_var2, a3_var2, etc. I want to pivot this to longer, creating a new variable named a that contains the values of the number behind the "a" (1, 2, 3) and for all values of ax_var1 to be contained in the column var1 (likewise for var2).

Consider this example data:

df <- data.frame(`id` = seq(1:3),
                 a1_var1 = c(111, 211, 311),
                 a2_var1 = c(121, 221, 321),
                 a3_var1 = c(131, 231, 331),
                 a1_var2 = c(112, 212, 312),
                 a2_var2 = c(122, 222, 322),
                 a3_var2 = c(132, 232, 332)) # first number is ID, second number is "a[x]" value, third number is var[x] 

I would now like to transform the data to long format with the following columns: id (not unique per row anymore), a (containing the number from which the variable was taken, e.g. a1_var2: a = 1), var1 (containing all values from ax_var1 columns, i.e. x11, x21, x31), and var2 (containing all values from ax_var2 columns, i.e. x12, x22, x32).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I want to achieve the following structure:

id   a   var1 var2
1    1    111  112
1    2    121  122
1    3    131  132
2    1    211  212
2    2    221  222
2    3    231  232
3    1    311  312
3    2    321  322
3    3    331  332

So far, I’m hard-coding this for every varx like this:

df %>% select(-c(ends_with("var2"))) %>% 
  pivot_longer(cols = c("a1_var1", "a2_var1", "a3_var1"), names_to = "a", values_to = "var1") %>% 
  mutate(a=str_extract(a, "a\\d"),
         a=str_extract(a, "\\d"))

but since I have many varx columns, that’s kind of bothersome – can someone point out a better way of achieving the above outcome?

>Solution :

We can do this in pivot_longer itself – i.e. specify the names_toas c("a", ".value") where ‘a’ will be column name of the prefix substring from the columns to be reshaped and .value represents the column values. In the names_pattern, capture the substring of column name i.e. digits (\\d+) after the ‘a’ and the second capture group after the _

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(cols = -id, names_to = c("a", ".value"), 
      names_pattern = "a(\\d+)_(.*)")

-output

# A tibble: 9 × 4
     id a      var1  var2
  <int> <chr> <dbl> <dbl>
1     1 1       111   112
2     1 2       121   122
3     1 3       131   132
4     2 1       211   212
5     2 2       221   222
6     2 3       231   232
7     3 1       311   312
8     3 2       321   322
9     3 3       331   332
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading