Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: pivot multiple columns based on more than one pattern

I have a dataset where there is an ID columns Patient_ID, and multiple columns relating to each baby of a birth event. There are more than one set of each column, as there have been multiple births (twins, triplets etc), and the database decided to work in a wide format.

So, I have the columns:

Patient_ID *for the mother;
pofid_1
pof1completeddate
pof1pregendweeks
pofid_2
pof2completeddate
pof2pregendweeks

etc, etc.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

pofid_1 refers to a unique identifier for each baby, and is the only variable that doesnt follow the format of pofnvarname (pof – pregnancy outcome form). There are ~50 columns for each baby, I have only listed three here for demonstration. Is there a way I can pivot the whole dataset based on the number after pof so I have the following column names, and one row for each baby born:

Patient_ID
babynumber
pofid *baby ID;
pofcompleteddate
pofpregendweeks

So, I am starting off with:

data.frame(
  Patient_ID = c(1, 2, 3, 4),
  pofid_1 = c(1, 2, 3, 4),
  pof1completeddate = as.Date(c("2022-11-12", "2022-12-11", "2022-10-10", "2022-01-01")),
  pof1pregendweeks = c(40, 39, 41, 40),
  pofid_2 = c(NA, NA, 5, 6),
  pof2completeddate = as.Date(c(NA, NA, "2022-10-10", "2022-01-01")),
  pof2pregendweeks = c(NA, NA, 41, 40)
)

Patient_ID pofid_1 pof1completeddate pof1pregendweeks pofid_2 pof2completeddate pof2pregendweeks
1          1       1        2022-11-12               40      NA              <NA>               NA
2          2       2        2022-12-11               39      NA              <NA>               NA
3          3       3        2022-10-10               41       5        2022-10-10               41
4          4       4        2022-01-01               40       6        2022-01-01               40

And want

  Patient_ID pofid babynumber pofcompleteddate pofpregendweeks
1          1     1          1       2022-11-12              40
2          2     2          1       2022-12-11              39
3          3     3          1       2022-10-10              41
4          3     5          2       2022-10-10              41
5          4     4          1       2022-01-01              40
6          4     6          2       2022-01-01              40

>Solution :

It’s best to ensure you have consistent naming across your columns by changing pofid_1 and pof_id2 to pof1id and pof2id. You can do this in one gulp using rename_with. Then, it’s just a case of pivoting to long format and filtering to retain complete cases:

library(tidyverse)

df %>% 
  rename_with(~gsub('pofid_(\\d+)', 'pof\\1id', .x)) %>%
  pivot_longer(-Patient_ID, names_sep = '(?<=pof\\d)',
               names_to = c('babynumber', '.value')) %>%
  filter(complete.cases(.)) %>%
  mutate(babynumber = as.numeric(gsub('\\D', '', babynumber))) %>%
  rename(pofid = id)
#> # A tibble: 6 x 5
#>   Patient_ID babynumber pofid completeddate pregendweeks
#>        <int>      <dbl> <int> <chr>                <int>
#> 1          1          1     1 2022-11-12              40
#> 2          2          1     2 2022-12-11              39
#> 3          3          1     3 2022-10-10              41
#> 4          3          2     5 2022-10-10              41
#> 5          4          1     4 2022-01-01              40
#> 6          4          2     6 2022-01-01               4

Created on 2023-02-13 with reprex v2.0.2


Data in reproducible format

df <- structure(list(Patient_ID = 1:4, pofid_1 = 1:4, 
                     pof1completeddate = c("2022-11-12", 
"2022-12-11", "2022-10-10", "2022-01-01"), pof1pregendweeks = c(40L, 
39L, 41L, 40L), pofid_2 = c(NA, NA, 5L, 6L), pof2completeddate = c(NA, 
NA, "2022-10-10", "2022-01-01"), pof2pregendweeks = c(NA, NA, 
41L, 4L)), class = "data.frame", row.names = c("1", "2", "3", 
"4"))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading