R: pivot multiple columns based on more than one pattern

I have a dataset where there is an ID columns Patient_ID, and multiple columns relating to each baby of a birth event. There are more than one set of each column, as there have been multiple births (twins, triplets etc), and the database decided to work in a wide format.

So, I have the columns:

Patient_ID *for the mother;
pofid_1
pof1completeddate
pof1pregendweeks
pofid_2
pof2completeddate
pof2pregendweeks

etc, etc.

pofid_1 refers to a unique identifier for each baby, and is the only variable that doesnt follow the format of pofnvarname (pof – pregnancy outcome form). There are ~50 columns for each baby, I have only listed three here for demonstration. Is there a way I can pivot the whole dataset based on the number after pof so I have the following column names, and one row for each baby born:

Patient_ID
babynumber
pofid *baby ID;
pofcompleteddate
pofpregendweeks

So, I am starting off with:

data.frame(
  Patient_ID = c(1, 2, 3, 4),
  pofid_1 = c(1, 2, 3, 4),
  pof1completeddate = as.Date(c("2022-11-12", "2022-12-11", "2022-10-10", "2022-01-01")),
  pof1pregendweeks = c(40, 39, 41, 40),
  pofid_2 = c(NA, NA, 5, 6),
  pof2completeddate = as.Date(c(NA, NA, "2022-10-10", "2022-01-01")),
  pof2pregendweeks = c(NA, NA, 41, 40)
)

Patient_ID pofid_1 pof1completeddate pof1pregendweeks pofid_2 pof2completeddate pof2pregendweeks
1          1       1        2022-11-12               40      NA              <NA>               NA
2          2       2        2022-12-11               39      NA              <NA>               NA
3          3       3        2022-10-10               41       5        2022-10-10               41
4          4       4        2022-01-01               40       6        2022-01-01               40

And want

  Patient_ID pofid babynumber pofcompleteddate pofpregendweeks
1          1     1          1       2022-11-12              40
2          2     2          1       2022-12-11              39
3          3     3          1       2022-10-10              41
4          3     5          2       2022-10-10              41
5          4     4          1       2022-01-01              40
6          4     6          2       2022-01-01              40

>Solution :

It’s best to ensure you have consistent naming across your columns by changing pofid_1 and pof_id2 to pof1id and pof2id. You can do this in one gulp using rename_with. Then, it’s just a case of pivoting to long format and filtering to retain complete cases:

library(tidyverse)

df %>% 
  rename_with(~gsub('pofid_(\\d+)', 'pof\\1id', .x)) %>%
  pivot_longer(-Patient_ID, names_sep = '(?<=pof\\d)',
               names_to = c('babynumber', '.value')) %>%
  filter(complete.cases(.)) %>%
  mutate(babynumber = as.numeric(gsub('\\D', '', babynumber))) %>%
  rename(pofid = id)
#> # A tibble: 6 x 5
#>   Patient_ID babynumber pofid completeddate pregendweeks
#>        <int>      <dbl> <int> <chr>                <int>
#> 1          1          1     1 2022-11-12              40
#> 2          2          1     2 2022-12-11              39
#> 3          3          1     3 2022-10-10              41
#> 4          3          2     5 2022-10-10              41
#> 5          4          1     4 2022-01-01              40
#> 6          4          2     6 2022-01-01               4

Created on 2023-02-13 with reprex v2.0.2


Data in reproducible format

df <- structure(list(Patient_ID = 1:4, pofid_1 = 1:4, 
                     pof1completeddate = c("2022-11-12", 
"2022-12-11", "2022-10-10", "2022-01-01"), pof1pregendweeks = c(40L, 
39L, 41L, 40L), pofid_2 = c(NA, NA, 5L, 6L), pof2completeddate = c(NA, 
NA, "2022-10-10", "2022-01-01"), pof2pregendweeks = c(NA, NA, 
41L, 4L)), class = "data.frame", row.names = c("1", "2", "3", 
"4"))

Leave a Reply