Home Pivot_wider: Combine Duplicate Observations AND Create New Variable Columns for Those Values

Questions

Pivot_wider: Combine Duplicate Observations AND Create New Variable Columns for Those Values

May 26, 2022

I’m new to R and have scoured the site to find a solution – I’ve found lots of similar, but slightly different questions. I’m stumped.

I have a dataset in this structure:

  SURVEY_ID    CHILD_NAME    CHILD_AGE
  Survey1      Billy             4
  Survey2      Claude            12
  Survey2      Maude             6
  Survey2      Constance         3
  Survey3      George            22
  Survey4      Marjoram          14
  Survey4      LeBron            37

I’m trying to pivot the data wider so that there’s a) only one unique SURVEY_ID per row, and, critically, b) a new column for second, third, etc. children for surveys with more than one child.

So the result would look like:

    SURVEY_ID    CHILD_NAME1    CHILD_NAME2    CHILD_NAME3    CHILD_AGE1  CHILD_AGE2  CHILD_AGE3
    Survey1      Billy                                        4
    Survey2      Claude         Maude          Constance      12          6           3
    Survey3      George                                       22
    Survey4      Marjoram       Lebron                        14          37

The actual data has thousands of surveys and the number of "child names" and "child ages" could be as high as 10. It’s the issue of creating the new columns not from existing value names and only where there are multiple children that has me perplexed.

>Solution :

Using base R:

reshape(transform(df, time = ave(SURVEY_ID, SURVEY_ID, FUN=seq)), 
       v.names = c('CHILD_NAME', 'CHILD_AGE'), 
       direction = 'wide', idvar = 'SURVEY_ID', sep = '_')

  SURVEY_ID CHILD_NAME_1 CHILD_AGE_1 CHILD_NAME_2 CHILD_AGE_2 CHILD_NAME_3 CHILD_AGE_3
1   Survey1        Billy           4         <NA>          NA         <NA>          NA
2   Survey2       Claude          12        Maude           6    Constance           3
5   Survey3       George          22         <NA>          NA         <NA>          NA
6   Survey4     Marjoram          14       LeBron          37         <NA>          NA

using tidyverse:

library(tidyverse)
df %>%
  group_by(SURVEY_ID) %>%
  mutate(name = row_number()) %>%
  pivot_wider(SURVEY_ID, values_from = c(CHILD_NAME, CHILD_AGE))

# A tibble: 4 x 7
# Groups:   SURVEY_ID [4]
  SURVEY_ID CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3 CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3
  <chr>     <chr>        <chr>        <chr>              <int>       <int>       <int>
1 Survey1   Billy        NA           NA                     4          NA          NA
2 Survey2   Claude       Maude        Constance             12           6           3
3 Survey3   George       NA           NA                    22          NA          NA
4 Survey4   Marjoram     LeBron       NA                    14          37          NA

using data.table

library(data.table)
dcast(setDT(df), SURVEY_ID~rowid(SURVEY_ID), value.var = c('CHILD_AGE', 'CHILD_NAME'))
   SURVEY_ID CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3 CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3
1:   Survey1           4          NA          NA        Billy         <NA>         <NA>
2:   Survey2          12           6           3       Claude        Maude    Constance
3:   Survey3          22          NA          NA       George         <NA>         <NA>
4:   Survey4          14          37          NA     Marjoram       LeBron         <NA>