I’m new to R and have scoured the site to find a solution – I’ve found lots of similar, but slightly different questions. I’m stumped.
I have a dataset in this structure:
SURVEY_ID CHILD_NAME CHILD_AGE
Survey1 Billy 4
Survey2 Claude 12
Survey2 Maude 6
Survey2 Constance 3
Survey3 George 22
Survey4 Marjoram 14
Survey4 LeBron 37
I’m trying to pivot the data wider so that there’s a) only one unique SURVEY_ID per row, and, critically, b) a new column for second, third, etc. children for surveys with more than one child.
So the result would look like:
SURVEY_ID CHILD_NAME1 CHILD_NAME2 CHILD_NAME3 CHILD_AGE1 CHILD_AGE2 CHILD_AGE3
Survey1 Billy 4
Survey2 Claude Maude Constance 12 6 3
Survey3 George 22
Survey4 Marjoram Lebron 14 37
The actual data has thousands of surveys and the number of "child names" and "child ages" could be as high as 10. It’s the issue of creating the new columns not from existing value names and only where there are multiple children that has me perplexed.
>Solution :
Using base R:
reshape(transform(df, time = ave(SURVEY_ID, SURVEY_ID, FUN=seq)),
v.names = c('CHILD_NAME', 'CHILD_AGE'),
direction = 'wide', idvar = 'SURVEY_ID', sep = '_')
SURVEY_ID CHILD_NAME_1 CHILD_AGE_1 CHILD_NAME_2 CHILD_AGE_2 CHILD_NAME_3 CHILD_AGE_3
1 Survey1 Billy 4 <NA> NA <NA> NA
2 Survey2 Claude 12 Maude 6 Constance 3
5 Survey3 George 22 <NA> NA <NA> NA
6 Survey4 Marjoram 14 LeBron 37 <NA> NA
using tidyverse:
library(tidyverse)
df %>%
group_by(SURVEY_ID) %>%
mutate(name = row_number()) %>%
pivot_wider(SURVEY_ID, values_from = c(CHILD_NAME, CHILD_AGE))
# A tibble: 4 x 7
# Groups: SURVEY_ID [4]
SURVEY_ID CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3 CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3
<chr> <chr> <chr> <chr> <int> <int> <int>
1 Survey1 Billy NA NA 4 NA NA
2 Survey2 Claude Maude Constance 12 6 3
3 Survey3 George NA NA 22 NA NA
4 Survey4 Marjoram LeBron NA 14 37 NA
using data.table
library(data.table)
dcast(setDT(df), SURVEY_ID~rowid(SURVEY_ID), value.var = c('CHILD_AGE', 'CHILD_NAME'))
SURVEY_ID CHILD_AGE_1 CHILD_AGE_2 CHILD_AGE_3 CHILD_NAME_1 CHILD_NAME_2 CHILD_NAME_3
1: Survey1 4 NA NA Billy <NA> <NA>
2: Survey2 12 6 3 Claude Maude Constance
3: Survey3 22 NA NA George <NA> <NA>
4: Survey4 14 37 NA Marjoram LeBron <NA>