I have a dataframe with duplicate values in what I want to be a unique identifier column. The other columns of the dataframe contain data that I want to preserve by creating new "duplicate" columns. For example, given the following dataframe:
sample_df <- data.frame(
id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
test = c("A", "B", "C", "A", "B", "A", "B", "C", "D"),
result = c(93, 88, 89, 73, 89, 71, 73, 73, 75)
)
sample_df
# id test result
# 1 1 A 93
# 2 1 B 88
# 3 1 C 89
# 4 2 A 73
# 5 2 B 89
# 6 3 A 71
# 7 3 B 73
# 8 3 C 73
# 9 3 D 75
Is there a way to pivot this by adding new iterations of the non-id columns? I’m thinking of something like this:
# id test1 result1 test2 result2 test3 result3 test4 result4
# 1 1 A 93 B 88 C 89 <NA> NA
# 2 2 A 73 B 89 <NA> NA <NA> NA
# 3 3 A 71 B 73 C 73 D 75
Where the maximum number of duplicates in the id column determines how many new iterations of the non-id columns are joined to the dataframe.
I’ve tried doing this with tidyr::pivot_wider() but with this function I’ve only been able to generate output columns based on the values of the columns (for example, returning A, B, C, and D as the output columns rather than test1, result1, test2, result2, etc.). I may also just be lacking the proper terminology to find other sources for answers.
>Solution :
You could create an "observation number" column within each id and pass it to names_from in pivot_wider():
library(dplyr) # >= 1.1.0
library(tidyr)
sample_df %>%
mutate(obs = row_number(), .by = id) %>%
pivot_wider(
values_from = test:result,
names_from = obs,
names_sep = "",
names_vary = "slowest"
)
# A tibble: 3 × 9
id test1 result1 test2 result2 test3 result3 test4 result4
<dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 1 A 93 B 88 C 89 <NA> NA
2 2 A 73 B 89 <NA> NA <NA> NA
3 3 A 71 B 73 C 73 D 75