so I have a dataframe of the following structure, let’s call this one df0:
year | category | a | b | c | d |
---|---|---|---|---|---|
1989 | 1 | 0.3 | 0.7 | 0.43 | 321 |
1989 | 1 | 0.3 | 0.7 | 0.43 | 321 |
1989 | 2 | 0.2 | 0.4 | 0.5 | 174 |
1989 | 2 | 0.2 | 0.4 | 0.5 | 174 |
1989 | 2 | 0.2 | 0.4 | 0.5 | 174 |
1989 | 3 | 0.6 | 0.2 | 3.0 | 224 |
1990 | 1 | 0.6 | 0.2 | 3.0 | 93 |
1990 | 1 | 0.6 | 0.2 | 3.0 | 93 |
1990 | 2 | 0.3 | 0.7 | 4.0 | 293 |
1990 | 3 | 0.9 | 0.6 | 2.0 | 13 |
What I need to turn this into is the following. Basically, I want to add for each year a column with the a c value for each category. Like this:
year | category | a | b | c | d | c1 | c2 | c3 |
---|---|---|---|---|---|---|---|---|
1989 | 1 | 0.3 | 0.7 | 0.43 | 321 | 0.43 | 0.5 | 3.0 |
1989 | 1 | 0.3 | 0.7 | 0.43 | 321 | 0.43 | 0.5 | 3.0 |
1989 | 2 | 0.2 | 0.4 | 0.5 | 174 | 0.43 | 0.5 | 3.0 |
1989 | 2 | 0.2 | 0.4 | 0.5 | 174 | 0.43 | 0.5 | 3.0 |
1989 | 2 | 0.2 | 0.4 | 0.5 | 174 | 0.43 | 0.5 | 3.0 |
1989 | 3 | 0.6 | 0.2 | 3.0 | 224 | 0.43 | 0.5 | 3.0 |
1990 | 1 | 0.6 | 0.2 | 3.0 | 93 | 3.0 | 4.0 | 2.0 |
1990 | 1 | 0.6 | 0.2 | 3.0 | 93 | 3.0 | 4.0 | 2.0 |
1990 | 2 | 0.3 | 0.7 | 4.0 | 293 | 3.0 | 4.0 | 2.0 |
1990 | 3 | 0.9 | 0.6 | 2.0 | 13 | 3.0 | 4.0 | 2.0 |
I cannot figure out how to compute this. My first Idea would be to create sub dataframes for each year and then create a vector of each c value from this, but this seems very tedious and I cannot get it to work.
Does anyone have input or a solution on this?
KR
>Solution :
One way to do this would be to gather all unique
values in each year
and reshape the data in wide format using pivot_wider
. We can join this reshaped data with original data by year
to get back expected data frame as output.
library(dplyr)
library(tidyr)
df %>%
reframe(unique_c = unique(c), .by = year) %>%
mutate(row = row_number(), .by = year) %>%
pivot_wider(names_from = row, names_prefix = "c", values_from = unique_c) %>%
left_join(df, by = join_by(year)) %>%
# Optional, only to get data in same order as expected output
relocate(year, category, a:d)
# year category a b c d c1 c2 c3
# <int> <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 1989 1 0.3 0.7 0.43 321 0.43 0.5 3
# 2 1989 1 0.3 0.7 0.43 321 0.43 0.5 3
# 3 1989 2 0.2 0.4 0.5 174 0.43 0.5 3
# 4 1989 2 0.2 0.4 0.5 174 0.43 0.5 3
# 5 1989 2 0.2 0.4 0.5 174 0.43 0.5 3
# 6 1989 3 0.6 0.2 3 224 0.43 0.5 3
# 7 1990 1 0.6 0.2 3 93 3 4 2
# 8 1990 1 0.6 0.2 3 93 3 4 2
# 9 1990 2 0.3 0.7 4 293 3 4 2
#10 1990 3 0.9 0.6 2 13 3 4 2
data
df <- structure(list(year = c(1989L, 1989L, 1989L, 1989L, 1989L, 1989L,
1990L, 1990L, 1990L, 1990L), category = c(1L, 1L, 2L, 2L, 2L,
3L, 1L, 1L, 2L, 3L), a = c(0.3, 0.3, 0.2, 0.2, 0.2, 0.6, 0.6,
0.6, 0.3, 0.9), b = c(0.7, 0.7, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2,
0.7, 0.6), c = c(0.43, 0.43, 0.5, 0.5, 0.5, 3, 3, 3, 4, 2), d = c(321L,
321L, 174L, 174L, 174L, 224L, 93L, 93L, 293L, 13L)), row.names = c(NA,
-10L), class = "data.frame")