Add column with row values based on a category to R dataframe

January 24, 2024

so I have a dataframe of the following structure, let’s call this one df0:

year	category	a	b	c	d
1989	1	0.3	0.7	0.43	321
1989	1	0.3	0.7	0.43	321
1989	2	0.2	0.4	0.5	174
1989	2	0.2	0.4	0.5	174
1989	2	0.2	0.4	0.5	174
1989	3	0.6	0.2	3.0	224
1990	1	0.6	0.2	3.0	93
1990	1	0.6	0.2	3.0	93
1990	2	0.3	0.7	4.0	293
1990	3	0.9	0.6	2.0	13

What I need to turn this into is the following. Basically, I want to add for each year a column with the a c value for each category. Like this:

year	category	a	b	c	d	c1	c2	c3
1989	1	0.3	0.7	0.43	321	0.43	0.5	3.0
1989	1	0.3	0.7	0.43	321	0.43	0.5	3.0
1989	2	0.2	0.4	0.5	174	0.43	0.5	3.0
1989	2	0.2	0.4	0.5	174	0.43	0.5	3.0
1989	2	0.2	0.4	0.5	174	0.43	0.5	3.0
1989	3	0.6	0.2	3.0	224	0.43	0.5	3.0
1990	1	0.6	0.2	3.0	93	3.0	4.0	2.0
1990	1	0.6	0.2	3.0	93	3.0	4.0	2.0
1990	2	0.3	0.7	4.0	293	3.0	4.0	2.0
1990	3	0.9	0.6	2.0	13	3.0	4.0	2.0

I cannot figure out how to compute this. My first Idea would be to create sub dataframes for each year and then create a vector of each c value from this, but this seems very tedious and I cannot get it to work.

Does anyone have input or a solution on this?

>Solution :

One way to do this would be to gather all unique values in each year and reshape the data in wide format using pivot_wider. We can join this reshaped data with original data by year to get back expected data frame as output.

library(dplyr)
library(tidyr)

df %>%
  reframe(unique_c = unique(c), .by = year) %>%
  mutate(row = row_number(), .by = year) %>%
  pivot_wider(names_from = row, names_prefix = "c", values_from = unique_c) %>%
  left_join(df, by = join_by(year)) %>%
  # Optional, only to get data in same order as expected output
  relocate(year, category, a:d)

#   year category     a     b     c     d    c1    c2    c3
#   <int>    <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 2  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 3  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 4  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 5  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 6  1989        3   0.6   0.2  3      224  0.43   0.5     3
# 7  1990        1   0.6   0.2  3       93  3      4       2
# 8  1990        1   0.6   0.2  3       93  3      4       2
# 9  1990        2   0.3   0.7  4      293  3      4       2
#10  1990        3   0.9   0.6  2       13  3      4       2

data

df <- structure(list(year = c(1989L, 1989L, 1989L, 1989L, 1989L, 1989L, 
1990L, 1990L, 1990L, 1990L), category = c(1L, 1L, 2L, 2L, 2L, 
3L, 1L, 1L, 2L, 3L), a = c(0.3, 0.3, 0.2, 0.2, 0.2, 0.6, 0.6, 
0.6, 0.3, 0.9), b = c(0.7, 0.7, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 
0.7, 0.6), c = c(0.43, 0.43, 0.5, 0.5, 0.5, 3, 3, 3, 4, 2), d = c(321L, 
321L, 174L, 174L, 174L, 224L, 93L, 93L, 293L, 13L)), row.names = c(NA, 
-10L), class = "data.frame")