Add column with row values based on a category to R dataframe

Advertisements

so I have a dataframe of the following structure, let’s call this one df0:

year category a b c d
1989 1 0.3 0.7 0.43 321
1989 1 0.3 0.7 0.43 321
1989 2 0.2 0.4 0.5 174
1989 2 0.2 0.4 0.5 174
1989 2 0.2 0.4 0.5 174
1989 3 0.6 0.2 3.0 224
1990 1 0.6 0.2 3.0 93
1990 1 0.6 0.2 3.0 93
1990 2 0.3 0.7 4.0 293
1990 3 0.9 0.6 2.0 13

What I need to turn this into is the following. Basically, I want to add for each year a column with the a c value for each category. Like this:

year category a b c d c1 c2 c3
1989 1 0.3 0.7 0.43 321 0.43 0.5 3.0
1989 1 0.3 0.7 0.43 321 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 3 0.6 0.2 3.0 224 0.43 0.5 3.0
1990 1 0.6 0.2 3.0 93 3.0 4.0 2.0
1990 1 0.6 0.2 3.0 93 3.0 4.0 2.0
1990 2 0.3 0.7 4.0 293 3.0 4.0 2.0
1990 3 0.9 0.6 2.0 13 3.0 4.0 2.0

I cannot figure out how to compute this. My first Idea would be to create sub dataframes for each year and then create a vector of each c value from this, but this seems very tedious and I cannot get it to work.

Does anyone have input or a solution on this?

KR

>Solution :

One way to do this would be to gather all unique values in each year and reshape the data in wide format using pivot_wider. We can join this reshaped data with original data by year to get back expected data frame as output.

library(dplyr)
library(tidyr)

df %>%
  reframe(unique_c = unique(c), .by = year) %>%
  mutate(row = row_number(), .by = year) %>%
  pivot_wider(names_from = row, names_prefix = "c", values_from = unique_c) %>%
  left_join(df, by = join_by(year)) %>%
  # Optional, only to get data in same order as expected output
  relocate(year, category, a:d)

#   year category     a     b     c     d    c1    c2    c3
#   <int>    <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 2  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 3  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 4  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 5  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 6  1989        3   0.6   0.2  3      224  0.43   0.5     3
# 7  1990        1   0.6   0.2  3       93  3      4       2
# 8  1990        1   0.6   0.2  3       93  3      4       2
# 9  1990        2   0.3   0.7  4      293  3      4       2
#10  1990        3   0.9   0.6  2       13  3      4       2

data

df <- structure(list(year = c(1989L, 1989L, 1989L, 1989L, 1989L, 1989L, 
1990L, 1990L, 1990L, 1990L), category = c(1L, 1L, 2L, 2L, 2L, 
3L, 1L, 1L, 2L, 3L), a = c(0.3, 0.3, 0.2, 0.2, 0.2, 0.6, 0.6, 
0.6, 0.3, 0.9), b = c(0.7, 0.7, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 
0.7, 0.6), c = c(0.43, 0.43, 0.5, 0.5, 0.5, 3, 3, 3, 4, 2), d = c(321L, 
321L, 174L, 174L, 174L, 224L, 93L, 93L, 293L, 13L)), row.names = c(NA, 
-10L), class = "data.frame")

Leave a ReplyCancel reply