Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Add column with row values based on a category to R dataframe

so I have a dataframe of the following structure, let’s call this one df0:

year category a b c d
1989 1 0.3 0.7 0.43 321
1989 1 0.3 0.7 0.43 321
1989 2 0.2 0.4 0.5 174
1989 2 0.2 0.4 0.5 174
1989 2 0.2 0.4 0.5 174
1989 3 0.6 0.2 3.0 224
1990 1 0.6 0.2 3.0 93
1990 1 0.6 0.2 3.0 93
1990 2 0.3 0.7 4.0 293
1990 3 0.9 0.6 2.0 13

What I need to turn this into is the following. Basically, I want to add for each year a column with the a c value for each category. Like this:

year category a b c d c1 c2 c3
1989 1 0.3 0.7 0.43 321 0.43 0.5 3.0
1989 1 0.3 0.7 0.43 321 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 2 0.2 0.4 0.5 174 0.43 0.5 3.0
1989 3 0.6 0.2 3.0 224 0.43 0.5 3.0
1990 1 0.6 0.2 3.0 93 3.0 4.0 2.0
1990 1 0.6 0.2 3.0 93 3.0 4.0 2.0
1990 2 0.3 0.7 4.0 293 3.0 4.0 2.0
1990 3 0.9 0.6 2.0 13 3.0 4.0 2.0

I cannot figure out how to compute this. My first Idea would be to create sub dataframes for each year and then create a vector of each c value from this, but this seems very tedious and I cannot get it to work.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Does anyone have input or a solution on this?

KR

>Solution :

One way to do this would be to gather all unique values in each year and reshape the data in wide format using pivot_wider. We can join this reshaped data with original data by year to get back expected data frame as output.

library(dplyr)
library(tidyr)

df %>%
  reframe(unique_c = unique(c), .by = year) %>%
  mutate(row = row_number(), .by = year) %>%
  pivot_wider(names_from = row, names_prefix = "c", values_from = unique_c) %>%
  left_join(df, by = join_by(year)) %>%
  # Optional, only to get data in same order as expected output
  relocate(year, category, a:d)

#   year category     a     b     c     d    c1    c2    c3
#   <int>    <int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 2  1989        1   0.3   0.7  0.43   321  0.43   0.5     3
# 3  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 4  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 5  1989        2   0.2   0.4  0.5    174  0.43   0.5     3
# 6  1989        3   0.6   0.2  3      224  0.43   0.5     3
# 7  1990        1   0.6   0.2  3       93  3      4       2
# 8  1990        1   0.6   0.2  3       93  3      4       2
# 9  1990        2   0.3   0.7  4      293  3      4       2
#10  1990        3   0.9   0.6  2       13  3      4       2

data

df <- structure(list(year = c(1989L, 1989L, 1989L, 1989L, 1989L, 1989L, 
1990L, 1990L, 1990L, 1990L), category = c(1L, 1L, 2L, 2L, 2L, 
3L, 1L, 1L, 2L, 3L), a = c(0.3, 0.3, 0.2, 0.2, 0.2, 0.6, 0.6, 
0.6, 0.3, 0.9), b = c(0.7, 0.7, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 
0.7, 0.6), c = c(0.43, 0.43, 0.5, 0.5, 0.5, 3, 3, 3, 4, 2), d = c(321L, 
321L, 174L, 174L, 174L, 224L, 93L, 93L, 293L, 13L)), row.names = c(NA, 
-10L), class = "data.frame")
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading