Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to get the highest value in a column depending on three other columns?

The aim is to get the highest educational value between two partners in a household by disregarding the educational level of the children. The first column hhid is the household number id, the second column is the individuals id. The third column relation is the relationship between the individuals in a household: 1 refers to the head of the household, 2 refers to the partner and 3 refers to the children. The fourth column refers to the educational level of these individuals.

The fifth column is the column i would like to get using a code. The aim is to only focus on the highest educational level within a household but only between parents. I usually use pmax to get the highest value between two columns, and group_by to aggregate individuals under a group like a household, but these two commands to not seem to work in this case. Could someone help please?

 hhid id     relation    education    highest_education
    
  1     1        1         3                 3
  1     2        2         2                 3
  1     3        3         5                 3
  2     4        1         4                 4
  2     5        2         2                 4
  3     6        1         1                 2
  3     7        2         2                 2
  4     8        1         1                 3
  4     9        2         3                 3
  4    10        3         4                 3

Here is the data:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

structure(list(hhid = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4), id = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10), relation = c(1, 2, 3, 1, 2, 1, 2, 
1, 2, 3), education = c(3, 2, 5, 4, 2, 1, 2, 1, 3, 4)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

>Solution :

You could do:

library(dplyr)

df %>% 
  group_by(hhid) %>% 
  mutate(highest_education = max(education[relation %in% c(1, 2)])) %>% 
  ungroup()
#> # A tibble: 10 × 5
#>     hhid    id relation education highest_education
#>    <dbl> <dbl>    <dbl>     <dbl>             <dbl>
#>  1     1     1        1         3                 3
#>  2     1     2        2         2                 3
#>  3     1     3        3         5                 3
#>  4     2     4        1         4                 4
#>  5     2     5        2         2                 4
#>  6     3     6        1         1                 2
#>  7     3     7        2         2                 2
#>  8     4     8        1         1                 3
#>  9     4     9        2         3                 3
#> 10     4    10        3         4                 3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading