Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Mutate subset average and add to all groups R

I’m trying to create mean values for a certain subset of my data, and I want to add these as a new variable to a different subset of the data.

Say I have the following data:

df <- data.frame(year = rep(c(2010,2012,2017), 6),
                 party = rep(c("A", "A", "A", "B", "B", "B", "C", "C", "C"), 2),
                 left = rep(c(1,1,1,0,0,0,1,1,1),2),
                 area = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2),
                 pp = c(3,4,5,1,2,6,3,4,2,5,1,1,3,4,5,2,3,0))

Which looks like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

year party left area pp
2010     A    1    1  3
2012     A    1    1  4
2017     A    1    1  5
2010     B    0    1  1
2012     B    0    1  2
2017     B    0    1  6
2010     C    1    1  3
2012     C    1    1  4
2017     C    1    1  2
2010     A    1    2  5
2012     A    1    2  1
2017     A    1    2  1
2010     B    0    2  3
2012     B    0    2  4
2017     B    0    2  5
2010     C    1    2  2
2012     C    1    2  3
2017     C    1    2  0

Now what I want is to create a new variable within each "year" and "area" for left == 1, that is the average of "pp". But I also want to this to the subset where left == 0.

I tried:

df %>%
  group_by(year, left, area) %>%
  mutate(mean_left = mean(pp)) %>% ungroup()

But this gives me a value specific for left == 0 and a specific value for left == 1 (within each year,party).

year party  left  area    pp mean_left
<dbl> <chr> <dbl> <dbl> <dbl>     <dbl>
2010 A         1     1     3       3  
2012 A         1     1     4       4  
2017 A         1     1     5       3.5
2010 B         0     1     1       1  
2012 B         0     1     2       2  
2017 B         0     1     6       6  
2010 C         1     1     3       3  
2012 C         1     1     4       4  
2017 C         1     1     2       3.5
2010 A         1     2     5       3.5
2012 A         1     2     1       2  
2017 A         1     2     1       0.5
2010 B         0     2     3       3  
2012 B         0     2     4       4  
2017 B         0     2     5       5  
2010 C         1     2     2       3.5
2012 C         1     2     3       2  
2017 C         1     2     0       0.5

I also tried:

df %>%
  group_by(year, left, area) %>%
  mutate(mean_left = mean(df$pp[df$left==1])) %>% ungroup()

But here the group_by function seems to fail and I get 2.75 in all columns (which is the value of mean(df$pp[df$left==1]))

How can I get a mean score per year & area for left == 1, adding this same number to this year & area for left == 2?

It should look like:

year party  left  area    pp mean_left
<dbl> <chr> <dbl> <dbl> <dbl>     <dbl>
2010 A         1     1     3       3  
2012 A         1     1     4       4  
2017 A         1     1     5       3.5
2010 B         0     1     1       3  
2012 B         0     1     2       4  
2017 B         0     1     6       3.5  
2010 C         1     1     3       3  
2012 C         1     1     4       4  
2017 C         1     1     2       3.5
2010 A         1     2     5       3.5
2012 A         1     2     1       2  
2017 A         1     2     1       0.5
2010 B         0     2     3       3.5  
2012 B         0     2     4       2  
2017 B         0     2     5       0.5  
2010 C         1     2     2       3.5
2012 C         1     2     3       2  
2017 C         1     2     0       0.5

Thanks in advance!

>Solution :

You were on the right track; you must not group_by left but select it within your mean function.

df %>%
  group_by(year, area) %>%
  mutate(mean_left = mean(pp[left == 1])) %>%
  ungroup()

output

# A tibble: 18 x 6
    year party  left  area    pp mean_left
   <dbl> <chr> <dbl> <dbl> <dbl>     <dbl>
 1  2010 A         1     1     3       3  
 2  2012 A         1     1     4       4  
 3  2017 A         1     1     5       3.5
 4  2010 B         0     1     1       3  
 5  2012 B         0     1     2       4  
 6  2017 B         0     1     6       3.5
 7  2010 C         1     1     3       3  
 8  2012 C         1     1     4       4  
 9  2017 C         1     1     2       3.5
10  2010 A         1     2     5       3.5
11  2012 A         1     2     1       2  
12  2017 A         1     2     1       0.5
13  2010 B         0     2     3       3.5
14  2012 B         0     2     4       2  
15  2017 B         0     2     5       0.5
16  2010 C         1     2     2       3.5
17  2012 C         1     2     3       2  
18  2017 C         1     2     0       0.5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading