Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R- Collapse rows by getting average values

I have data that has multiple observations per week (SARS-CoV-2 detected in wastewater), but only one outcome per week (hospitalizations). Currently each row is by date of collection, and it also has "week" as a column. I would like to collapse the rows in the same week, but have the wastewater observation for the week be the average of the individual collections. Most weeks have 2 observations, but some have 1 and some have three.

Data currently looks like this:

week <- c(1,1,1,2,2,3)
col_a <- c(1,2,2,4,2,7)
col_b <- c(4,2,3,1,2,5)
col_c <- c(4,2,3,2,2,4)
df <- data.frame(week,col_a,col_b,col_c) 

   week date site_a site_b hosp
1:    1    1      1      4    0
2:    1    2      2      2    0
3:    1    3      2      3    3
4:    2    4      4      1    0
5:    2    5      2      2    2
6:    3    6      7      5    4

And I’d like it to look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Week   SiteA   SiteB   Hosp
1      1.667   3       3
2      3       1.5     2 
3      7       5       4

When I try group_by, I get almost all the way there, but hospitalizations is now an average as well:

> df1%>%group_by(week)%>%summarise_all(funs(mean))
# A tibble: 3 × 5
   week  date site_a site_b  hosp
  <dbl> <dbl>  <dbl>  <dbl> <dbl>
1     1   2     1.67    3       1
2     2   4.5   3       1.5     1
3     3   6     7       5       4

How do I group by week, and have the sites average but the hospitalizations sum?

Would love some help, thanks!

>Solution :

This should do the trick:

#### Load Library ####
library(tidyverse)

#### Group & Summarise ####
df %>% 
  group_by(week) %>% 
  summarise(SiteA = mean(col_a),
            SiteB = mean(col_b),
            Hospital = mean(col_c))

Which gives you this:

# A tibble: 3 × 4
   week SiteA SiteB Hospital
  <dbl> <dbl> <dbl>    <dbl>
1     1  1.67   3          3
2     2  3      1.5        2
3     3  7      5          4
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading