Subsetting or filtering within dplyr::summarise

There are several similar questions about this, but none of them covers the same problem.

MWE:

library(dplyr)
library(lubridate)

df <- data.frame(id = 1:5,
                 type = c("a", "b", "b", "a", "b"),
                 start = dmy(c("05/05/2005", "06/06/2006", "07/07/2007", "08/08/2008", "09/09/2009")),
                 finish = dmy(c("08/08/2008", "09/09/2009", "02/02/2011", "02/02/2011", NA)),
                 not_used = c(FALSE, TRUE, FALSE, TRUE, FALSE))

I want to produce a summary, grouped by type, that includes the total number of not_used rows per type and the mean difference between start and finish, in months, for the rows where not_used is FALSE. This is what I'm trying:

df%>%group_by(type)%>%
  summarise(Not_used =  sum(not_used),
            `Mean_Lifespan_of_used(months)` = mean((interval(start,finish)/months(1), na.rm= T)[not_used == F]))

This fails to parse, with an unexpected ',' token:

Error: unexpected ',' in:
"  summarise(Not_used =  sum(not_used),
            `Mean_Lifespan_of_used(months)` = mean((interval(start,finish)/months(1),"

I appreciate I could create a new column before the summarise function, but I’d like to understand what I’m doing wrong here.

Solution:

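This is a syntax error rather than a dplyr problem: na.rm = T sits inside the inner parentheses, next to the interval expression, and a parenthesised expression cannot contain a comma. Subset the interval result with [not_used == FALSE] first, then pass na.rm = TRUE as an argument to mean(). The subset has to happen before the mean is taken, because mean returns a single value whereas not_used has one element per row of the group. (If the dates were built with as.Date rather than dmy, they would also need format = "%d/%m/%Y".)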
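To see the parsing issue in isolation, here is a minimal base R sketch. The vectors x and keep and the helper parses() are made up for illustration and are not part of the question; the two strings only mirror the shape of the failing and the corrected calls.

```r
# Hypothetical helper: TRUE if R can parse the string, FALSE on a syntax error
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Same shape as the failing call: the comma is inside the inner parentheses
bad <- "mean((x, na.rm = TRUE)[keep])"
# Same shape as the corrected call: subset first, then na.rm as an argument to mean()
good <- "mean((x)[keep], na.rm = TRUE)"

parses(bad)
parses(good)

# Illustrative data (an assumption, not from the question)
x <- c(10, NA, 30, 40)
keep <- c(TRUE, TRUE, FALSE, TRUE)

# Evaluates mean(c(10, NA, 40), na.rm = TRUE)
result <- eval(parse(text = good))
result
```

The corrected summarise call applies the same reordering: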

library(dplyr)
library(lubridate)

df %>%
  group_by(type) %>%
  summarise(Not_used = sum(not_used),
            `Mean_Lifespan_of_used(months)` =
              mean((interval(start, finish) / months(1))[not_used == FALSE],
                   na.rm = TRUE))

Output:

# A tibble: 2 × 3
  type  Not_used `Mean_Lifespan_of_used(months)`
  <chr>    <int>                           <dbl>
1 a            1                            39.1
2 b            1                            42.8
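The question mentions creating a new column before summarise as an alternative. A sketch of that variant is below; the column name lifespan_months and the result name res are assumptions chosen for illustration. It should give the same summary as the call above, with the interval arithmetic kept out of the summarise step.

```r
library(dplyr)
library(lubridate)

# Same data as in the question
df <- data.frame(id = 1:5,
                 type = c("a", "b", "b", "a", "b"),
                 start = dmy(c("05/05/2005", "06/06/2006", "07/07/2007", "08/08/2008", "09/09/2009")),
                 finish = dmy(c("08/08/2008", "09/09/2009", "02/02/2011", "02/02/2011", NA)),
                 not_used = c(FALSE, TRUE, FALSE, TRUE, FALSE))

res <- df %>%
  # lifespan_months is a hypothetical helper column: start-to-finish span in months
  mutate(lifespan_months = interval(start, finish) / months(1)) %>%
  group_by(type) %>%
  summarise(Not_used = sum(not_used),
            # keep only rows where not_used is FALSE, then drop NA lifespans
            `Mean_Lifespan_of_used(months)` = mean(lifespan_months[!not_used], na.rm = TRUE))

res
```

Here !not_used is equivalent to not_used == FALSE for a logical column, so either form can be used inside the brackets.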