Apply function on dataframe by specific group in R

I have a dataframe that looks something like this:

   dist    id daytime season
3  1.11 Name1     day summer
4  2.22 Name2   night spring
5  3.33 Name1     day winter
6  4.44 Name3   night   fall

I want a summary of dist by some specific columns in my dataframe.

So far I used a custom function:

path.summary <- function(x){df %>%
    group_by(x) %>% 
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist))}

And applied it to whichever column I wanted at the moment:

summary_ID <- path.summary(id)

I tried it a few weeks ago and got something like this:

  id       min    q1 median  mean    q3   max
   <chr>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Name1   0     17.8   310.   788. 1023. 5832.
 2 Name2   0     31.7   284.   570.  744. 9578.
 3 Name3   0     17.0   325.   721. 1185. 5293.
 4 Name4   0     11.9   197.   530.  865. 3476.
 5 Name5   0     24.5    94.9  617.  966. 9567.

When I try it now I get an error:

Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `x` is not found.

What changed and how do I get around the issue?

Solution:

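group_by() evaluates its arguments inside the data frame (data masking), so the bare x in the original function is looked up as a column literally named x, not as the column the caller passed. If the input is an unquoted column name, we may wrap it in {{ }} to forward it. Passing the data frame as an argument also avoids relying on a global df: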

path_summary <- function(dat, x){
  dat %>%                               
    group_by({{x}}) %>% 
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist))
}

Testing:

> path_summary(df, id)
# A tibble: 3 × 7
  id      min    q1 median  mean    q3   max
  <chr> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Name1  1.11  1.66   2.22  2.22  2.78  3.33
2 Name2  2.22  2.22   2.22  2.22  2.22  2.22
3 Name3  4.44  4.44   4.44  4.44  4.44  4.44
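
The {{ }} approach covers unquoted input. If the column name arrives as a string instead (for example when looping over several columns), one possible sketch is to group with across(all_of()). The name path_summary_chr and the cols argument are made up here for illustration, and dplyr 1.0.0 or later is assumed:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  dist    = c(1.11, 2.22, 3.33, 4.44),
  id      = c("Name1", "Name2", "Name1", "Name3"),
  daytime = c("day", "night", "day", "night"),
  season  = c("summer", "spring", "winter", "fall")
)

# Hypothetical variant of path_summary(): `cols` is a character vector
# of column names rather than an unquoted column
path_summary_chr <- function(dat, cols) {
  dat %>%
    group_by(across(all_of(cols))) %>%
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist),
              .groups = "drop")
}

# One summary table per grouping column, returned as a named list
group_cols <- c(id = "id", daytime = "daytime", season = "season")
summaries <- lapply(group_cols, function(col) path_summary_chr(df, col))
```

Here summaries$id should hold the same table as path_summary(df, id) above, and summaries$daytime and summaries$season hold the corresponding tables for the other two columns.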

Data:

df <- structure(list(dist = c(1.11, 2.22, 3.33, 4.44), id = c("Name1", 
"Name2", "Name1", "Name3"), daytime = c("day", "night", "day", 
"night"), season = c("summer", "spring", "winter", "fall")), 
class = "data.frame", row.names = c("3", 
"4", "5", "6"))
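
Since the question mentions summarising by several columns, the function can also be generalised a little further. The following is only a sketch: path_summary_by and its value argument are hypothetical names, the summarised column is forwarded with {{ }}, and any number of unquoted grouping columns is passed through `...`:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  dist    = c(1.11, 2.22, 3.33, 4.44),
  id      = c("Name1", "Name2", "Name1", "Name3"),
  daytime = c("day", "night", "day", "night"),
  season  = c("summer", "spring", "winter", "fall")
)

# Hypothetical generalisation: `value` is the unquoted column to summarise,
# `...` collects the unquoted grouping columns
path_summary_by <- function(dat, value, ...) {
  dat %>%
    group_by(...) %>%
    summarize(min = min({{ value }}),
              q1 = quantile({{ value }}, 0.25),
              median = median({{ value }}),
              mean = mean({{ value }}),
              q3 = quantile({{ value }}, 0.75),
              max = max({{ value }}),
              .groups = "drop")
}

# Group by two columns at once
res_by <- path_summary_by(df, dist, id, daytime)
```

With the sample data this gives one row per id/daytime combination that occurs in df. The `...` argument needs no {{ }} because dots are forwarded to group_by() as they are.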