Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Divide data frame by deciles and compute mean and median for each decile in each year R

Given the following structure of data, I would like to divide the "Category"s in deciles for each year given by "value" and calculate for every year and every decile the mean and median for the variable "value".

df <- data.frame(Category = rep(LETTERS[1:15], each = 6),
                 Year = rep(2010:2015, 15),
                 value = runif(90, 50, 100))

I have tried this but then I have problem calculating the mean and mediam in each year for each decile (not sure this is 100% correct though).

a <- df %>%
  mutate(quantile = ntile(value, 10))

I would like to get a dataframe by year and deciles with the mean and mediam of each decile in each year

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

For dividing the df in deciles I have tried this which apparently gives me a result (not sure if 100% correct)

a <- df %>%
  mutate(quantile = ntile(value, 10))

For the calculation of the mean I have tried two methods and both fail:

using data.frame

w <- setDT(a)[, Mean := mean(value), by = .(Year, quantile)][]

and using dplry

z <- a %>% group_by(Year,quantile) %>% summarise (x = mean(value))

What I am looking for as a result is a data frame with one row per year and decile and their mean and mediam values.

That’s how I need the output (with corresponding values in columns Mean and Median and deciles all the way to 10):
enter image description here

>Solution :

How about this:

library(dplyr)
df <- data.frame(Category = rep(LETTERS[1:15], each = 6),
                 Year = rep(2010:2015, 15),
                 value = runif(90, 50, 100))

df %>%
  group_by(Year) %>% 
  mutate(decile = ntile(value, 10)) %>% 
  group_by(Year, decile) %>% 
  summarise(mean = mean(value), 
            median = median(value)) %>% 
  arrange(decile, Year)
#> `summarise()` has grouped output by 'Year'. You can override using the
#> `.groups` argument.
#> # A tibble: 60 × 4
#> # Groups:   Year [6]
#>     Year decile  mean median
#>    <int>  <int> <dbl>  <dbl>
#>  1  2010      1  55.2   55.2
#>  2  2011      1  52.7   52.7
#>  3  2012      1  56.3   56.3
#>  4  2013      1  53.7   53.7
#>  5  2014      1  53.6   53.6
#>  6  2015      1  55.1   55.1
#>  7  2010      2  63.2   63.2
#>  8  2011      2  67.8   67.8
#>  9  2012      2  67.8   67.8
#> 10  2013      2  56.5   56.5
#> # … with 50 more rows

Created on 2023-01-19 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading