Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

average values in a column based on two factors

I have a large dataframe with 4 different variables ($name), that have many different values ($expl.val). Based on a model I made, I also have thirty separate predictions of ten different models, 3 runs for each ($pred.name). For each prediction I have a value (pred.value).

name    expl.val    pred.name   pred.val
var1    0           RUN1_MODEL1 0.57727
var1    0           RUN2_MODEL1 0.561696
var1    0           RUN3_MODEL1 0.553354
var1    0           RUN1_MODEL2 0.719538
var1    0           RUN2_MODEL2 0.695912
var1    0           RUN3_MODEL2 0.729262
var2    5           RUN1_MODEL1 0.694463
var2    5           RUN2_MODEL1 0.699222
var2    5           RUN3_MODEL1 0.695147
var2    5           RUN1_MODEL4 0.886816
var2    5           RUN2_MODEL4 0.960639
var2    5           RUN3_MODEL4 0.982607

My ultimate goal is to plot response curves (x-axis: $expl.value, y-axis: $pred.value), but as the data frame is now, I would for each variable (%name) have 30 response curves, as I have 30 separate predictions per variable value. This would be messy.

Therefore, I’d like to eliminate the separate prediction runs per model, so that for each variable value I only have 10 predicted values (ten per model instead of thirty).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So the output would, in this case, look like this:

name    expl.val    pred.name   pred.val
var1    0           RUNavg_MODEL1   0.564107
var1    0           RUNavg_MODEL2   0.714904
var2    5           RUNavg_MODEL1   0.696227
var2    5           RUNavg_MODEL4   0.943354

I am unsure how to approach this in Rstudio, because I want to average based on the different runs per different model, but then also separated by the value of the 4 variables on which the predictions depend.
My only instinct is that I may want to split $pred.name so I get a new factor variable with 3 levels (RUN1,RUN2,RUN3) that I then somehow need to average not only by MODEL# but also by $expl.val.

>Solution :

You could use

library(stringr)
library(dplyr)

df %>% 
  group_by(name, expl.val, model = str_replace(pred.name, "RUN\\d+_", "RUNavg_")) %>% 
  summarise(pred.val = mean(pred.val), .groups = "drop")

This returns

# A tibble: 4 x 4
  name  expl.val model              pred.val
  <chr>    <dbl> <chr>                 <dbl>
1 var1         0 AllData_RUNavg_ANN    0.564
2 var1         0 AllData_RUNavg_GBM    0.715
3 var2         5 AllData_RUNavg_ANN    0.696
4 var2         5 AllData_RUNavg_GLM    0.943

Data

structure(list(name = c("var1", "var1", "var1", "var1", "var1", 
"var1", "var2", "var2", "var2", "var2", "var2", "var2"), expl.val = c(0, 
0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5), pred.name = c("AllData_RUN1_ANN", 
"AllData_RUN2_ANN", "AllData_RUN3_ANN", "AllData_RUN1_GBM", "AllData_RUN2_GBM", 
"AllData_RUN3_GBM", "AllData_RUN1_ANN", "AllData_RUN2_ANN", "AllData_RUN3_ANN", 
"AllData_RUN1_GLM", "AllData_RUN2_GLM", "AllData_RUN3_GLM"), 
    pred.val = c(0.57727, 0.561696, 0.553354, 0.719538, 0.695912, 
    0.729262, 0.694463, 0.699222, 0.695147, 0.886816, 0.960639, 
    0.982607)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -12L), spec = structure(list(cols = list(
    name = structure(list(), class = c("collector_character", 
    "collector")), expl.val = structure(list(), class = c("collector_double", 
    "collector")), pred.name = structure(list(), class = c("collector_character", 
    "collector")), pred.val = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading