How to perform majority voting from a data frame with ranking criteria

January 23, 2023

I have the following data frame:

dat <- structure(list(model_name = c("Random Forest", "XGBoost", "XGBoost-reg", 
"Null model", "Plain LM", "Elastic LM", "LM-pep.charge", "LM-rf.10vip"
), RMSE = c(0.853, 0.886, 0.719, 2.41, 16.6, 0.731, 1.16, 1.03
), MAE = c(0.545, 0.708, 0.589, 1.98, 8.6, 0.588, 0.874, 0.729
), `R^2` = c(0.806, 0.865, 0.915, NA, 0.0645, 0.927, 0.8, 0.822
), ccc = c(0.89, 0.928, 0.951, 0, 0.0685, 0.945, 0.847, 0.901
)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))

It looks like this:

  model_name      RMSE   MAE   `R^2`    ccc
  <chr>          <dbl> <dbl>   <dbl>  <dbl>
1 Random Forest  0.853 0.545  0.806  0.89  
2 XGBoost        0.886 0.708  0.865  0.928 
3 XGBoost-reg    0.719 0.589  0.915  0.951 
4 Null model     2.41  1.98  NA      0     
5 Plain LM      16.6   8.6    0.0645 0.0685
6 Elastic LM     0.731 0.588  0.927  0.945 
7 LM-pep.charge  1.16  0.874  0.8    0.847 
8 LM-rf.10vip    1.03  0.729  0.822  0.901

It stores the evaluation metrics for 8 prediction models.
My goal is to select the top-performing model that consistently excels in the majority of evaluations.

By manually evaluating the metrics, I determined the top performing model this way:

Metrics -> Top 1
-----------------
RMSE -> XGBoost-reg 
MAE -> Random Forest
R^2 -> Elastic LM 
CCC -> XGBoost-reg 

# Therefore, the winner is XGBoost-reg

It’s worth noting that RMSE and MAE are error measures, with lower values indicating better performance, while R^2 and CCC are correlation measures, with higher values indicating better performance.

How can I do this with R?

Solution:

One option is to reshape the data into 'long' format and flip the sign of the higher-is-better metrics (R^2 and ccc) by multiplying them by -1, so that the lowest 'value1' is always the best score. Then group by 'name' (the metric), keep the row with the lowest 'value1' in each group, count how often each model wins, and select the first row:

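library(dplyr)
library(tidyr)
dat %>% 
  # one row per model/metric pair; the NA R^2 of the null model is dropped
  pivot_longer(cols = -model_name, values_drop_na = TRUE) %>% 
  # negate the higher-is-better metrics so that lower is always better
  mutate(value1 = case_when(name %in% c("R^2", "ccc") ~ value * -1, 
     TRUE ~ value)) %>% 
  group_by(name) %>% 
  # best model for each metric
  slice_min(value1, n = 1) %>%
  ungroup() %>%
  # number of metrics won by each model, most wins first
  count(model_name, sort = TRUE) %>%
  slice_head(n = 1)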

Output:

# A tibble: 1 × 2
  model_name      n
  <chr>       <int>
1 XGBoost-reg     2
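One caveat with `slice_head(n = 1)`: it returns a single row even when several models share the highest count, so a tied vote is silently resolved by row order. The sketch below, in base R, returns every model that shares the top count instead. The helper name `top_voted` and the tied example are hypothetical and only serve as illustration; the first vector of winners is copied from the manual table in the question.

```r
# Hypothetical helper: return every model that shares the highest vote count
top_voted <- function(winners) {
  votes <- table(winners)
  names(votes)[as.vector(votes) == max(votes)]
}

# Per-metric winners, taken from the manual table in the question
winners <- c(RMSE = "XGBoost-reg", MAE = "Random Forest",
             `R^2` = "Elastic LM", ccc = "XGBoost-reg")
top_voted(winners)

# Made-up tie: two models win two metrics each, so both are returned
tied <- c(RMSE = "A", MAE = "B", `R^2` = "A", ccc = "B")
top_voted(tied)
```

In the dplyr pipeline, replacing `slice_head(n = 1)` with `slice_max(n, n = 1)` should have a similar effect, since `slice_max()` keeps ties by default.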

Alternatively, use summarise() with across() to pick, for each numeric column, the model_name at the index of the maximum (for R^2 and ccc) or the minimum (for RMSE and MAE). Then reshape the one-row result to 'long' format and count:

dat %>% 
  # for each metric, look up the name of the best model
  summarise(across(where(is.numeric), 
    ~ if (cur_column() %in% c("R^2", "ccc")) 
        model_name[which.max(.x)] else model_name[which.min(.x)])) %>% 
  # stack the four winners into one column and count the votes
  pivot_longer(cols = everything(), names_to = NULL) %>% 
  count(value, sort = TRUE) %>%
  slice_head(n = 1)

Output:

# A tibble: 1 × 2
  value           n
  <chr>       <int>
1 XGBoost-reg     2

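Or with base R:

# Flip the sign of RMSE and MAE so that larger is always better, then set
# NA to -Inf so that a missing score can never win a metric
scores <- dat[-1] * list(-1, -1, 1, 1)
scores[is.na(scores)] <- -Inf
names(which.max(table(dat$model_name[max.col(t(scores), 'first')])))
[1] "XGBoost-reg"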
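The base R one-liner is compact but hard to read. As a rough, more explicit sketch of the same voting idea, the code below first finds the winner for each metric and then counts the votes. The helper name `metric_winners` and its `lower_is_better` and `id` arguments are hypothetical, introduced here only for illustration; the data is rebuilt as a plain data frame so the block runs on its own.

```r
# Data from the question; check.names = FALSE keeps the column name "R^2"
dat <- data.frame(
  model_name = c("Random Forest", "XGBoost", "XGBoost-reg", "Null model",
                 "Plain LM", "Elastic LM", "LM-pep.charge", "LM-rf.10vip"),
  RMSE = c(0.853, 0.886, 0.719, 2.41, 16.6, 0.731, 1.16, 1.03),
  MAE = c(0.545, 0.708, 0.589, 1.98, 8.6, 0.588, 0.874, 0.729),
  `R^2` = c(0.806, 0.865, 0.915, NA, 0.0645, 0.927, 0.8, 0.822),
  ccc = c(0.89, 0.928, 0.951, 0, 0.0685, 0.945, 0.847, 0.901),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: name of the best model for each metric column.
# which.min() and which.max() skip NA, so the null model's missing R^2
# is ignored rather than treated as a score.
metric_winners <- function(dat, lower_is_better, id = "model_name") {
  metrics <- setdiff(names(dat), id)
  vapply(metrics, function(m) {
    best <- if (m %in% lower_is_better) which.min(dat[[m]]) else which.max(dat[[m]])
    dat[[id]][best]
  }, character(1))
}

# Named character vector: one winning model per metric
winners <- metric_winners(dat, lower_is_better = c("RMSE", "MAE"))
winners

# Count the votes and take the model with the most wins
votes <- sort(table(winners), decreasing = TRUE)
names(votes)[1]
```

Printing `winners` before counting makes it easy to compare the result against the manual table in the question. Passing the lower-is-better columns by name also avoids relying on column order, which the `list(-1, -1, 1, 1)` multiplier does.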

voting

by MR

Published January 23, 2023
