Compute stats for several columns at the same time using sapply

I have a dataframe as follows:

# A tibble: 6 x 4
   Placebo    High  Medium      Low
     <dbl>   <dbl>   <dbl>    <dbl>
1  0.0400  -0.04    0.0100  0.0100 
2  0.04     0      -0.0100  0.04   
3  0.0200  -0.1    -0.05   -0.0200 
4  0.03    -0.0200  0.03   -0.00700
5 -0.00500 -0.0100  0.0200  0.0100 
6  0.0300  -0.0100 NA      NA  

Cohen's d for two of the columns can be computed with the cohen.d() function from the effsize package:

df <- data.frame(Placebo = c(0.0400, 0.04, 0.0200, 0.03, -0.00500, 0.0300),
                 Low = c(-0.04, 0, -0.1, -0.0200,  -0.0100, -0.0100),
                 Medium = c(0.0100, -0.0100, -0.05, 0.03,  0.0200, NA ),
                 High = c(0.0100, 0.04, -0.0200, -0.00700, 0.0100, NA))

library(effsize)
cohen.d(as.vector(na.omit(df$Placebo)), as.vector(na.omit(df$High)))

However, this code gives me the following error:

Error in data[, group] : incorrect number of dimensions

What I would really like is a function that computes Cohen's d between one of the columns and each of the others.

To get Cohen's d for every column against Placebo, I tried something like:

sapply(df, function(i) cohen.d(pull(df, as.vector(na.omit(!!Placebo))), as.vector(na.omit(i))))

But I’m not sure this would work anyway.

Edit: I don’t want to drop entire rows, since Cohen's d can be computed for vectors of different lengths. Ideally, I would like the statistic computed with the NAs removed from each column independently.

>Solution :

It may be better to handle the NAs one column at a time, building a logical index that keeps only the rows where both ‘Placebo’ and the current column are non-missing:

library(dplyr)
library(effsize)
df %>%
  summarise(across(Low:High, ~ list({
    # keep only rows where both Placebo and the current column are non-missing
    i1 <- complete.cases(Placebo) & complete.cases(.x)
    cohen.d(Placebo[i1], .x[i1])
  })))

Or, to use lapply/sapply, loop over the columns other than Placebo:

lapply(df[-1], function(x) {
  # na.omit() on the two-column matrix drops rows with an NA in either column
  x1 <- na.omit(cbind(df$Placebo, x))
  cohen.d(x1[, 1], x1[, 2])
})

-output

$Low

Cohen's d

d estimate: 1.947312 (large)
95 percent confidence interval:
    lower     upper 
0.3854929 3.5091319 


$Medium

Cohen's d

d estimate: 0.9622504 (large)
95 percent confidence interval:
     lower      upper 
-0.5782851  2.5027860 


$High

Cohen's d

d estimate: 0.8884639 (large)
95 percent confidence interval:
     lower      upper 
-0.6402419  2.4171697 
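
Both approaches above drop a row whenever either Placebo or the compared column is missing. The edit in the question asks for something slightly different: NAs removed from each column independently, so that all six Placebo values are always used. A minimal sketch of that variant with sapply follows. The data frame is the one from the question; the names `ref`, `others` and `d_table` are illustrative only, and the code assumes the object returned by effsize's cohen.d() exposes `$estimate` and `$conf.int` components.

```r
# Assumes the effsize package is installed; the call is namespaced below.
df <- data.frame(
  Placebo = c(0.04, 0.04, 0.02, 0.03, -0.005, 0.03),
  Low     = c(-0.04, 0, -0.1, -0.02, -0.01, -0.01),
  Medium  = c(0.01, -0.01, -0.05, 0.03, 0.02, NA),
  High    = c(0.01, 0.04, -0.02, -0.007, 0.01, NA)
)

# Reference column with its own NAs removed (Placebo has none here).
ref <- df$Placebo[!is.na(df$Placebo)]

# Every column except the reference.
others <- setdiff(names(df), "Placebo")

# For each remaining column, drop only that column's NAs, then compare
# against the full reference vector. The groups may differ in length,
# which is fine for the default unpaired comparison.
d_table <- t(sapply(df[others], function(x) {
  res <- effsize::cohen.d(ref, x[!is.na(x)])
  # as.numeric() strips any names so the column labels stay clean
  c(estimate = as.numeric(res$estimate),
    lower    = as.numeric(res$conf.int[1]),
    upper    = as.numeric(res$conf.int[2]))
}))

print(d_table)
```

Because Low has no missing values, its estimate should agree with the $Low output above. Medium and High can differ from the pairwise output, since the sixth Placebo value is no longer discarded. The call is written as effsize::cohen.d() so that a same-named function from another attached package cannot be picked up by mistake.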