Calculate mean and sd for given variables in a dataframe


Given a vector of names of numeric variables in a dataframe, I need to calculate mean and sd for each variable. For example, given the mtcars dataset and the following vector of variable names:

vars_to_transform <- c("mpg", "disp")

I’d like a result with one row per variable, holding its name, mean, and standard deviation.

The first solution that came to mind is the following:

library(dplyr)
library(purrr)

data("mtcars")

vars_to_transform <- c("mpg", "disp")

vars_to_transform %>%
  map_dfr(function(x) {
    c(
      variable = x,
      avg = mean(mtcars[[x]], na.rm = TRUE),
      sd = sd(mtcars[[x]], na.rm = TRUE)
    )
  })

In the result, all the returned columns are character, but I expected numeric values for avg and sd.

Is there a way to fix this? Or is there any better solution than this?

P.S.
I’m using purrr 0.3.4

>Solution :

Your approach seems like an overcomplicated way of doing select -> pivot -> group -> summarise, which also keeps the mean and sd columns numeric.

library(dplyr)
library(tidyr)  # provides pivot_longer()

mtcars %>%
    select(all_of(vars_to_transform)) %>%
    pivot_longer(everything()) %>%
    group_by(name) %>%
    summarise(
        mean = mean(value),
        sd = sd(value)
    )
# A tibble: 2 x 3
  name   mean     sd
  <chr> <dbl>  <dbl>
1 disp  231.  124.  
2 mpg    20.1   6.03
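
If you would rather keep the original map_dfr() approach, the character columns can be traced to c(): it builds an atomic vector, so the variable name and the two numbers are coerced to a single common type, character. One possible fix is to return a one-row tibble from each iteration instead. This is only a minimal sketch; it assumes the tibble package is installed (it comes as a dependency of dplyr), and the name result is just illustrative.

```r
library(purrr)
library(tibble)

vars_to_transform <- c("mpg", "disp")

# c() coerces mixed types to one common type, which is why
# the original version returned only character columns.
class(c(variable = "mpg", avg = 20.1))  # "character"

# Returning a one-row tibble keeps each column's own type.
result <- map_dfr(vars_to_transform, function(x) {
  tibble(
    variable = x,
    avg = mean(mtcars[[x]], na.rm = TRUE),
    sd = sd(mtcars[[x]], na.rm = TRUE)
  )
})

# Inspect the column types: variable is character, avg and sd are numeric.
sapply(result, class)
```

Unlike the pivot-based answer, this keeps the rows in the order given in vars_to_transform rather than sorting them by name.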
