R: How to loop over a name-based selection of variables from a dataframe and for each create a new variable containing the column mean of the first?

I have a dataset containing a number of numeric variables whose names all start with "Ranking". For each of these variables, I want to add another variable to the dataset that contains the column mean of the first variable.

So the data look something like this:

| Ranking_blah | Ranking_bleh |
| ------------ | ------------ |
| 1            | 0            |
| -1           | 1            |
| NA           | 0.5          |

and what I want is:

| Ranking_blah | Ranking_bleh | Ranking_blah_mean | Ranking_bleh_mean |
| ------------ | ------------ | ----------------- | ----------------- |
| 1            | 0            | 0                 | 0.5               |
| -1           | 1            | 0                 | 0.5               |
| NA           | 0.5          | 0                 | 0.5               |

(I am aware that each mean variable then holds the same value in every row; I need this because the data will be reshaped later.)

What I’ve tried so far:

# getting a list of all ranking variables I want to create a new mean variable from
ranking_variables = names(data)[grepl("Ranking", names(data))]

# creating a new variable for each base variable in the list and setting it to the mean of the respective base variable
data[paste0(ranking_variables, "_mean")] <- do.call(cbind, lapply(data[ranking_variables], function(x) mean(x, na.rm = TRUE)))

The second part is not working, though: it only yields NA values. What am I doing wrong?

>Solution :
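
Before changing approach, it may help to look at what the right-hand side of the assignment actually produces. The sketch below is only a diagnostic, and it assumes the three-row example data from the question stored in a data frame called dat (that name is an assumption). It shows that the do.call(cbind, lapply(...)) expression returns a matrix with a single row, not one value per row of the data, which is a likely source of the unexpected result. It also checks the column classes, because mean() returns NA with a warning when a column is character or factor rather than numeric.

```r
# Example data matching the question (the name `dat` is an assumption)
dat <- data.frame(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))

# Same selection as in the question
ranking_variables <- names(dat)[grepl("Ranking", names(dat))]

# Right-hand side of the original assignment, stored for inspection
means <- do.call(cbind, lapply(dat[ranking_variables], function(x) mean(x, na.rm = TRUE)))

# Compare the shape of the result with the number of rows in the data
dim(means)
nrow(dat)

# Check that the selected columns are numeric; mean() gives NA (with a warning) otherwise
col_classes <- sapply(dat[ranking_variables], class)
col_classes
```

With this example data, dim(means) reports one row and two columns while the data frame has three rows, so the means are never expanded to full-length columns before the assignment.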

An alternative approach is to use dplyr's across():

library(dplyr)

dat |>
    mutate(across(starts_with("Ranking"), ~ mean(.x, na.rm = TRUE), .names = "{.col}_mean"))

Output:

# A tibble: 3 × 4
  Ranking_blah Ranking_bleh Ranking_blah_mean Ranking_bleh_mean
         <dbl>        <dbl>             <dbl>             <dbl>
1            1          0                   0               0.5
2           -1          1                   0               0.5
3           NA          0.5                 0               0.5

Data:

dat <- tibble(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))
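
For a route that stays in base R, the following is a minimal sketch rather than the only way to do it. It assumes the same example data in a plain data frame named dat, and it repeats each column mean to the full column length with rep() so that every new column has one value per row before it is assigned.

```r
# Example data from this answer, as a plain data frame (the name `dat` is an assumption)
dat <- data.frame(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))

# Select the columns whose names start with "Ranking"
ranking_variables <- grep("^Ranking", names(dat), value = TRUE)

# lapply() returns a list with one element per selected column;
# rep() stretches each mean to the column length, so the list can be assigned as new columns
dat[paste0(ranking_variables, "_mean")] <- lapply(
  dat[ranking_variables],
  function(x) rep(mean(x, na.rm = TRUE), length(x))
)

dat
```

The "^Ranking" pattern anchors the match to the start of the name, which mirrors starts_with("Ranking") in the dplyr version; the unanchored "Ranking" from the question would also match names that merely contain that string.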