
Grab some columns and calculate the proportion in R

id=1:5
age1=c(67,39,97,55,37)
age2=c(300,122,333,70,333)
age3=c(1,3,6,1,3)
age4=c(56,33,34,77,99)
gender=c("f","m","f","f","m")
data=data.frame(id, age1, age2, age3, age4, gender)

length(data$age1[data$age1 > 50])/length(data$age1)
length(data$age2[data$age2 > 50])/length(data$age2)
length(data$age3[data$age3 > 50])/length(data$age3)
length(data$age4[data$age4 > 50])/length(data$age4)

First, I want to grab the age columns (age1, age2, age3, age4), for example with the %in% operator, so that I select every column whose name has "age" in it.

Then I want to calculate, for each of those columns, the proportion of values greater than 50. The code above works, but repeating one line per column is inefficient.
This is a reproducible example; my real data has 30 different age columns…


Solution:

A base R solution that uses grep() to select the columns whose names contain "age", then takes the column-wise mean of the logical matrix:

colMeans(data[grep("age", names(data))] > 50)

# age1 age2 age3 age4
#  0.6  1.0  0.0  0.6
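
One caveat worth sketching: grep("age", ...) matches "age" anywhere in a name, so a column such as "stage" or "page" would be picked up too. The example below is a minimal sketch, assuming the real age columns are all named "age" followed by digits; the data frame `df` and its `stage` column are hypothetical and only there to show the difference.

```r
# Hypothetical data: "stage" contains "age" but is not an age column
df <- data.frame(
  age1  = c(67, 39, 97, 55, 37),
  age2  = c(300, 122, 333, 70, 333),
  stage = c(60, 70, 80, 90, 100)
)

# Unanchored pattern also returns "stage"
loose_cols <- grep("age", names(df), value = TRUE)

# Anchored pattern: names that are "age" followed only by digits
age_cols <- grep("^age[0-9]+$", names(df), value = TRUE)

# Proportion of values above 50 in each selected column
props <- colMeans(df[age_cols] > 50)
props
```

If the naming scheme in the real data differs, the regular expression would need to be adjusted accordingly.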

You can also use summarise() with across() from dplyr.

library(dplyr)

data %>%
  summarise(across(contains("age"), ~ mean(.x > 50)))

#   age1 age2 age3 age4
# 1  0.6    1    0  0.6

Hint: mean() of a logical vector gives the proportion of TRUE values, because TRUE is treated as 1 and FALSE as 0.
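
As a small illustration of that hint, the sketch below reuses the age1 values from the question (the variable names `x` and `above` are just for this example) and checks that mean() on the comparison gives the same result as the length()-based code in the question.

```r
x <- c(67, 39, 97, 55, 37)   # the age1 values from the question

above <- x > 50              # logical vector: TRUE FALSE TRUE TRUE FALSE

n_above    <- sum(above)     # TRUE counts as 1, so this counts values above 50
prop_mean  <- mean(above)    # proportion of TRUE values
prop_first <- length(x[x > 50]) / length(x)   # the approach from the question

prop_mean
```

If a column may contain missing values, mean(above, na.rm = TRUE) ignores them; without it the result would be NA.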
