Proportion Tables in R

I have the following data in R:

gender <- c("Male","Female")
gender <- sample(gender, 5000, replace=TRUE, prob=c(0.45, 0.55))
gender <- as.factor(gender)

disease <- c("Yes","No")
disease <- sample(disease, 5000, replace=TRUE, prob=c(0.4, 0.6))
disease <- as.factor(disease)

status <- c("Immigrant","Citizen")
status <- sample(status, 5000, replace=TRUE, prob=c(0.3, 0.7))
status <- as.factor(status)

my_data <- data.frame(gender, status, disease)

I want to make a table that shows:

  • What percent of male immigrants have the disease?
  • What percent of male non-immigrants have the disease?
  • What percent of female immigrants have the disease?
  • What percent of female non-immigrants have the disease?
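Each of the four questions is a conditional percentage whose denominator is one gender/status group, not everyone of that gender or everyone with the disease (here "non-immigrant" corresponds to the Citizen level of status). For gender g and status s, the target is:

```latex
\text{pct}(g, s) = 100 \times
  \frac{\#\{\text{gender} = g,\ \text{status} = s,\ \text{disease} = \text{Yes}\}}
       {\#\{\text{gender} = g,\ \text{status} = s\}}
```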

I tried to do this with the following code:


 t1 <- xtabs(disease ~ gender + status, data=my_data)

But I get this error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
  ‘sum’ not meaningful for factors
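For context, a minimal reproduction on a small hypothetical data frame (toy, not the 5000-row data above): when the xtabs() formula has a left-hand side, that variable is summed within each cell, which fails for a factor. With no left-hand side, xtabs() simply counts rows.

```r
# Tiny hypothetical data set, only to illustrate the error.
toy <- data.frame(
  gender  = factor(c("Male", "Female", "Male", "Male")),
  status  = factor(c("Immigrant", "Citizen", "Immigrant", "Citizen")),
  disease = factor(c("Yes", "No", "No", "Yes"))
)

# A left-hand side is summed within each cell; summing a factor raises an error.
err <- tryCatch(xtabs(disease ~ gender + status, data = toy),
                error = function(e) conditionMessage(e))
print(err)

# With no left-hand side, xtabs counts the rows in each gender x status x disease cell.
counts <- xtabs(~ gender + status + disease, data = toy)
print(counts)
```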

Can someone please show me what I am doing wrong and how to fix this?

Thank you!

Solution:

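The error comes from the formula: xtabs() sums the left-hand side within each cell, and disease is a factor, which cannot be summed. Since all the columns are factors, one option is to count every combination with count() from dplyr, spread the disease levels into columns, and then convert the counts to percentages within each gender: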

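library(dplyr)
library(tidyr)
my_data %>%
  dplyr::count(gender, status, disease) %>%   # one row per gender/status/disease combination
  pivot_wider(names_from = disease, values_from = n, values_fill = 0) %>%   # No and Yes counts as columns
  group_by(gender) %>%
  mutate(100 * across(No:Yes, proportions)) %>%   # each column sums to 100 within a gender
  ungroup()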

Output:

# A tibble: 4 × 4
  gender status       No   Yes
  <fct>  <fct>     <dbl> <dbl>
1 Female Citizen    69.4  72.4
2 Female Immigrant  30.6  27.6
3 Male   Citizen    70.4  68.7
4 Male   Immigrant  29.6  31.3
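Note what these numbers are: because the counts are normalized within each gender and disease column, 72.4 is the share of citizens among women who have the disease, not the share of female citizens who have it. A minimal sketch of the quantity the question asks for groups by both gender and status and takes the share of "Yes" in each group. The data are regenerated with an arbitrary seed (an assumption, so the sketch runs on its own), so the percentages will not match the output above.

```r
library(dplyr)

# Rebuild data like the question's; set.seed(1) is an arbitrary choice.
set.seed(1)
gender  <- factor(sample(c("Male", "Female"), 5000, replace = TRUE, prob = c(0.45, 0.55)))
disease <- factor(sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)))
status  <- factor(sample(c("Immigrant", "Citizen"), 5000, replace = TRUE, prob = c(0.3, 0.7)))
my_data <- data.frame(gender, status, disease)

# Group by BOTH gender and status, then take the share of "Yes" in each group.
pct_disease <- my_data %>%
  group_by(gender, status) %>%
  summarise(n = n(),
            pct_yes = 100 * mean(disease == "Yes"),
            .groups = "drop")

print(pct_disease)
```

Reading off a row, for example gender = Male and status = Immigrant, gives the percentage of male immigrants with the disease.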

With xtabs, leave the left-hand side of the formula empty so that it counts rows instead of trying to sum a factor; apply() then gives the same percentages as above:

apply(xtabs(~ disease + gender + status, data = my_data),
      c(1, 2), proportions) * 100
, , gender = Female

           disease
status            No      Yes
  Citizen   69.36724 72.41993
  Immigrant 30.63276 27.58007

, , gender = Male

           disease
status            No      Yes
  Citizen   70.40185 68.68687
  Immigrant 29.59815 31.31313
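The apply() version normalizes over status, just like the dplyr code. To get the percentage of each gender/status group that has the disease, a base R sketch can normalize over disease instead by passing margin = c(1, 2) to proportions() (available since R 4.0.0; prop.table() is the older equivalent). The seed is again an arbitrary assumption.

```r
# Same simulated data as in the question; set.seed(1) is an arbitrary choice.
set.seed(1)
gender  <- factor(sample(c("Male", "Female"), 5000, replace = TRUE, prob = c(0.45, 0.55)))
disease <- factor(sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)))
status  <- factor(sample(c("Immigrant", "Citizen"), 5000, replace = TRUE, prob = c(0.3, 0.7)))
my_data <- data.frame(gender, status, disease)

# No left-hand side: xtabs counts rows in each gender x status x disease cell.
counts <- xtabs(~ gender + status + disease, data = my_data)

# margin = c(1, 2): within each gender x status cell, No + Yes sums to 100.
pct <- 100 * proportions(counts, margin = c(1, 2))

# The "Yes" slice is a 2 x 2 gender-by-status table of disease percentages.
pct_yes <- pct[, , "Yes"]
print(round(pct_yes, 1))
```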