Computing a statistical test (Mann-Whitney) multiple times in a data frame

I have a data frame, call it DF, that looks like this:

Subject      Avg_Score  EOTM   FS0_1
Joseph       1.09       1.20   6.1
Joseph       0.89       1.90   6.8
Joseph       0.99       0.80   8.2
Joseph (B)   0.76       0.80   8.9
Joseph (B)   1.23       0.10   21.1
Joseph (B)   1.11       0.22   26.1
Susie        1.8        11.20  60.1
Susie        1.9        10.90  63.8
Susie        1.4        10.80  81.2
Susie (B)    1.1        10.80  84.9
Susie (B)    1.2        12.10  71.1
Susie (B)    1.4        11.22  76.1

I want to perform a Mann-Whitney test between each subject and that subject's baseline (the rows marked "(B)") in each category. For example, compare Joseph with Joseph (B) on Avg_Score, EOTM, and FS0_1 separately, so that I get 3 p-values for the direct comparison between the two. My end goal is a final data frame, DF2, like this:

Subject  Avg_Score  EOTM   FS0_1
Joseph   p-val      p-val  p-val
Susie    p-val      p-val  p-val

Here each p-val is the p-value from comparing the regular subject rows with that subject's (B) rows. For example, the p-val for Avg_Score in the Joseph row comes from a Mann-Whitney test of Joseph's Avg_Score against Joseph (B)'s Avg_Score.

To run the Mann-Whitney test I can use the wilcox.test command. But in a large data set with many more rows and columns than listed here, how could I automate this, perhaps with a for loop if necessary? I would appreciate any help, thank you. The example data can be built like this:

# Note: some labels carry stray leading spaces; the solutions below trim them
Subject <- c("Joseph", "Joseph", "Joseph", " Joseph (B)", " Joseph (B)", " Joseph (B)", " Susie", "Susie", "Susie", "Susie (B)", "Susie (B)", "Susie (B)")
Avg_Score <- c(1.09, 0.89, 0.99, 0.76, 1.23, 1.11, 1.88, 1.9, 1.4, 1.1, 1.2, 1.4)
EOTM <- c(1.2, 1.9, 0.8, 0.8, 0.1, 0.22, 11.2, 10.9, 10.8, 10.8, 12.1, 11.22)
FS0_1 <- c(6.1, 6.8, 8.2, 8.9, 21.1, 26.1, 60.1, 63.8, 81.2, 84.9, 71.1, 76.1)
# data.frame() combines the vectors into columns; as.data.frame() would not
DF <- data.frame(Subject, Avg_Score, EOTM, FS0_1)
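For reference, a single comparison could look like the sketch below. It assumes only base R and uses Joseph's Avg_Score values copied from the data above; the variable names `joseph` and `joseph_base` are made up for illustration.

```r
# Joseph's Avg_Score values, split into regular and baseline (B) rows
joseph      <- c(1.09, 0.89, 0.99)
joseph_base <- c(0.76, 1.23, 1.11)

# Two-sided Mann-Whitney (Wilcoxon rank-sum) test between the two groups
res <- wilcox.test(joseph, joseph_base)

# The p-value is the piece that should end up in DF2
res$p.value
```

The question is how to repeat this call for every subject and every numeric column without writing each one by hand.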

Solution:

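library(dplyr)
library(stringr)
DF %>%
   # keep the original labels so baseline rows can still be identified
   mutate(Sub = Subject) %>% 
   # group by the bare name: drop the " (B)" suffix and stray whitespace
   group_by(Subject = trimws(str_remove(Subject, "\\s+\\(.*"))) %>% 
   # for each numeric column, test the baseline rows against the other rows
   summarise(across(where(is.numeric), ~ 
    wilcox.test(.x[str_detect(Sub, "\\(B")], 
      .x[str_detect(Sub, "\\(B", negate = TRUE)])$p.value), .groups = "drop")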

-output

# A tibble: 2 × 4
  Subject Avg_Score  EOTM FS0_1
  <chr>       <dbl> <dbl> <dbl>
1 Joseph      0.7   0.121   0.1
2 Susie       0.121 0.507   0.4

base R

by(DF, trimws(DF$Subject, whitespace = "\\s+\\(.*|\\s*"), 
  FUN = \(x) {
    i1 <- grepl("\\(B", x$Subject)
    sapply(x[-1], \(u) wilcox.test(u[i1], u[!i1])$p.value) 
  })
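Since the question mentions a for loop and a final data frame DF2, here is a minimal base R sketch of that variant. It is not part of the original answer. It assumes baseline rows are marked with a trailing "(B)" and, for simplicity, rebuilds the example data with clean labels; the helper names `is_base`, `person`, `people`, and `measures` are made up for illustration.

```r
# Example data from the question, with the stray spaces in the labels removed
Subject   <- c("Joseph", "Joseph", "Joseph", "Joseph (B)", "Joseph (B)", "Joseph (B)",
               "Susie", "Susie", "Susie", "Susie (B)", "Susie (B)", "Susie (B)")
Avg_Score <- c(1.09, 0.89, 0.99, 0.76, 1.23, 1.11, 1.88, 1.9, 1.4, 1.1, 1.2, 1.4)
EOTM      <- c(1.2, 1.9, 0.8, 0.8, 0.1, 0.22, 11.2, 10.9, 10.8, 10.8, 12.1, 11.22)
FS0_1     <- c(6.1, 6.8, 8.2, 8.9, 21.1, 26.1, 60.1, 63.8, 81.2, 84.9, 71.1, 76.1)
DF <- data.frame(Subject, Avg_Score, EOTM, FS0_1)

# Flag baseline rows and recover the bare subject name for every row
is_base  <- grepl("\\(B\\)", DF$Subject)
person   <- trimws(sub("\\s*\\(B\\)\\s*$", "", DF$Subject))
people   <- unique(person)
measures <- setdiff(names(DF), "Subject")

# One row per subject, one p-value column per measure
DF2 <- data.frame(Subject = people)
for (m in measures) {
  DF2[[m]] <- NA_real_
  for (i in seq_along(people)) {
    rows <- person == people[i]
    # Tied values make wilcox.test() warn and use a normal approximation
    DF2[[m]][i] <- wilcox.test(DF[[m]][rows & !is_base],
                               DF[[m]][rows & is_base])$p.value
  }
}
DF2
```

The loop picks up any additional numeric columns automatically through `measures`, and any additional subjects through `people`.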