Computing statistical analysis (Mann-Whitney) multiple times in a data frame

November 9, 2022

I have a data frame, call it DF, that looks like this:

Subject	Avg_Score	EOTM	FSO_1
Joseph	1.09	1.20	6.1
Joseph	0.89	1.90	6.8
Joseph	0.99	0.80	8.2
Joseph (B)	0.76	0.80	8.9
Joseph (B)	1.23	0.10	21.1
Joseph (B)	1.11	0.22	26.1
Susie	1.8	11.20	60.1
Susie	1.9	10.90	63.8
Susie	1.4	10.80	81.2
Susie (B)	1.1	10.80	84.9
Susie (B)	1.2	12.10	71.1
Susie (B)	1.4	11.22	76.1

I want to perform a Mann-Whitney test between each subject and that subject's baseline (the rows marked "(B)") in each category. For example, for Joseph I want to test Joseph against Joseph (B) for Avg_Score, EOTM, and FSO_1 separately, so that I get 3 p-values for the direct comparison between the two. My end goal is a final data frame, DF2, like this:

Subject	Avg_Score	EOTM	FSO_1
Joseph	p-val	p-val	p-val
Susie	p-val	p-val	p-val

Here each p-val is the p-value from comparing the regular subject rows against that subject's baseline (B) rows. For example, the Avg_Score p-val for Joseph comes from a Mann-Whitney test comparing Joseph's Avg_Score with Joseph (B)'s Avg_Score.

To do the Mann-Whitney test, I can use the wilcox.test function. But in a large data set that has more rows and columns than listed here, how could I automate this, perhaps with a for loop if necessary? I would appreciate any help, thank you. The example data can be built like this:

Subject <- c("Joseph", "Joseph", "Joseph", " Joseph (B)", " Joseph (B)", " Joseph (B)", " Susie", "Susie", "Susie", "Susie (B)", "Susie (B)", "Susie (B)")
Avg_Score <- c(1.09, 0.89, 0.99, 0.76, 1.23, 1.11, 1.88, 1.9, 1.4, 1.1, 1.2, 1.4)
EOTM <- c(1.2, 1.9, 0.8, 0.8, 0.1, 0.22, 11.2, 10.9, 10.8, 10.8, 12.1, 11.22)
FS0_1 <- c(6.1, 6.8, 8.2, 8.9, 21.1, 26.1, 60.1, 63.8, 81.2, 84.9, 71.1, 76.1)
DF <- data.frame(Subject, Avg_Score, EOTM, FS0_1)
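
For reference, a single comparison might look like the sketch below. It runs wilcox.test on Joseph's Avg_Score values against his baseline values, copied by hand from the data above; the vector names joseph and joseph_base are only illustrative.

```r
# Joseph's Avg_Score values, copied from the example data
joseph      <- c(1.09, 0.89, 0.99)  # regular rows
joseph_base <- c(0.76, 1.23, 1.11)  # baseline "(B)" rows

# Two-sided Mann-Whitney (Wilcoxon rank-sum) test on two independent samples
res <- wilcox.test(joseph, joseph_base)
res$p.value
```

Repeating this by hand for every subject and every column is what the question is trying to avoid.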

Solution:

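library(dplyr)
library(stringr)
DF %>%
   # keep the original label so the baseline rows can still be identified
   mutate(Sub = Subject) %>% 
   # group on the subject name with the " (B)" suffix and stray spaces removed
   group_by(Subject = trimws(str_remove(Subject, "\\s+\\(.*"))) %>% 
   # for every numeric column, test the baseline rows against the regular rows
   summarise(across(where(is.numeric), ~ 
    wilcox.test(.x[str_detect(Sub, "\\(B")], 
      .x[str_detect(Sub, "\\(B", negate = TRUE)])$p.value), .groups = "drop")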

-output

# A tibble: 2 × 4
  Subject Avg_Score  EOTM FS0_1
  <chr>       <dbl> <dbl> <dbl>
1 Joseph      0.7   0.121   0.1
2 Susie       0.121 0.507   0.4

base R

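# split DF by subject name (the " (B)" suffix and stray whitespace are stripped)
by(DF, trimws(DF$Subject, whitespace = "\\s+\\(.*|\\s*"), 
  FUN = \(x) {
    # TRUE for the baseline "(B)" rows of this subject
    i1 <- grepl("\\(B", x$Subject)
    # p-value for each remaining column: baseline rows vs. regular rows
    sapply(x[-1], \(u) wilcox.test(u[i1], u[!i1])$p.value) 
  })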
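
Since the question mentions a for loop, here is a minimal sketch of that approach in base R that fills DF2 one column at a time. It assumes, as in the example data, that baseline rows are marked with "(B)" and that every numeric column should be tested. The helper names key, is_base and value_cols are my own and not part of the answers above.

```r
# Rebuild the example data (values copied from the question)
DF <- data.frame(
  Subject = c("Joseph", "Joseph", "Joseph", " Joseph (B)", " Joseph (B)",
              " Joseph (B)", " Susie", "Susie", "Susie", "Susie (B)",
              "Susie (B)", "Susie (B)"),
  Avg_Score = c(1.09, 0.89, 0.99, 0.76, 1.23, 1.11, 1.88, 1.9, 1.4, 1.1, 1.2, 1.4),
  EOTM = c(1.2, 1.9, 0.8, 0.8, 0.1, 0.22, 11.2, 10.9, 10.8, 10.8, 12.1, 11.22),
  FS0_1 = c(6.1, 6.8, 8.2, 8.9, 21.1, 26.1, 60.1, 63.8, 81.2, 84.9, 71.1, 76.1)
)

# Subject name without the "(B)" suffix or stray whitespace (hypothetical helper)
key <- trimws(sub("\\s*\\(.*$", "", DF$Subject))
# TRUE for baseline rows
is_base <- grepl("\\(B", DF$Subject)
# Names of the numeric columns to test
value_cols <- names(DF)[sapply(DF, is.numeric)]

# One row per subject, one p-value column per numeric column
DF2 <- data.frame(Subject = unique(key))
for (col in value_cols) {
  DF2[[col]] <- vapply(DF2$Subject, function(s) {
    regular  <- DF[[col]][key == s & !is_base]
    baseline <- DF[[col]][key == s & is_base]
    wilcox.test(regular, baseline)$p.value
  }, numeric(1), USE.NAMES = FALSE)
}
DF2
```

Columns that contain tied values (EOTM here, for instance) make wilcox.test warn that it cannot compute an exact p-value with ties; it then falls back to a normal approximation, so the loop still completes.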