Tidy way to run a Wilcoxon test on multiple group splits and preserve the original group information

I am looking for a way to use the syntax of group_split() or summarise() while preserving the original group information. I’ve seen previous posts that use these approaches, but they don’t preserve the grouping information. Is there a way to do this? I could of course join the data back afterwards, but I was hoping to avoid that approach.

> set.seed(22)
> # Create fake data
> flavor <- data.frame(
+   temperature = sample(x = c('hot','cold'), size = 500, replace = TRUE),
+   color = sample(c('red','blue','green'), 500, TRUE),
+   texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
+   flavor = sample.int(n = 100, size = 500, replace = TRUE)
+ )
> 
> head(flavor, 10)
   temperature color texture flavor
1         cold   red    soft     47
2          hot   red crumbly      2
3         cold  blue  crispy     28
4         cold  blue    soft     36
5         cold  blue crumbly     69
6         cold   red    soft     49
7         cold  blue    soft    100
8          hot  blue crumbly     42
9          hot  blue    soft     93
10         hot green     wet     47

Using base split + map (works but doesn’t preserve original group information)

> flavor %>%
+   group_by(color, texture) %>%
+   mutate(subsets = cur_group_id()) %>%
+   ungroup() %>%
+   base::split(.$subsets) %>%
+   purrr::map(~ wilcox.test(flavor ~ temperature, data = .)) %>%
+   purrr::map_dfr(~ broom::tidy(.))
# A tibble: 12 × 4
   statistic p.value method                                            alternative
       <dbl>   <dbl> <chr>                                             <chr>      
 1      237   0.687  Wilcoxon rank sum test with continuity correction two.sided  
 2      152.  0.866  Wilcoxon rank sum test with continuity correction two.sided  
 3      236.  0.696  Wilcoxon rank sum test with continuity correction two.sided  
 4      308   0.216  Wilcoxon rank sum test with continuity correction two.sided  
 5      256   0.281  Wilcoxon rank sum test with continuity correction two.sided  
 6      122   0.540  Wilcoxon rank sum test with continuity correction two.sided  
 7      244   0.742  Wilcoxon rank sum test with continuity correction two.sided  
 8      130.  0.0393 Wilcoxon rank sum test with continuity correction two.sided  
 9      238.  0.317  Wilcoxon rank sum test with continuity correction two.sided  
10      360.  0.345  Wilcoxon rank sum test with continuity correction two.sided  
11       75   0.0292 Wilcoxon rank sum test with continuity correction two.sided  
12      219   0.149  Wilcoxon rank sum test with continuity correction two.sided  
There were 12 warnings (use warnings() to see them)

Using a summarise-like approach? (preserves group information, but the statistic is incorrect: every row shows the same value)

> flavor %>%
+   group_by(color, texture) %>%
+   summarise(output = wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
`summarise()` has grouped output by 'color'. You can override using the `.groups` argument.
# A tibble: 12 × 3
# Groups:   color [3]
   color texture output$statistic $p.value $method                                           $alternative
   <chr> <chr>              <dbl>    <dbl> <chr>                                             <chr>       
 1 blue  crispy            30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 2 blue  crumbly           30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 3 blue  soft              30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 4 blue  wet               30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 5 green crispy            30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 6 green crumbly           30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 7 green soft              30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 8 green wet               30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
 9 red   crispy            30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
10 red   crumbly           30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
11 red   soft              30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
12 red   wet               30656.    0.721 Wilcoxon rank sum test with continuity correction two.sided   
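
The identical rows most likely come from the `data = .` argument: inside a magrittr pipeline, `.` stands for the whole left-hand side (all 500 rows), not for the current group, so every group runs the same test. A minimal sketch of one possible fix, assuming dplyr >= 1.1.0 for `pick()` (older versions offered `cur_data()` for the same purpose); the name `by_group` is only illustrative:

```r
library(dplyr)
library(broom)

# Same fake data as in the question
set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, TRUE),
  color = sample(c("red", "blue", "green"), 500, TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, TRUE),
  flavor = sample.int(100, 500, TRUE)
)

# pick() returns only the rows of the current group, so each test sees
# its own subset. Leaving the tidy() result unnamed lets summarise()
# unpack it into regular columns next to color and texture.
by_group <- flavor %>%
  group_by(color, texture) %>%
  summarise(
    tidy(wilcox.test(flavor ~ temperature, data = pick(flavor, temperature))),
    .groups = "drop"
  )
```

With this change the statistic and p-value should differ from group to group, and the grouping columns stay in the result.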

Using group_split (same problem as first)

> flavor %>%
+   group_split(color, texture) %>%
+   map_dfr(~wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
# A tibble: 12 × 4
   statistic p.value method                                            alternative
       <dbl>   <dbl> <chr>                                             <chr>      
 1      237   0.687  Wilcoxon rank sum test with continuity correction two.sided  
 2      152.  0.866  Wilcoxon rank sum test with continuity correction two.sided  
 3      236.  0.696  Wilcoxon rank sum test with continuity correction two.sided  
 4      308   0.216  Wilcoxon rank sum test with continuity correction two.sided  
 5      256   0.281  Wilcoxon rank sum test with continuity correction two.sided  
 6      122   0.540  Wilcoxon rank sum test with continuity correction two.sided  
 7      244   0.742  Wilcoxon rank sum test with continuity correction two.sided  
 8      130.  0.0393 Wilcoxon rank sum test with continuity correction two.sided  
 9      238.  0.317  Wilcoxon rank sum test with continuity correction two.sided  
10      360.  0.345  Wilcoxon rank sum test with continuity correction two.sided  
11       75   0.0292 Wilcoxon rank sum test with continuity correction two.sided  
12      219   0.149  Wilcoxon rank sum test with continuity correction two.sided  
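
If the group_split() syntax is preferred, the keys can be recovered with `group_keys()`, which returns one row per group in the same order as the list produced by `group_split()`. A hedged sketch of that idea; `grouped`, `keys`, `tests` and `with_keys` are illustrative names, not part of the original post:

```r
library(dplyr)
library(purrr)
library(broom)

# Same fake data as in the question
set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, TRUE),
  color = sample(c("red", "blue", "green"), 500, TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, TRUE),
  flavor = sample.int(100, 500, TRUE)
)

grouped <- flavor %>% group_by(color, texture)

# One row of keys per group, ordered like the output of group_split()
keys <- group_keys(grouped)

# One tidy one-row result per group, stacked into a single tibble
tests <- grouped %>%
  group_split() %>%
  map(~ tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  bind_rows()

# Put the keys back next to the test results
with_keys <- bind_cols(keys, tests)
```

This still binds two tables together, but by position rather than through a join.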

Solution:

You could use the broom package to get tidy results, with a bit of nesting and unnesting to keep the grouping columns:

library(tidyverse)
library(broom)

flavor %>%
  nest(data = c(-color, -texture)) %>%
  mutate(data = map(data, ~ wilcox.test(flavor ~ temperature, data = .x)),
         data = map(data, tidy)) %>% 
  unnest(data)
#> # A tibble: 12 x 6
#>    color texture statistic p.value method                                alter~1
#>    <chr> <chr>       <dbl>   <dbl> <chr>                                 <chr>  
#>  1 blue  crumbly      157   0.936  Wilcoxon rank sum test with continui~ two.si~
#>  2 red   crispy       242.  0.440  Wilcoxon rank sum test with continui~ two.si~
#>  3 red   crumbly      137   0.609  Wilcoxon rank sum test with continui~ two.si~
#>  4 blue  crispy       409   0.761  Wilcoxon rank sum test with continui~ two.si~
#>  5 green wet          132.  0.248  Wilcoxon rank sum test with continui~ two.si~
#>  6 blue  soft         228.  0.454  Wilcoxon rank sum test with continui~ two.si~
#>  7 blue  wet          209   0.404  Wilcoxon rank sum test with continui~ two.si~
#>  8 red   soft         230.  0.672  Wilcoxon rank sum test with continui~ two.si~
#>  9 green soft         141   0.0808 Wilcoxon rank sum test with continui~ two.si~
#> 10 green crispy       226.  0.178  Wilcoxon rank sum test with continui~ two.si~
#> 11 red   wet          146.  0.0301 Wilcoxon rank sum test with continui~ two.si~
#> 12 green crumbly      164.  0.533  Wilcoxon rank sum test with continui~ two.si~
#> # ... with abbreviated variable name 1: alternative

Created on 2022-09-05 with reprex v2.0.2
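
As an alternative that is not part of the answer above, `dplyr::group_modify()` applies a function to each group and prepends the grouping columns to whatever data frame the function returns. A minimal sketch, assuming the same `flavor` data; `modified` is an illustrative name:

```r
library(dplyr)
library(broom)

# Same fake data as in the question
set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, TRUE),
  color = sample(c("red", "blue", "green"), 500, TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, TRUE),
  flavor = sample.int(100, 500, TRUE)
)

# .x is the subset of rows for the current group (without the grouping
# columns); group_modify() adds color and texture back to each result.
modified <- flavor %>%
  group_by(color, texture) %>%
  group_modify(~ tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  ungroup()
```

This keeps the per-group lambda style of the group_split() attempt while carrying the keys along automatically.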
