I am looking for a way to use group_split()- or summarise()-style syntax while preserving the original group information. I've seen previous questions, like here and here, that use these approaches, but they don't preserve the grouping information. Is there a way to do this? I could of course join the data back afterwards, but I was hoping to avoid that approach.
> set.seed(22)
> # Create fake data
> flavor <- data.frame(
+ temperature = sample(x = c('hot','cold'), size = 500, replace = TRUE),
+ color = sample(c('red','blue','green'), 500, TRUE),
+ texture = sample(c('crumbly', 'crispy', 'wet', 'soft'), 500, TRUE),
+ flavor = sample.int(n = 100, size = 500, replace = TRUE)
+ )
>
> head(flavor, 10)
temperature color texture flavor
1 cold red soft 47
2 hot red crumbly 2
3 cold blue crispy 28
4 cold blue soft 36
5 cold blue crumbly 69
6 cold red soft 49
7 cold blue soft 100
8 hot blue crumbly 42
9 hot blue soft 93
10 hot green wet 47
Using base split + map (works but doesn’t preserve original group information)
> flavor %>%
+ group_by(color, texture) %>%
+ mutate(subsets = cur_group_id()) %>%
+ ungroup() %>%
+ base::split(.$subsets) %>%
+ purrr::map(~ wilcox.test(flavor ~ temperature, data = .)) %>%
+ purrr::map_dfr(~ broom::tidy(.))
# A tibble: 12 × 4
statistic p.value method alternative
<dbl> <dbl> <chr> <chr>
1 237 0.687 Wilcoxon rank sum test with continuity correction two.sided
2 152. 0.866 Wilcoxon rank sum test with continuity correction two.sided
3 236. 0.696 Wilcoxon rank sum test with continuity correction two.sided
4 308 0.216 Wilcoxon rank sum test with continuity correction two.sided
5 256 0.281 Wilcoxon rank sum test with continuity correction two.sided
6 122 0.540 Wilcoxon rank sum test with continuity correction two.sided
7 244 0.742 Wilcoxon rank sum test with continuity correction two.sided
8 130. 0.0393 Wilcoxon rank sum test with continuity correction two.sided
9 238. 0.317 Wilcoxon rank sum test with continuity correction two.sided
10 360. 0.345 Wilcoxon rank sum test with continuity correction two.sided
11 75 0.0292 Wilcoxon rank sum test with continuity correction two.sided
12 219 0.149 Wilcoxon rank sum test with continuity correction two.sided
There were 12 warnings (use warnings() to see them)
Using a summarise-like approach? (preserves group information, but the statistic is incorrect: every row shows the same value)
> flavor %>%
+ group_by(color, texture) %>%
+ summarise(output = wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
`summarise()` has grouped output by 'color'. You can override using the `.groups` argument.
# A tibble: 12 × 3
# Groups: color [3]
color texture output$statistic $p.value $method $alternative
<chr> <chr> <dbl> <dbl> <chr> <chr>
1 blue crispy 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
2 blue crumbly 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
3 blue soft 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
4 blue wet 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
5 green crispy 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
6 green crumbly 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
7 green soft 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
8 green wet 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
9 red crispy 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
10 red crumbly 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
11 red soft 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
12 red wet 30656. 0.721 Wilcoxon rank sum test with continuity correction two.sided
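The identical rows above have a simple cause: inside `summarise()`, the magrittr placeholder `.` still refers to the entire data frame that was piped in, not to the current group, so the same test runs on all 500 rows for each of the 12 groups. Below is a minimal sketch of one way around this within `summarise()`. It assumes dplyr >= 1.1.0, where `pick()` is available; `pick()` and the name `res_summarise` are not part of the original post. The data are recreated so the block is self-contained.

```r
library(dplyr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, replace = TRUE),
  color = sample(c("red", "blue", "green"), 500, replace = TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, replace = TRUE),
  flavor = sample.int(n = 100, size = 500, replace = TRUE)
)

res_summarise <- flavor %>%
  group_by(color, texture) %>%
  summarise(
    # pick() returns only the current group's rows for the selected columns,
    # so the test is run once per color/texture combination.
    # (Assumption: dplyr >= 1.1.0.)
    tidy(wilcox.test(flavor ~ temperature, data = pick(flavor, temperature))),
    .groups = "drop"
  )

res_summarise
```

Because the tidied result is an unnamed data frame inside `summarise()`, its columns are unpacked next to `color` and `texture` instead of being stored in a nested `output` column.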
Using group_split (same problem as first)
> flavor %>%
+ group_split(color, texture) %>%
+ map_dfr(~wilcox.test(flavor ~ temperature, data = .) %>% broom::tidy())
# A tibble: 12 × 4
statistic p.value method alternative
<dbl> <dbl> <chr> <chr>
1 237 0.687 Wilcoxon rank sum test with continuity correction two.sided
2 152. 0.866 Wilcoxon rank sum test with continuity correction two.sided
3 236. 0.696 Wilcoxon rank sum test with continuity correction two.sided
4 308 0.216 Wilcoxon rank sum test with continuity correction two.sided
5 256 0.281 Wilcoxon rank sum test with continuity correction two.sided
6 122 0.540 Wilcoxon rank sum test with continuity correction two.sided
7 244 0.742 Wilcoxon rank sum test with continuity correction two.sided
8 130. 0.0393 Wilcoxon rank sum test with continuity correction two.sided
9 238. 0.317 Wilcoxon rank sum test with continuity correction two.sided
10 360. 0.345 Wilcoxon rank sum test with continuity correction two.sided
11 75 0.0292 Wilcoxon rank sum test with continuity correction two.sided
12 219 0.149 Wilcoxon rank sum test with continuity correction two.sided
Solution:
You could use the broom package to get tidy results, aided by a bit of nesting and unnesting:
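library(tidyverse)
library(broom)

flavor %>%
  # one row per color/texture combination; all other columns go into a list-column
  nest(data = c(-color, -texture)) %>%
  # run the test on each nested data frame, then tidy each result into a one-row tibble
  mutate(data = map(data, ~ wilcox.test(flavor ~ temperature, data = .x)),
         data = map(data, tidy)) %>%
  # expand the tidied results into regular columns next to the group keys
  unnest(data)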
#> # A tibble: 12 x 6
#> color texture statistic p.value method alter~1
#> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 blue crumbly 157 0.936 Wilcoxon rank sum test with continui~ two.si~
#> 2 red crispy 242. 0.440 Wilcoxon rank sum test with continui~ two.si~
#> 3 red crumbly 137 0.609 Wilcoxon rank sum test with continui~ two.si~
#> 4 blue crispy 409 0.761 Wilcoxon rank sum test with continui~ two.si~
#> 5 green wet 132. 0.248 Wilcoxon rank sum test with continui~ two.si~
#> 6 blue soft 228. 0.454 Wilcoxon rank sum test with continui~ two.si~
#> 7 blue wet 209 0.404 Wilcoxon rank sum test with continui~ two.si~
#> 8 red soft 230. 0.672 Wilcoxon rank sum test with continui~ two.si~
#> 9 green soft 141 0.0808 Wilcoxon rank sum test with continui~ two.si~
#> 10 green crispy 226. 0.178 Wilcoxon rank sum test with continui~ two.si~
#> 11 red wet 146. 0.0301 Wilcoxon rank sum test with continui~ two.si~
#> 12 green crumbly 164. 0.533 Wilcoxon rank sum test with continui~ two.si~
#> # ... with abbreviated variable name 1: alternative
Created on 2022-09-05 with reprex v2.0.2
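As an alternative to nesting, `dplyr::group_modify()` may be closer to the group_split()-style syntax asked about: it applies a function to each group's rows and prepends the grouping columns to whatever data frame the function returns. The following is a sketch only, not part of the original answer; the name `res_modify` is made up for illustration, and the data are recreated so the block runs on its own.

```r
library(dplyr)
library(broom)

set.seed(22)
flavor <- data.frame(
  temperature = sample(c("hot", "cold"), 500, replace = TRUE),
  color = sample(c("red", "blue", "green"), 500, replace = TRUE),
  texture = sample(c("crumbly", "crispy", "wet", "soft"), 500, replace = TRUE),
  flavor = sample.int(n = 100, size = 500, replace = TRUE)
)

res_modify <- flavor %>%
  group_by(color, texture) %>%
  # .x is the data for the current group (without the grouping columns);
  # the returned one-row tibble is combined with that group's keys.
  group_modify(~ tidy(wilcox.test(flavor ~ temperature, data = .x))) %>%
  ungroup()

res_modify
```

The result has one row per color/texture combination, with the group keys as the first two columns, so no join is needed afterwards.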