Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Add group in column result (in dplyr)

I find myself doing this quite often.

set.seed(123)
test_data=data.frame(sample=sample(LETTERS[1:10], 100,replace = TRUE), type=sample(letters[1:2], 100, replace=TRUE ), area=sample(1:100, replace=TRUE) )

The content of test_data is:

head(test_data)
  sample type area
1      C    b   24
2      C    b   63
3      J    a   54
4      B    b   23
5      F    a   26
6      E    a   33

I usually want to sum a specific column by some grouping properties, and to do that I use:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

res_sum=test_data %>% group_by(sample, type) %>% summarise_at( .vars = "area", .funs = sum )

So far so good, the problem arrives when I want to put the results in a "nice" format.
The lines below put the result in the way I want, but I find this way cumbersome.

res_sum_a=res_sum[res_sum$type=="a", ]
colnames(res_sum_a)[3]=paste0( colnames(res_sum)[3], ".a")

res_sum_b=res_sum[res_sum$type=="b", ]
colnames(res_sum_b)[3]=paste0( colnames(res_sum)[3], ".b")

res_df=merge(res_sum_a[,c(1,3)], res_sum_b[, c(1,3)], by="sample", all=TRUE)

head(res_df)

  sample area.a area.b
1      A    244    147
2      B     17    152
3      C    153    541
4      D    107     94
5      E    246    266
6      F    189    286

Note, there may be more than 2 "type" in the original data frame (so, like a,b,c….).
Is there a way, that is more dplyr idiomatic to do this?
Thanks.

>Solution :

What you’re referring to is pivoting, which is provided by a separate package ({tidyr}).

I also updated your dplyr code to most recent syntax (e.g., summarise_at() has been repalced by across()).

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

set.seed(123)
test_data=data.frame(sample=sample(LETTERS[1:10], 100,replace = TRUE), type=sample(letters[1:2], 100, replace=TRUE ), area=sample(1:100, replace=TRUE) )

test_data |> 
  group_by(sample, type) |> 
  summarise(across(area, sum)) |> 
  tidyr::pivot_wider(names_from = type, values_from = area)
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 10 x 3
#> # Groups:   sample [10]
#>    sample     a     b
#>    <chr>  <int> <int>
#>  1 A        244   147
#>  2 B         17   152
#>  3 C        153   541
#>  4 D        107    94
#>  5 E        246   266
#>  6 F        189   286
#>  7 G         48   483
#>  8 H        223    94
#>  9 I        285   345
#> 10 J        491   252

Created on 2022-11-16 with reprex v2.0.2

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading