tidyverse: Is pivot_wider() the only way to summarize by selecting specific row values?

I need to summarize an index of testing results from tidy data. For each group, I need to compute a weighted sum of specific values to return an index value. I'm used to using group_by() and summarise() and subsetting with the format df$value[var=='A'], but I can't get that approach to work. I can only get pivot_wider() to work.

#reprex
library(tidyverse)
#sample data
df <- data.frame(group = c('foo', 'foo', 'foo', 'foo','bar', 'bar', 'bar', 'bar'), 
                 var = c('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'), 
                 result = c(1, 6, 9, 3, 5, 0, 2, 9))

#this does not work, nor does using 'reframe()' as suggested by error
index <- df %>% 
  group_by(group) %>% 
  summarise(var = 'index', 
            result = result[var=='b']/2 + result[var=='d']/3)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> i Please use `reframe()` instead.
#> i When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> `summarise()` has grouped output by 'group'. You can override using the
#> `.groups` argument.

#using pivot_wider works, is this the only way?
index <- df %>% 
  filter(var %in% c('b', 'd')) %>% 
  pivot_wider(names_from = var, values_from = result) %>% 
  mutate(index = b/2 + d/3) %>% 
  pivot_longer(cols = c('b', 'd', 'index'), 
               names_to = 'var', 
               values_to = 'result')

Solution:

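The problem is the order of the arguments. summarise() evaluates its arguments in sequence, so once var = 'index' has been assigned, every later reference to var sees that single string rather than the original column. Both var=='b' and var=='d' are then FALSE, each subset of result is empty, and the sum has length zero, which is what triggers the warning about returning more (or less) than 1 row per group. If you swap the order so that result is computed before var is overwritten, it works: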

library(tidyverse)
#sample data
df <- data.frame(group = c('foo', 'foo', 'foo', 'foo','bar', 'bar', 'bar', 'bar'), 
                 var = c('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'), 
                 result = c(1, 6, 9, 3, 5, 0, 2, 9))


index <- df %>% 
  group_by(group) %>% 
  summarise(result = result[var=='b']/2 + result[var=='d']/3, 
            var = 'index')
index
#> # A tibble: 2 × 3
#>   group result var  
#>   <chr>  <dbl> <chr>
#> 1 bar        3 index
#> 2 foo        4 index

Created on 2023-04-14 with reprex v2.0.2
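
If the index has more terms, another option is to keep the weights in a lookup and sum over the matching rows, which also avoids reusing the name var inside summarise(). The following is only a sketch under a few assumptions: the names weights and var_overwritten are made up for illustration, and stringsAsFactors = FALSE is set explicitly so that var is a character column when it is used to index the weights.

```r
library(dplyr)

# same sample data as in the question
df <- data.frame(group = rep(c('foo', 'bar'), each = 4),
                 var = rep(c('a', 'b', 'c', 'd'), times = 2),
                 result = c(1, 6, 9, 3, 5, 0, 2, 9),
                 stringsAsFactors = FALSE)

# why the original order fails: once var is the single string 'index',
# the comparison is FALSE and the subset is empty
var_overwritten <- 'index'
length(df$result[var_overwritten == 'b'])
#> [1] 0

# hypothetical lookup: names are the var values, values are their weights
weights <- c(b = 1/2, d = 1/3)

index <- df %>%
  filter(var %in% names(weights)) %>%        # keep only the weighted rows
  group_by(group) %>%
  summarise(result = sum(result * weights[var]),
            .groups = 'drop') %>%
  mutate(var = 'index')                      # label added after the calculation
```

With the sample data this should give the same index values as the answer above (3 for bar, 4 for foo). Adding the var label in a separate mutate() step means the order of arguments inside summarise() no longer matters.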
