Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Summarize with arithmetic operations on rows by column entry

library(tidyverse)
set.seed(1)
start <- mdy("01/01/2022")
end <- start + as.difftime(4, units = "days")
days <- seq(from = start, to = end, by = 1)
days <- sample(days, 100, replace = T)
flip <- sample(c("Heads", "Tails"), 100, replace = TRUE)
numbers <- rchisq(100, 30)

df <- tibble(days, numbers, flip)

I have this dataframe and would like to summarize it for each day by taking the total sum of numbers for each flip and creating this ratio, (Heads-Tails)/(Heads+Tails) below is an example for a particular day:

df <- df %>% group_by(days, flip) %>%
  summarize(total = sum(numbers)) %>%
  ungroup()

> (df[[1,3]]-df[[2,3]])/(df[[1,3]]+df[[2,3]])
[1] -0.4164241
> (df[[3,3]]-df[[4,3]])/(df[[3,3]]+df[[4,3]])
[1] 0.06134484
> (df[[5,3]]-df[[6,3]])/(df[[5,3]]+df[[6,3]])
[1] 0.6067984
> (df[[7,3]]-df[[8,3]])/(df[[7,3]]+df[[8,3]])
[1] -0.2603271
> (df[[9,3]]-df[[10,3]])/(df[[9,3]]+df[[10,3]])
[1] 0.309745

These are the ratios I would like to end up with for every day.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Starting from the first df,

df %>%
  group_by(days, flip) %>%
  summarize(total = sum(numbers)) %>%
  group_by(days) %>%
  summarize(z = -diff(total)/sum(total))
# # A tibble: 5 × 2
#   days             z
#   <date>       <dbl>
# 1 2022-01-01 -0.416 
# 2 2022-01-02  0.0613
# 3 2022-01-03  0.607 
# 4 2022-01-04 -0.260 
# 5 2022-01-05  0.310 

though this relies on the order of flip. To be a little more resilient to that, we can do:

... %>%
  summarize(z = (total[flip == "Heads"] - total[flip == "Tails"]) / sum(total))

We can also try a pivoting approach:

df %>%
  group_by(days, flip) %>%
  summarize(total = sum(numbers)) %>%
  ungroup() %>%
  pivot_wider(id_cols = days, names_from = flip, values_from = total) %>%
  mutate(z = (Heads - Tails) / (Heads + Tails))
# # A tibble: 5 × 4
#   days       Heads Tails       z
#   <date>     <dbl> <dbl>   <dbl>
# 1 2022-01-01  205. 498.  -0.416 
# 2 2022-01-02  315. 279.   0.0613
# 3 2022-01-03  363.  88.8  0.607 
# 4 2022-01-04  227. 386.  -0.260 
# 5 2022-01-05  403. 212.   0.310 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading