Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to take measures of 4 rows sequentially for multiples variables

I`m struggling on how can I take the measure of 4 rows sequentially for multiples variables in my dataframe sorting by the date. Here is a toy example

test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C",  "A", "A","A","A","A","A" , "C"),
                  measure1 = c(10:24),
                  measure2 = c(1:15),
                  time= c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021", 
                          "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021",
                          "20-03-2021", "20-10-2021", "29-06-2021", "20-07-2021", "13-06-2021"))
#    my_groups measure1 measure2       time
# 1          A       18        9 15-01-2021
# 2          A       19       10 15-01-2021
# 3          A       12        3 19-09-2020
# 4          A       20       11 20-03-2021
# 5          A       23       14 20-07-2021
# 6          A       10        1 20-09-2020
# 7          A       21       12 20-10-2021
# 8          A       11        2 25-09-2020
# 9          A       22       13 29-06-2021
# 10         B       13        4 20-05-2020
# 11         B       14        5 20-06-2021
# 12         C       15        6 11-01-2021
# 13         C       16        7 13-01-2021
# 14         C       17        8 13-01-2021
# 15         C       24       15 13-06-2021

As result I’d like something like:

#    my_groups measure1    measure2    time
# 1          A       17.25     8.25      20-03-2021  #mean for the first 4 elements of A and maximun date
# 2          A       16.25     7.25      25-09-2020  #mean for the others 4 elements of A and maximun date
# 3          C       18        9         13-06-2021  #mean for the first 4 elements of B and maximun date

To get this result I thought of using something like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

test %>% 
  arrange( my_groups,time) %>% 
  group_by(my_groups) %>% 
  summarise(measure1 = mean(measure1),
            measure2 = mean(measure2), 
            time = max(time))

But I’m having problem to find a solution to take these measures for 4 values consecutively.

Any hint on how can I do that?

>Solution :

We can add another grouping variable g4 using integer division to ensure that we get groups of 4, and if desired, drop groups with fewer members. Then just run your summarize.

library(dplyr)

test %>% 
  arrange(my_groups, time) %>% 
  group_by(my_groups) %>% 
  mutate(g4 = (row_number() - 1) %/% 4) %>%
  group_by(my_groups, g4) %>%
  filter(n() == 4) %>%
  summarise(measure1 = mean(measure1),
            measure2 = mean(measure2), 
            time = max(time),
            .groups = "drop")
#> # A tibble: 3 × 5
#>   my_groups    g4 measure1 measure2 time      
#>   <chr>     <dbl>    <dbl>    <dbl> <chr>     
#> 1 A             0     17.2     8.25 20-03-2021
#> 2 A             1     16.2     7.25 25-09-2020
#> 3 C             0     18       9    13-06-2021
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading