Calculating average rle$lengths over grouped data

I would like to calculate how long each state lasts, using rle() on grouped data. Here is a test data frame:

DF <- read.table(text="Time,x,y,sugar,state,ID
0,31,21,0.2,0,L0
1,31,21,0.65,0,L0
2,31,21,1.0,0,L0
3,31,21,1.5,1,L0
4,31,21,1.91,1,L0
5,31,21,2.3,1,L0
6,31,21,2.75,0,L0
7,31,21,3.14,0,L0
8,31,22,3.0,2,L0
9,31,22,3.47,1,L0
10,31,22,3.930,0,L0
0,37,1,0.2,0,L1
1,37,1,0.65,0,L1
2,37,1,1.089,0,L1
3,37,1,1.5198,0,L1
4,36,1,1.4197,2,L1
5,36,1,1.869,0,L1
6,36,1,2.3096,0,L1
7,36,1,2.738,0,L1
8,36,1,3.16,0,L1
9,36,1,3.5703,0,L1
10,36,1,3.970,0,L1
", header = TRUE, sep =",")

I want the average run length for state == 1, grouped by ID. I wrote a function, inspired by https://www.reddit.com/r/rstats/comments/brpzo9/tidyverse_groupby_and_rle/, that counts the matching runs and averages their lengths:

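rle_mean_lengths <- function(x, value) {
  r <- rle(x)                  # collapse x into runs (values + lengths)
  cond <- r$values == value    # which runs have the requested value
  # number of matching runs and their mean length (NaN if there are none)
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}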
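
As a quick sanity check of what this helper computes, here is a small sketch that runs it on the L0 state sequence alone, typed in by hand. The vector name s is only for illustration and is not part of the original code.

```r
rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# State sequence for ID L0, copied from the test data (illustrative name)
s <- c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0)

r <- rle(s)
r$values   # 0 1 0 2 1 0
r$lengths  # 3 3 2 1 1 1

# Two runs of state 1, with lengths 3 and 1, so the mean length is 2
res <- rle_mean_lengths(s, 1)
res
```

This matches the L0 row expected later: count 2 and avg_length 2.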

And then I add in the grouping aspect:

DF %>% group_by(ID) %>% do(rle_mean_lengths(DF$state,1))

However, the values that are generated are incorrect:

  ID count avg_length
1 L0     2          2
2 L1     2          2

L0 is correct, but L1 has no instances of state == 1, so its average should be zero or NA.
I narrowed the problem down by reducing it to a plain summarize:

DF %>% group_by(ID) %>% summarize_at(vars(state),list(name=mean)) # This works but if I use summarize it gives me weird values again.

How do I get do() to work per group the way summarize_at() does? Or is there another fix? Thanks

>Solution :

Because rle_mean_lengths() returns a data.frame, we can store it in a list column inside summarise() and unnest it afterwards

library(dplyr)
library(tidyr)
DF %>%
  group_by(ID) %>%
  summarise(new = list(rle_mean_lengths(state, 1)), .groups = "drop") %>%
  unnest(new)

Or drop the list(), which gives a data.frame column, and unpack it

DF %>%
  group_by(ID) %>%
  summarise(new = rle_mean_lengths(state, 1), .groups = "drop") %>%
  unpack(new)
# A tibble: 2 × 3
  ID    count avg_length
  <chr> <int>      <dbl>
1 L0        2          2
2 L1        0        NaN
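
The NaN for L1 comes from mean() on an empty vector. If NA is preferred, as the question suggests, one possible sketch is a variant of the helper that handles the no-run case explicitly. The name rle_mean_lengths_na is hypothetical and not part of the original code; the L1 states are typed in by hand so the example runs on its own.

```r
# Hypothetical variant: returns NA instead of NaN when no run matches
rle_mean_lengths_na <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  n <- sum(cond)
  # mean() of an empty vector is NaN, so handle the no-run case explicitly
  avg <- if (n > 0) mean(r$lengths[cond]) else NA_real_
  data.frame(count = n, avg_length = avg)
}

# State sequence for ID L1, copied from the test data
l1_state <- c(0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)

res_na <- rle_mean_lengths_na(l1_state, 1)
res_na  # count is 0 and avg_length is NA
```

Replacing NA_real_ with 0 would give zero instead, at the cost of making "no runs" look like a real average.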

In the OP's do() code, state should be extracted not from the whole data (DF$state) but from the data coming from the lhs of the pipe, i.e. . (.$state), which holds only the current group's rows. Note that do() is superseded in current dplyr, so it may be better to use summarise() with unnest()/unpack() as above.

DF %>% 
  group_by(ID) %>%
  do(rle_mean_lengths(.$state,1))
# A tibble: 2 × 3
# Groups:   ID [2]
  ID    count avg_length
  <chr> <int>      <dbl>
1 L0        2          2
2 L1        0        NaN
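
To see why DF$state gave the same row for both groups, here is a minimal base R sketch, assuming only the ID and state columns matter. It uses split() and lapply() as a stand-in for the grouping (it is not the dplyr code above), and the names whole, per_group and by_id are illustrative.

```r
rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# Only the two relevant columns of the test data, typed in by hand
DF <- data.frame(
  ID = rep(c("L0", "L1"), each = 11),
  state = c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0,
            0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
)

# Whole column: ignores the grouping, so every group would get this same answer
whole <- rle_mean_lengths(DF$state, 1)

# One vector per ID: analogous to using .$state inside do()
per_group <- lapply(split(DF$state, DF$ID), rle_mean_lengths, value = 1)
by_id <- do.call(rbind, per_group)

whole  # count 2, avg_length 2
by_id  # L0: count 2, avg_length 2; L1: count 0, avg_length NaN
```

The whole-column call reproduces the incorrect "2, 2" that appeared for both IDs, while the per-group call matches the corrected output.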