
Group random sequences in R

I have the following sequence in df (dput below):

> df
   value
1     -2
2     -1
3      0
4      1
5      2
6     -3
7     -2
8     -1
9      0
10     1
11    -1
12     0
13     1
14   -10
15    -9
16    -8
17    -7

Within a sequence, each value is always exactly 1 greater than the previous value. The desired output is therefore:

   value group
1     -2     1
2     -1     1
3      0     1
4      1     1
5      2     1
6     -3     2
7     -2     2
8     -1     2
9      0     2
10     1     2
11    -1     3
12     0     3
13     1     3
14   -10     4
15    -9     4
16    -8     4
17    -7     4 

As you can see, the first sequence is -2, -1, 0, 1, 2; the next value, -3, starts a new sequence. I tried the following code:


library(dplyr)
df %>% 
  group_by(grp = cumsum(coalesce(value == -lag(value, n = 1), TRUE)))
#> # A tibble: 17 × 2
#> # Groups:   grp [2]
#>    value   grp
#>    <dbl> <int>
#>  1    -2     1
#>  2    -1     1
#>  3     0     1
#>  4     1     1
#>  5     2     1
#>  6    -3     1
#>  7    -2     1
#>  8    -1     1
#>  9     0     1
#> 10     1     1
#> 11    -1     2
#> 12     0     2
#> 13     1     2
#> 14   -10     2
#> 15    -9     2
#> 16    -8     2
#> 17    -7     2

Created on 2023-01-23 with reprex v2.0.2

That doesn't work, because the jumps between sequences are arbitrary: the code only starts a new group when a value equals the negative of the previous one. How can I group these sequences?
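To see why the attempt above misses most breaks, it can help to print the condition it uses on its own. This is a small diagnostic sketch, assuming dplyr is installed; it rebuilds `df` from the dput so it runs standalone, and the names `flag` and `flags` are just illustrative.

```r
library(dplyr)

# Same data as the dput of df
df <- data.frame(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1,
                           0, 1, -10, -9, -8, -7))

# The original condition is TRUE only when a value is the negation of the
# previous value; it never checks whether the run of +1 steps was broken
flags <- df %>%
  mutate(flag = value == -lag(value, n = 1)) %>%
  pull(flag)

# Row 1 is NA (no previous value) and only row 11 (-1 after 1) is TRUE,
# so the jumps at rows 6 (-3 after 2) and 14 (-10 after 1) are missed
which(flags)
#> [1] 11
```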


dput of df:

df<-structure(list(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1, 
0, 1, -10, -9, -8, -7)), class = "data.frame", row.names = c(NA, 
-17L))

Solution:

Edit: if the sequences always run in the same direction (increasing, as here), abs() is not needed.
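For example, under the assumption that every sequence counts upward by +1, the break test can compare the raw step with 1 directly. A minimal sketch (dplyr assumed installed; `out` is an illustrative name):

```r
library(dplyr)

df <- data.frame(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1,
                           0, 1, -10, -9, -8, -7))

# Assumes every sequence increases by exactly +1 per row, so any other
# step (including -1) marks the start of a new group
out <- df %>%
  mutate(group = cumsum(c(TRUE, diff(value) != 1)))

out
```

The two versions only disagree on a step of exactly -1: with abs() such a step continues the current group, without it the step starts a new one. This data has no -1 steps, so both give groups 1 to 4.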


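Start a new group wherever the absolute difference from the previous value is not 1:

library(dplyr)
# c(TRUE, ...) marks row 1 plus every row whose step from the previous row
# is not 1; cumsum() turns those start markers into group ids 1, 2, 3, ...
df %>% 
  group_by(grp = cumsum(c(TRUE, abs(diff(value)) != 1)))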
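The same cumsum/diff idea also works without dplyr. This base R sketch adds the `group` column from the desired output directly; `starts` is an illustrative helper name, not part of the original answer.

```r
df <- data.frame(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1,
                           0, 1, -10, -9, -8, -7))

# TRUE marks the first row of each sequence: row 1, plus every row whose
# absolute step from the previous row is not 1
starts <- c(TRUE, abs(diff(df$value)) != 1)

# cumsum() over the logical vector counts the starts seen so far,
# giving an integer group id for every row
df$group <- cumsum(starts)

df
```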

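Or with lag():

# lag() is NA on the first row; coalesce() turns that NA into TRUE so
# row 1 always starts group 1
df %>% 
  group_by(grp = cumsum(coalesce(abs(value - lag(value)) != 1, TRUE)))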
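To check the mechanism row by row, it can help to keep the intermediate columns instead of grouping straight away. A sketch, assuming dplyr is installed; the helper columns `step` and `new_seq` are illustrative names.

```r
library(dplyr)

df <- data.frame(value = c(-2, -1, 0, 1, 2, -3, -2, -1, 0, 1, -1,
                           0, 1, -10, -9, -8, -7))

out <- df %>%
  mutate(
    step    = value - lag(value),              # NA on the first row
    new_seq = coalesce(abs(step) != 1, TRUE),  # row 1 always starts a group
    group   = cumsum(new_seq)                  # running count of starts
  )

out
```

Printing `out` shows `new_seq` is TRUE exactly at rows 1, 6, 11 and 14, which is where `group` steps up.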