subset dplyr dataframe with custom rule

I have a dataframe like the following:

df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10), value = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))

I would like to keep only the rows whose num values belong to a consecutive run, that is, rows whose num differs by exactly 1 from the row before or the row after. My output should look like the following:

    num value
1     1     a
2     2     b
3     4     c
4     5     d
5     9     f
6    10     g

With the code below,

library(dplyr)

# Difference between each num and the previous one (0 for the first row)
df_subset <- df %>%
  mutate(difference = num - lag(num, default = first(num))) %>%
  filter(difference == 1 | row_number() == 1)

The output excludes 4 and 9, because for those rows the difference from the previous row is not 1:

   num value
1     1     a
2     2     b
3     5     d
4    10     g

How can I modify this so that every row belonging to a consecutive run is kept, including the first row of each run?

Solution:

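You could use diff twice instead of lag, comparing each row with both its previous and its next neighbour, and keep a row if either difference is 1. Padding with NA marks the missing neighbour at each end; filter() drops rows where the condition evaluates to NA, so an isolated first or last row is excluded as well:

df %>%
  filter(c(NA, diff(num)) == 1 | c(diff(num), NA) == 1)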

  num value
1   1     a
2   2     b
3   4     c
4   5     d
5   9     f
6  10     g
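
Since the question also asks about creating groups for each series, here is a minimal sketch of an alternative that labels every consecutive run explicitly and then keeps only runs with more than one row. The column name run_id and the result name df_runs are illustrative assumptions, not part of the original question or answer.

```r
library(dplyr)

df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10),
                 value = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))

df_runs <- df %>%
  # A new run starts wherever the step from the previous num is not 1;
  # cumsum() turns those break points into a run identifier (hypothetical column name).
  mutate(run_id = cumsum(c(1, diff(num)) != 1)) %>%
  group_by(run_id) %>%
  # Keep only runs that contain at least two consecutive values.
  filter(n() > 1) %>%
  ungroup()

print(df_runs)
```

For the sample data this keeps num 1, 2, 4, 5, 9 and 10 and drops the isolated 7. The extra run_id column may be useful if each series needs to be summarised separately; drop it with select(-run_id) otherwise.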