Replace all subsequent column values, after the first instance of a value greater than x

I have a dataframe (df1) with two columns, one (grp) is a grouping variable, the second (num) has some measurements.

For each group I want to:

  1. replace all numbers greater than 3.5 with 4
  2. replace all numbers after the first instance of 4 with 4

I just want to get to step 2, but step 1 seems like a logical starting point, maybe it isn’t required though?

Example data

 library(dplyr)
    
 df1 <- data.frame(
  grp = rep(c("a", "b"), each = 10),
  num = c(0,1,2,5,0,1,7,0,2,1,2,2,2,2,5,0,0,0,0,6))

I can get the first part:

df1 %>%
  group_by(grp) %>%
  mutate(num = ifelse(num > 3.5, 4, num))

For the second part I tried using dplyr::lag and dplyr::case_when but no luck. Here is the desired output:

   grp num
1    a   0
2    a   1
3    a   2
4    a   4
5    a   4
6    a   4
7    a   4
8    a   4
9    a   4
10   a   4
11   b   2
12   b   2
13   b   2
14   b   2
15   b   4
16   b   4
17   b   4
18   b   4
19   b   4
20   b   4

Any advice would be much appreciated.

>Solution :

You could use cumany() to find all cases after the first event, i.e. num > 3.5.

library(dplyr)

df1 %>%
  group_by(grp) %>%
  mutate(num2 = replace(num, cumany(num > 3.5), 4)) %>%
  ungroup()

# A tibble: 20 × 3
   grp     num  num2
   <chr> <dbl> <dbl>
 1 a         0     0
 2 a         1     1
 3 a         2     2
 4 a         5     4
 5 a         0     4
 6 a         1     4
 7 a         7     4
 8 a         0     4
 9 a         2     4
10 a         1     4
11 b         2     2
12 b         2     2
13 b         2     2
14 b         2     2
15 b         5     4
16 b         0     4
17 b         0     4
18 b         0     4
19 b         0     4
20 b         6     4

You can also replace cumany(num > 3.5) with cumsum(num > 3.5) > 0.

Leave a Reply