R data.table join to find last row satisfying condition

I’d like to get the last row satisfying a condition in a data.table.

I want to use a join for this because it is by far the fastest approach.

mydt <- data.table(condition = c(F,F,T,F,F,T,F,F,F,F), row = 1:10)
> mydt
    condition   row
       <lgcl> <int>
 1:     FALSE     1
 2:     FALSE     2
 3:      TRUE     3
 4:     FALSE     4
 5:     FALSE     5
 6:      TRUE     6
 7:     FALSE     7
 8:     FALSE     8
 9:     FALSE     9
10:     FALSE    10

Basically, for each row where condition is TRUE, I’d like the number of the previous row where condition is also TRUE. Here the sixth element should be 3 and all the rest NA.

I tried computing this for both TRUE and FALSE rows, because for some reason I can’t use on = .(condition == T):

mydt[
  mydt,
  on = .(condition == condition, row < row),
  .(result = row),
  mult = "last"]$result

# [1]  1  2  3  4  5  6  7  8  9 10
# expected result: NA, 1, NA, 2, 4, 3, 5, 7, 8, 9
# OR expected result (only for TRUE): NA, NA, NA, NA, NA, 3, NA, NA, NA, NA

Any help? thanks


EDIT: The code below produces the expected result with dplyr, but I am still after a data.table join solution.

library(dplyr)

mydt %>% as.data.frame() %>% group_by(condition) %>% mutate(prev_result = lag(row))
   condition   row prev_result
   <lgl>     <int>       <int>
 1 FALSE         1          NA
 2 FALSE         2           1
 3 TRUE          3          NA
 4 FALSE         4           2
 5 FALSE         5           4
 6 TRUE          6           3
 7 FALSE         7           5
 8 FALSE         8           7
 9 FALSE         9           8
10 FALSE        10           9

Solution:

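You’ll want to keep x.row in the result. In a join, a join column referred to
by its bare name in j takes its values from i, so plain row gives you i.row;
the x. prefix returns the matched value from x instead.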

library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

mydt[
  mydt,
  on = .(condition == condition, row < row),
  mult = "last",
  .(condition, i.row, x.row)
]
#>     condition i.row x.row
#>  1:     FALSE     1    NA
#>  2:     FALSE     2     1
#>  3:      TRUE     3    NA
#>  4:     FALSE     4     2
#>  5:     FALSE     5     4
#>  6:      TRUE     6     3
#>  7:     FALSE     7     5
#>  8:     FALSE     8     7
#>  9:     FALSE     9     8
#> 10:     FALSE    10     9
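
If only the TRUE rows matter (the second expected result in the question), one possible follow-up is to blank out the joined value wherever condition is FALSE. This is a minimal sketch built on the join above; the column names prev_row and prev_true are made up for illustration, and fifelse() is assumed to be available from a reasonably recent data.table.

```r
library(data.table)

mydt <- data.table(condition = c(F, F, T, F, F, T, F, F, F, F), row = 1:10)

# Same self-join as above: for each row, the last earlier row with the same condition.
# prev_row is an illustrative name for the matched x.row.
res <- mydt[
  mydt,
  on = .(condition == condition, row < row),
  mult = "last",
  .(condition, row = i.row, prev_row = x.row)
]

# Keep the previous row only where condition is TRUE; all other rows become NA.
res[, prev_true := fifelse(condition, prev_row, NA_integer_)]

res$prev_true
#>  [1] NA NA NA NA NA  3 NA NA NA NA
```

The join still runs over both TRUE and FALSE groups, so this does not avoid the extra work on FALSE rows; it only masks the result afterwards.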