Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R data.table only perform operation on group if condition is met

I have a data.table that is supposed to remove all rows per group until a negative number is met in value (including the row with the negative number itself). However, if there is no negative number in value I would like to keep all rows from that group.

# Example data    
group = rep(1:4,each=3)
value = c(1,2,3,1,-2,3,1,2,-3,-1,2,3)
DT = data.table(group,value)

> DT
    group  value row_idx
 1:     1      1       1
 2:     1      2       2
 3:     1      3       3
 4:     2      1       1
 5:     2     -2       2
 6:     2      3       3
 7:     3      1       1
 8:     3      2       2
 9:     3     -3       3
10:     4     -1       1
11:     4      2       2
12:     4      3       3

My attempt so far:

DT[,row_idx := seq_len(.N), by = "group"] #append row index per group
DT[,.SD[row_idx > (which(sign(value) == -1))], by = "group"] 

   group  value row_idx
1:     2      3       3
2:     4      2       2
3:     4      3       3

In this example group 1 is being deleted although I would like to keep it as no negative number is present in this group. I can check for the presence/absence of negative signs in value by DT[,(-1) %in% sign(value), by = "group"] but I do not know how to use this to achieve what I want.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We may use a if/else condition

library(data.table)
DT[DT[, if(any(sign(value) < 0)) 
    .I[row_idx > (which(sign(value) == -1))] else .I, by = group]$V1]

-output

   group value row_idx
   <int> <num>   <int>
1:     1     1       1
2:     1     2       2
3:     1     3       3
4:     2     3       3
5:     4     2       2
6:     4     3       3

Or slightly more compact option

DT[DT[, .I[seq_len(.N) > match(-1, sign(value), nomatch = 0)], group]$V1]
   group value
   <int> <num>
1:     1     1
2:     1     2
3:     1     3
4:     2     3
5:     4     2
6:     4     3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading