Loop to capture differences greater than threshold in R

I have a dataset formatted as follows:

person_ID  exam_ID value_1  number_studies
A1         1A1     2        3
A1         2A1     3        3
A1         3A1     1        3
A2         1A2     2        5
A2         2A2     3        5
A2         3A2     3.5      5
A2         4A2     1.5      5
A2         5A2     1.0      5

The data is ordered by person_ID and then by exam_ID. Within each person_ID, I would like to find the first row where value_1 differs from the previous row's value_1 by less than -1, and remove that row and every row after it.

For example, for person_ID ‘A1’, I would keep exam_IDs ‘1A1’ and ‘2A1’, but remove ‘3A1’ as the difference between value_1 for ‘3A1-2A1’ is < -1. For person_ID ‘A2’, I would remove exam_IDs 4A2 and 5A2.

I thought to do this with nested while loops to create a list of exam_IDs and then subset my dataframe, but the code does not work. See example below. I would appreciate any advice/suggestions!

z1 <- list()
for (person in unique(df$person_ID)) {
  tempdata <- subset(df, df$person_ID == person)
  t1 <- seq(from = 1, to = (unique(tempdata$number_studies) - 1))
  i <- 0
  t <- 1
  while (t < (unique(tempdata$number_studies) - 1)) {
    while (i > -1) {
      i <- tempdata[t + 1, 3] - tempdata[t, 3]
      tempID <- tempdata[t, ]
      z1 <- append(z1, tempID$exam_ID)
      t <- t + 1
    }
  }
}

Solution:

You don’t need a loop for this. Here’s a solution using data.table, where dat is your data frame:

library(data.table)
setDT(dat)
dat[ , drop:=cumsum(c(0,diff(value_1))< -1), by=person_ID][drop==0, !"drop"]


   person_ID exam_ID value_1 number_studies
1:        A1     1A1     2.0              3
2:        A1     2A1     3.0              3
3:        A2     1A2     2.0              5
4:        A2     2A2     3.0              5
5:        A2     3A2     3.5              5

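To understand how it works: a helper column called drop is created that keeps a running count, within each person_ID, of the rows whose value_1 differs from the previous row's by less than -1. Once the first such row occurs, the count stays at 1 or higher for all of that person's remaining rows, so keeping only the rows where drop is 0 returns everything before the first large decrease. The drop column itself is then excluded from the returned result.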
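The same logic can be traced step by step in base R. This is only a sketch of the idea, not the data.table code above: the `step` helper column and the `kept` result name are introduced here purely for illustration, and `ave()` stands in for data.table's `by=` grouping.

```r
# Sample data from the question
dat <- data.frame(
  person_ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A2"),
  exam_ID = c("1A1", "2A1", "3A1", "1A2", "2A2", "3A2", "4A2", "5A2"),
  value_1 = c(2, 3, 1, 2, 3, 3.5, 1.5, 1.0),
  number_studies = c(3, 3, 3, 5, 5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Difference from the previous row within each person (0 for the first row).
# `step` is a hypothetical helper column, not part of the original answer.
dat$step <- ave(dat$value_1, dat$person_ID, FUN = function(v) c(0, diff(v)))

# Running count, within each person, of rows whose difference is below -1
dat$drop <- ave(as.numeric(dat$step < -1), dat$person_ID, FUN = cumsum)

# Keep only the rows before the first large decrease, without the helper columns
kept <- dat[dat$drop == 0, c("person_ID", "exam_ID", "value_1", "number_studies")]
print(kept)
```

Note that exam 5A2 has a difference of only -0.5 from 4A2, yet it is still removed: the running count reached 1 at 4A2 and never returns to 0, which is why cumsum is used rather than a plain comparison on each row.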
