Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Dates subtraction from different rows of data frame

I have big data frame (dim: 12867779×5) which looks like that:

id group date1 date 2 icf
id1 2 2020-03-17 2019-06-05
id1 3 2020-04-03 2019-05-09
id2 2 2020-04-10 2019-07-04
id2 3 2021-04-1 2020-06-01
id3 1 2020-04-13 2019-07-07
id3 2 2021-04-10 2020-06-01
id3 3 2020-04-10 2019-07-04
id3 4 2021-04-13 2020-06-01

Desired output:

id group date1 date 2 icf
id1 3 2020-04-03 2019-05-09 0
id2 2 2020-04-10 2019-07-04 52
id2 3 2021-04-01 2020-06-01 0
id3 1 2020-04-13 2019-07-07 49
id3 2 2021-04-10 2020-06-01 -646
id3 3 2020-04-10 2019-07-04 52
id3 4 2021-04-13 2020-06-01 0

To calculate icf I need to check if the id’s from row i and i+1 are the same. If yes icf = date2(i+1) – date1(i).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I wrote this function to calculate icf, but it’s too slow. I’m looking for ways to speed it up, I was thinking about using the apply function but I don’t have idea how to re-write this icfCalculation fucntion.

icfCalculation <- function(dataFrame){
  nr <- nrow(dataFrame) - 1
  for (i in 1:nr) {
    if(dataFrame[i, 1] == dataFrame[i+1, 1]){
      dataFrame[i,5] = dataFrame[i+1, 4] - dataFrame[i, 3]
    } 
    else{
      dataFrame[i,5] = 0
    }
  }
  return(dataFrame)
}

>Solution :

Thanks for putting expected output. This is not the same as what you have put – but it does give the same results as your function and should be significantly quicker to thanks to the data.table internal optimisations:

library(data.table)

# Read in data 
dat  <- read.table(text = "id   group   date1   date2   
id1 2   2020-03-17  2019-06-05  
id1 3   2020-04-03  2019-05-09  
id2 2   2020-04-10  2019-07-04  
id2 3   2021-04-1   2020-06-01  
id3 1   2020-04-13  2019-07-07  
id3 2   2021-04-10  2020-06-01  
id3 3   2020-04-10  2019-07-04  
id3 4   2021-04-13  2020-06-01", 
    h = T, 
    colClasses = c("character", "character", "Date", "Date")
)

# Make it a data.table
setDT(dat)

dat[, icf := fifelse(
    id == shift(id, type = "lead"), 
    as.integer(
        shift(date2, type = "lead") - date1
        ), 
    0)
]
dat
#     id group      date1      date2  icf
# 1: id1     2 2020-03-17 2019-06-05 -313
# 2: id1     3 2020-04-03 2019-05-09    0
# 3: id2     2 2020-04-10 2019-07-04   52
# 4: id2     3 2021-04-01 2020-06-01    0
# 5: id3     1 2020-04-13 2019-07-07   49
# 6: id3     2 2021-04-10 2020-06-01 -646
# 7: id3     3 2020-04-10 2019-07-04   52
# 8: id3     4 2021-04-13 2020-06-01   NA

If you want the last NA to be 0, just add dat$icf[nrow(dat)] <- 0.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading