Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R Find Distance Between Two values By Group

HAVE = data.frame(INSTRUCTOR = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
STUDENT = c(1, 2, 2, 2, 1, 3, 1, 1, 1, 1, 2, 1),
SCORE = c(10, 1, 0, 0, 7, 3, 5, 2, 2, 4, 10, 2),
TIME = c(1,1,2,3,2,1,1,2,3,1,1,2))
 
WANT = data.frame(INSTRUCTOR = c(1, 2, 3), 
SCORE.DIF = c(-9, NA, 6))

For each INSTRUCTOR, I wish to find the SCORE of the first and second STUDENT, and subtract their scores. The STUDENT code varies so I wish not to use ‘==1’ vs ‘==2’

I try:

HAVE[, .SD[1:2], by = 'INSTRUCTOR']

but do not know how to subtract vertically and obtain ‘WANT’ data frame from ‘HAVE’

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

library(data.table)
setDT(HAVE)
unique(HAVE, by = c("INSTRUCTOR", "STUDENT")
  )[, .(SCORE.DIF = diff(SCORE[1:2])), by = INSTRUCTOR]
#    INSTRUCTOR SCORE.DIF
#         <num>     <num>
# 1:          1        -9
# 2:          2        NA
# 3:          3         6

To use your new TIME variable, we can do

HAVE[, .SD[which.min(TIME),], by = .(INSTRUCTOR, STUDENT)
  ][, .(SCORE.DIF = diff(SCORE[1:2])), by = INSTRUCTOR]
#    INSTRUCTOR SCORE.DIF
#         <num>     <num>
# 1:          1        -9
# 2:          2        NA
# 3:          3         6

One might be tempted to replace SCORE[1:2] with head(SCORE,2), but that won’t work: head(SCORE,2) will return length-1 if the input is length-2, as it is with instructor 2 (who only has one student albeit multiple times). When you run diff on length-1 (e.g., diff(1)), it returns a 0-length vector, which in the above data.table code reduces to zero rows for instructor 2. However, when there is only one student, SCORE[1:2] resolves to c(SCORE[1], NA), for which the diff is length-1 (as needed) and NA (as needed).

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading