Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find the previous-trial score within a nested dataframe in a loop and save as a new variable

I have a dataframe that currently looks like this:

subjectID Trial Score
1 1 16
1 1 16
1 1 16
1 2 8
1 2 8
1 2 8
1 3 12
1 3 12
1 3 12
2 1 9
2 1 9
2 1 9
2 2 10
2 2 10
2 2 10

I need to create a new column, Previous_Trial_Score, that is simply the score on the last trial for each person. For example:

subjectID Trial Score Previous_Trial_Score
1 1 16 NA
1 1 16 NA
1 1 16 NA
1 2 8 16
1 2 8 16
1 2 8 16
1 3 12 8
1 3 12 8
1 3 12 8
2 1 9 NA
2 1 9 NA
2 1 9 NA
2 2 10 9
2 2 10 9
2 2 10 9

And so on. Trial 1 for each subject will always be NA, as there is no previous trial for that person. I am writing a for-loop to accomplish this, below:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

for (myperson in unique(data$subjectID)){
  for (mytrial in unique(data$Trial[data$Trial>1])){

    #Specify the trial and person
    Prev_Score=as.numeric(unique(data[data$subjectID==myperson & data$Trial==mytrial-1, "Score"]))
    

    #Save it to the dataframe
    data[data$subjectID==myperson & data$Trial==mytrial,"Prev_Score"]=Prev_Score
    
    
  }
}

In the above loop, I had to specify as.numeric and unique to get R to return a single value properly. However, when I run the loop, I get this error:

Error: Assigned data `value` must be compatible with existing data.
i Error occurred for column `Prev_Score`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 3...

Does anyone have a solution? I am open to tidyverse/dplyr work-arounds.

>Solution :

We could do a group by summarise to get the lag and then do a join

library(dplyr)
df1 %>% 
   group_by(subjectID, Trial) %>%
   summarise(Previous_Trial_Score = last(Score), .groups= 'drop_last') %>% 
   mutate(Previous_Trial_Score = lag(Previous_Trial_Score)) %>%   
   left_join(df1, .)

-output

   subjectID Trial Score Previous_Trial_Score
1          1     1    16                   NA
2          1     1    16                   NA
3          1     1    16                   NA
4          1     2     8                   16
5          1     2     8                   16
6          1     2     8                   16
7          1     3    12                    8
8          1     3    12                    8
9          1     3    12                    8
10         2     1     9                   NA
11         2     1     9                   NA
12         2     1     9                   NA
13         2     2    10                    9
14         2     2    10                    9
15         2     2    10                    9

data

df1 <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L), Score = c(16L, 16L, 16L, 
8L, 8L, 8L, 12L, 12L, 12L, 9L, 9L, 9L, 10L, 10L, 10L)), 
class = "data.frame", row.names = c(NA, 
-15L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading