Creating lag variable using for loop

What I want to perform:
If hmonth=2 and hyear=2000, subtract each observation of wageratio.female from that of hmonth=1 and hyear=2000.
If hmonth=2 and hyear=2001, subtract each observation of wageratio.female from that of hmonth=1 and hyear=2001.
Repeat for all hmonth and hyear.
Create a variable called wageratio.lags for the differences.

Below is a small section of my attempt at a for loop. Should I be using a for loop to achieve my desired output?

differences = list()

for i in range(len(hmonth)):
    # Check if the current pair is (2, 2000) or (2, 2001)
    if hmonth[i] == 2:
        if hyear[i] == 2000:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2000
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)
        elif hyear[i] == 2001:
            # Subtract each observation of wageratio_female from that of hmonth=1 and hyear=2001
            difference = wageratio_female[i] - wageratio_female[hmonth.index(1)]
            differences.append(difference)
Error: unexpected symbol in "for i"

Desired output:

hmonth hyear wageratio.female wageratio.lags
1 2000 -0.43 -0.01
1 2001 0.18 -0.62
2 2000 -0.44 0.12
2 2001 -0.44 -0.47
3 2000 -0.32 -0.45
3 2001 -0.91 0.70
4 2000 -0.77 1.24
4 2001 -0.21 NA
5 2000 0.47 NA

Sample data:

df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

>Solution :

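You can use the dplyr lead()/lag() functions to do this without a loop. Each row of the desired output is the next month's value minus the current month's value within the same year, so lead() is the one that fits here. For example: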

library(dplyr)
df %>% 
  group_by(hyear) %>% 
  arrange(hmonth) %>% 
  mutate(wageratio.lags = lead(wageratio_female) - wageratio_female) %>%
  ungroup()

produces

  wageratio_female     hmonth      hyear    wageratio.lags
             <dbl> <hvn_lbll> <hvn_lbll>   <dbl>
1            -0.43          1       2000 -0.0100
2             0.18          1       2001 -0.62  
3            -0.44          2       2000  0.12  
4            -0.44          2       2001 -0.47  
5            -0.32          3       2000 -0.45  
6            -0.91          3       2001  0.7   
7            -0.77          4       2000  1.24  
8            -0.21          4       2001 NA     
9             0.47          5       2000 NA 
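
On the question of whether a for loop is needed: it is not, but it can work. The error `unexpected symbol in "for i"` appears because the attempt is written in Python syntax (`for i in range(...):`, `elif`, `.append()`), while R expects `for (i in ...) { ... }`. Below is a minimal sketch of a loop version in base R, offered as an illustration rather than a replacement for the pipeline above. It assumes that the "next" observation is the row with the same hyear and hmonth + 1; that matching rule is an assumption of this sketch, whereas lead() simply takes whichever row comes next within the year. The helper name `nxt` is introduced here for illustration only.

```r
# Example data copied from the question
df <- data.frame(
  wageratio_female = c(-0.43, 0.18, -0.44, -0.44, -0.32, -0.91, -0.77, -0.21, 0.47),
  hmonth = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  hyear  = c(2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001, 2000)
)

# Start with NA everywhere; rows without a following month in the same year stay NA
df$wageratio.lags <- NA_real_

# R loop syntax: for (variable in sequence) { body }
for (i in seq_len(nrow(df))) {
  # Assumption of this sketch: the matching row has the same year and month + 1
  nxt <- which(df$hyear == df$hyear[i] & df$hmonth == df$hmonth[i] + 1)

  # Only compute a difference when exactly one matching row exists
  if (length(nxt) == 1) {
    df$wageratio.lags[i] <- df$wageratio_female[nxt] - df$wageratio_female[i]
  }
}

print(df)
```

On the example data this gives the same wageratio.lags column as the dplyr version. The loop rescans the data frame on every iteration, so the grouped lead() approach is likely the better choice for larger data.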