Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Automating the creation of new columns for a list of dataframes – R

I have been tasked with the creation of lagging indicators for a series of data frames. Each dataframe has two columns: date and value. For each "value" column I need to create 12 new columns, where the lag increases by one unit.

A given data frame looks like this:

    date   | value |
01-01-2021     1
02-01-2021     2
03-01-2021     3
04-01-2021     4
05-01-2021     5
06-01-2021     6
07-01-2021     7
08-01-2021     8

And I need to add new columns so they end up looking like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    date   | value | lag1 | ... | lag12
01-01-2021     1       2            13
02-01-2021     2       3            14
03-01-2021     3       4            15
04-01-2021     4       5            16
05-01-2021     5       6            17
06-01-2021     6       7            18
07-01-2021     7       8            19
08-01-2021     8       9            20

The values shown are for illustrative purposes, the real data is of economic nature and therefore follows no particular pattern.

So far, I have only managed to figure out how to create one column at a time for a single data frame, but this isn’t very efficient. I need a way to loop through all of the data frames and create the additional columns.

Thanks in advance for any help!

>Solution :

Loop over however many lags you need, creating a list of the new column values, then add them back all at once.

lagfun <- function(x,n) c(tail(x,-n), rep(NA,n))
dat[paste0("lag", 1:7)] <- lapply(1:7, lagfun, x=dat$value)
dat

#        date value lag1 lag2 lag3 lag4 lag5 lag6 lag7
#1 01-01-2021     1    2    3    4    5    6    7    8
#2 02-01-2021     2    3    4    5    6    7    8   NA
#3 03-01-2021     3    4    5    6    7    8   NA   NA
#4 04-01-2021     4    5    6    7    8   NA   NA   NA
#5 05-01-2021     5    6    7    8   NA   NA   NA   NA
#6 06-01-2021     6    7    8   NA   NA   NA   NA   NA
#7 07-01-2021     7    8   NA   NA   NA   NA   NA   NA
#8 08-01-2021     8   NA   NA   NA   NA   NA   NA   NA

You could of course use the lag function from dplyr or a similar function instead of my lagfun, but this will be self-contained.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading