Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R, trying to create a column from another one using lag(), grouping() and arrange()

I seek guidance from the wise, I’m new to programming.

I’m trying to create a new column in my dataframe with data from another column, by using the functions grouping(), arrange() and lag().
My simplified dataframe looks like this:

timestamp id price
2023-01-01 1 100
2023-01-02 1 102
2023-01-03 1 101
2023-01-04 1 103
2023-01-01 2 10
2023-01-02 2 12
2023-01-03 2 11
2023-01-04 2 13

and I want to compute the percent price change compared to 2 days ago for each ID, it would look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

timestamp id price price_%change_2d
2023-01-01 1 100 N/A
2023-01-02 1 102 N/A
2023-01-03 1 101 0.01
2023-01-04 1 103 0.0098
2023-01-01 2 10 N/A
2023-01-02 2 12 N/A
2023-01-03 2 11 0.1
2023-01-04 2 13 0.098

Mind you the IDs are not grouped and the timestamps are not ordered properly in reality so I want to order them first.

So far I have come up with this code:

df <- df %>%
  group_by(id) %>%
  arrange(timestamp) %>%
  df$"price_%change_2d" <- (df$price - lag(df$price, n = 2L)) / (lag(df$price, n = 2L))

and it gives me the error:

Error in df %>% group_by(id) %>% arrange(timestamp) %>% df$price_%change_2d <- (df$price -  : 
  could not find function "%>%<-"

When I try each part of the code independently they work fine, i.e.:

df <- df %>%
  group_by(id) %>%
  arrange(timestamp)

and

df$"price_%change_2d" <- (df$price - lag(df$price, n = 2L)) / (lag(df$price, n = 2L))

but I need to order the dataframe before computing the price change.

Can you help me please?

>Solution :

You’ll want to use mutate to use group_by like you want.

I.e.

library(dplyr)

df %>%
  group_by(id) %>%
  arrange(timestamp) %>%
  mutate(price_perc_change_2d = (price - lag(price, n = 2L)) / (lag(price, n = 2L))) %>%
  ungroup()

Or from dplyr 1.1.0 (possibly) shortened to:

df %>%
  mutate(price_perc_change_2d = (price - lag(price, n = 2L)) / (lag(price, n = 2L)), .by = "id")

Output:

  timestamp     id price price_perc_change_2d
  <date>     <dbl> <dbl>                <dbl>
1 2023-01-01     1   100             NA      
2 2023-01-02     1   102             NA      
3 2023-01-03     1   101              0.01   
4 2023-01-04     1   103              0.00980
5 2023-01-01     2    10             NA      
6 2023-01-02     2    12             NA      
7 2023-01-03     2    11              0.1    
8 2023-01-04     2    13              0.0833 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading