R, trying to create a column from another one using lag(), grouping() and arrange()

Advertisements

I seek guidance from the wise, I’m new to programming.

I’m trying to create a new column in my dataframe with data from another column, by using the functions grouping(), arrange() and lag().
My simplified dataframe looks like this:

timestamp id price
2023-01-01 1 100
2023-01-02 1 102
2023-01-03 1 101
2023-01-04 1 103
2023-01-01 2 10
2023-01-02 2 12
2023-01-03 2 11
2023-01-04 2 13

and I want to compute the percent price change compared to 2 days ago for each ID, it would look like this:

timestamp id price price_%change_2d
2023-01-01 1 100 N/A
2023-01-02 1 102 N/A
2023-01-03 1 101 0.01
2023-01-04 1 103 0.0098
2023-01-01 2 10 N/A
2023-01-02 2 12 N/A
2023-01-03 2 11 0.1
2023-01-04 2 13 0.098

Mind you the IDs are not grouped and the timestamps are not ordered properly in reality so I want to order them first.

So far I have come up with this code:

df <- df %>%
  group_by(id) %>%
  arrange(timestamp) %>%
  df$"price_%change_2d" <- (df$price - lag(df$price, n = 2L)) / (lag(df$price, n = 2L))

and it gives me the error:

Error in df %>% group_by(id) %>% arrange(timestamp) %>% df$price_%change_2d <- (df$price -  : 
  could not find function "%>%<-"

When I try each part of the code independently they work fine, i.e.:

df <- df %>%
  group_by(id) %>%
  arrange(timestamp)

and

df$"price_%change_2d" <- (df$price - lag(df$price, n = 2L)) / (lag(df$price, n = 2L))

but I need to order the dataframe before computing the price change.

Can you help me please?

>Solution :

You’ll want to use mutate to use group_by like you want.

I.e.

library(dplyr)

df %>%
  group_by(id) %>%
  arrange(timestamp) %>%
  mutate(price_perc_change_2d = (price - lag(price, n = 2L)) / (lag(price, n = 2L))) %>%
  ungroup()

Or from dplyr 1.1.0 (possibly) shortened to:

df %>%
  mutate(price_perc_change_2d = (price - lag(price, n = 2L)) / (lag(price, n = 2L)), .by = "id")

Output:

  timestamp     id price price_perc_change_2d
  <date>     <dbl> <dbl>                <dbl>
1 2023-01-01     1   100             NA      
2 2023-01-02     1   102             NA      
3 2023-01-03     1   101              0.01   
4 2023-01-04     1   103              0.00980
5 2023-01-01     2    10             NA      
6 2023-01-02     2    12             NA      
7 2023-01-03     2    11              0.1    
8 2023-01-04     2    13              0.0833 

Leave a Reply Cancel reply