I seek guidance from the wise, I’m new to programming.
I’m trying to create a new column in my dataframe with data from another column, by using the functions grouping(), arrange() and lag().
My simplified dataframe looks like this:
timestamp | id | price |
---|---|---|
2023-01-01 | 1 | 100 |
2023-01-02 | 1 | 102 |
2023-01-03 | 1 | 101 |
2023-01-04 | 1 | 103 |
2023-01-01 | 2 | 10 |
2023-01-02 | 2 | 12 |
2023-01-03 | 2 | 11 |
2023-01-04 | 2 | 13 |
and I want to compute the percent price change compared to 2 days ago for each ID, it would look like this:
timestamp | id | price | price_%change_2d |
---|---|---|---|
2023-01-01 | 1 | 100 | N/A |
2023-01-02 | 1 | 102 | N/A |
2023-01-03 | 1 | 101 | 0.01 |
2023-01-04 | 1 | 103 | 0.0098 |
2023-01-01 | 2 | 10 | N/A |
2023-01-02 | 2 | 12 | N/A |
2023-01-03 | 2 | 11 | 0.1 |
2023-01-04 | 2 | 13 | 0.098 |
Mind you the IDs are not grouped and the timestamps are not ordered properly in reality so I want to order them first.
So far I have come up with this code:
df <- df %>%
group_by(id) %>%
arrange(timestamp) %>%
df$"price_%change_2d" <- (df$price - lag(df$price, n = 2L)) / (lag(df$price, n = 2L))
and it gives me the error:
Error in df %>% group_by(id) %>% arrange(timestamp) %>% df$price_%change_2d <- (df$price - :
could not find function "%>%<-"
When I try each part of the code independently they work fine, i.e.:
df <- df %>%
group_by(id) %>%
arrange(timestamp)
and
df$"price_%change_2d" <- (df$price - lag(df$price, n = 2L)) / (lag(df$price, n = 2L))
but I need to order the dataframe before computing the price change.
Can you help me please?
>Solution :
You’ll want to use mutate
to use group_by
like you want.
I.e.
library(dplyr)
df %>%
group_by(id) %>%
arrange(timestamp) %>%
mutate(price_perc_change_2d = (price - lag(price, n = 2L)) / (lag(price, n = 2L))) %>%
ungroup()
Or from dplyr 1.1.0
(possibly) shortened to:
df %>%
mutate(price_perc_change_2d = (price - lag(price, n = 2L)) / (lag(price, n = 2L)), .by = "id")
Output:
timestamp id price price_perc_change_2d
<date> <dbl> <dbl> <dbl>
1 2023-01-01 1 100 NA
2 2023-01-02 1 102 NA
3 2023-01-03 1 101 0.01
4 2023-01-04 1 103 0.00980
5 2023-01-01 2 10 NA
6 2023-01-02 2 12 NA
7 2023-01-03 2 11 0.1
8 2023-01-04 2 13 0.0833