Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to run calculation with long format data in R of even periods?

I have the following data frame:

  tibble(
    company = rep(x = "google", each = 15),
    date = c(
      "2019q1",
      "2019q2",
      "2019q3",
      "2019q4",
      "2019end",
      "2020q1",
      "2020q2",
      "2020q3",
      "2020q4",
      "2020end",
      "2021q1",
      "2021q2",
      "2021q3",
      "2021q4",
      "2021end"
    ),
    values = c(1:15)
  )

How can i apply the following calculation to this data frame:

This will yield a new column called new_value

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

e.g. for the first row of 2019q1 this would be:

new_value = 2019q1 + 2018end + 2018q1
More detailed result post 2019 years to follow

I am aware 2018 values do not exist so this will be simply NA

but when you get to 2020q1 the calculation would be this:

2020q1 + 2019end + 2019q1

2020q2 + 2019end + 2019q2

2020q3 + 2019end + 2019q3

2020q4 + 2019end + 2019q4

2020end + 2019end + 2020end

And so on and so forth.

In essence the formula per row is the following:

current date + end date value of previous year + previous date value

I have had some issues with trying to do this in long format, pivotting the data doesn’t yield the desired result either

>Solution :

Here is a solution using left_joins of mutated versions of the dataframe to align the correct values in each row, prior to a final rowwise addition and cleanup.

library(tidyverse)

tibble(
    company = rep(x = "google", each = 15),
    date = c(
      "2019q1",
      "2019q2",
      "2019q3",
      "2019q4",
      "2019end",
      "2020q1",
      "2020q2",
      "2020q3",
      "2020q4",
      "2020end",
      "2021q1",
      "2021q2",
      "2021q3",
      "2021q4",
      "2021end"
    ),
    values = c(1:15)
  ) %>%
  mutate(date = str_replace(date, "end", "q5")) %>%
  separate(date, into = c("year", "quarter"), sep = "q") %>%
  mutate(across(c("year", "quarter", "values"), as.integer)) %>%
  left_join(., group_by(., year) %>%
    slice_tail(n = 1) %>%
    mutate(year = year + 1), by = "year") %>%
  left_join(., mutate(., year = year + 1), by = c("year", "quarter.x")) %>%
  mutate(new_values = values.x.x + values.y.x + values.x.y) %>%
  select(company.x.x, year, quarter.x, values.x.x, new_values) %>%
  rename_with(~str_remove(., "\\..+"))

# # A tibble: 15 × 5
#    company  year quarter values new_values
#    <chr>   <dbl>   <int>  <int>      <int>
#  1 google   2019       1      1         NA
#  2 google   2019       2      2         NA
#  3 google   2019       3      3         NA
#  4 google   2019       4      4         NA
#  5 google   2019       5      5         NA
#  6 google   2020       1      6         12
#  7 google   2020       2      7         14
#  8 google   2020       3      8         16
#  9 google   2020       4      9         18
# 10 google   2020       5     10         20
# 11 google   2021       1     11         27
# 12 google   2021       2     12         29
# 13 google   2021       3     13         31
# 14 google   2021       4     14         33
# 15 google   2021       5     15         35
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading