Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculating weighted average based on criteria

I have a panel data set, where I have tens of years of data for thousands of assets.

There are also rows with NA values in RET/MV/RANK_LAG, which I would like to ignore.

I have got my data to this data frame, where I have months (in class(date)), asset codes, monthly returns and market capitalization. In addition. I have decile ranks that change every month.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here is an example dataset. Instead of deciles I have just two ranks for the sake of this example dataset:

DATE CODE RET MV RANK_LAG
2000-01-01 999999 0.02 10 1
2000-01-01 2222X2 -0.01 20 2
2000-01-01 123456 0.05 30 2
2000-01-01 5555Y5 0.00 5 1
2000-02-01 999999 0.05 10 2
2000-02-01 2222X2 0.10 20 2
2000-02-01 123456 -0.1 30 1
2000-02-01 5555Y5 -0.1 5 1

What I would like to do is to calculate a weighted average return (using MV as weights) for each RANK_LAG, for each DATE.

So a desired output would be:

DATE RANK WEIGHTED_RET
2000-01-01 1 0.013
2000-01-01 2 0.026
2000-02-01 1 -0.100
2000-02-01 2 0.083

>Solution :

Use base function weighted.mean in a dplyr pipe.

df1 <- read.table(text = "DATE  CODE    RET     MV  RANK_LAG
2000-01-01  999999  0.02    10  1
2000-01-01  2222X2  -0.01   20  2
2000-01-01  123456  0.05    30  2
2000-01-01  5555Y5  0.00    5   1
2000-02-01  999999  0.05    10  2
2000-02-01  2222X2  0.10    20  2
2000-02-01  123456  -0.1    30  1
2000-02-01  5555Y5  -0.1    5   1
", header = TRUE, colClasses = c("Date", "character", rep("numeric", 3)))

suppressPackageStartupMessages(
  library(dplyr)
)

df1 %>%
  group_by(DATE, RANK_LAG) %>%
  summarise(WEIGHTED_RET = weighted.mean(RET, MV, na.rm = TRUE), .groups = "drop")
#> # A tibble: 4 × 3
#>   DATE       RANK_LAG WEIGHTED_RET
#>   <date>        <dbl>        <dbl>
#> 1 2000-01-01        1       0.0133
#> 2 2000-01-01        2       0.026 
#> 3 2000-02-01        1      -0.1   
#> 4 2000-02-01        2       0.0833

Created on 2023-02-07 with reprex v2.0.2

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading