Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: Only keeping the first observation of the month in dataset

I have the following kind of dataframe, with thousands of columns and rows. First column contains dates, and the following columns contain asset returns indexes corresponding to that date.

DATE Asset_1 Asset_2 Asset_3 Asset_4
2000-01-01 1000 300 2900 NA
…..
2000-01-31 1100 350 2950 NA
2000-02-02 1200 330 2970 100
2000-02-28 1200 360 3000 200
2000-03-01 1200 370 3500 300

I want to make this into a monthly dataset by only keeping the first observation of the month.

I have come up with the following script:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(dplyr)
library(lubridate)
monthly <- daily %>% filter(day(DATE) == 1) 

However, the problem with this is that it doesnt work for months where the first day of the month is not a trading date (aka it is missing from the daily dataset).

So when I run the command, those months where the first day of the month doesn’t exist are excluded from my dataset.

>Solution :

If the data is always ordered, you could group by year\month, then keep (slice) the first record from each group. Like:

df<-data.frame(mydate=as.Date("2023-01-01")+1:45)

library(tidyverse)
library(lubridate)

df %>% 
  group_by(ym=paste(year(mydate), month(mydate))) %>% 
  #group_by(year(mydate), month(mydate)) %>% 
  slice_head(n=1)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading