Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merging datasets based on date column

Hello Stack Community,

I am trying to merge two datasets based on the date column in each.
Below are the snapshots of each dataset, in which, you can clearly see that date column only is formatted as year-month-day, with no specification for hour:minute:second.

dataset 1; snapshot of date column

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

dataset 2; snapshot of date column

I am merging the two datasets using the following code:

final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)

Below is another snapshot which shows the result of the code above. As you can see, somehow the hour:minute:second appears in date column which dirsputs the merge.

result of the merge

Is there a way to merge the two columns properly? Maybe someone could suggest the function that would set hh:mm:ss to 00:00:00? Btw, beforehand in the script I have both columns "close_date" from dataset1 and "date" from dataset2 formatted in the same way using the as.POSIXct function.

You can check the R output for head() below:

 head(dataset1) %>%
   select(close_date)
 
 A tibble: 6 x 1
  close_date         
  <dttm>             
1 2020-03-01 00:00:00
2 2020-03-02 00:00:00
3 2020-03-03 00:00:00
4 2020-03-04 00:00:00
5 2020-03-05 00:00:00
6 2020-03-06 00:00:00


 head(dataset2) %>%
   select(date)

 A tibble: 6 x 1
  date               
  <dttm>             
 1 2020-03-01 00:00:00
 2 2020-03-02 00:00:00
 3 2020-03-03 00:00:00
 4 2020-03-04 00:00:00
 5 2020-03-05 00:00:00
 6 2020-03-06 00:00:00

>Solution :

With the provided data (in future please provide data directly with dput(head(dataset1)). See here How to make a great R reproducible example

We could define the date format in each dataset before merging using ymd function of lubridate package:

library(lubridate)

dataset1$close_date <- ymd(dataset1$close_date)
dataset2$date <- ymd(dataset2$date)

final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)
final
  close_date    coin case_count_world_
1 2020-03-01 BTCUSDT                  
2 2020-03-02 BTCUSDT                  
3 2020-03-03 BTCUSDT                  
4 2020-03-04 BTCUSDT                  
5 2020-03-05 BTCUSDT                  
6 2020-03-06    <NA>                  
7 2020-03-07    <NA>                  
8 2020-03-08    <NA>  

data:

dataset1 <- structure(list(close_date = structure(c(18322, 18323, 18324, 
18325, 18326), class = "Date"), coin = c("BTCUSDT", "BTCUSDT", 
"BTCUSDT", "BTCUSDT", "BTCUSDT")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

dataset2 <- structure(list(date = structure(c(18322, 18323, 18324, 18325, 
18326, 18327, 18328, 18329), class = "Date"), case_count_world_ = c("", 
"", "", "", "", "", "", "")), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading