R – Convert various csv numeric columns to date

I have a csv datasheet with 7 columns filled with numeric values.
3 of these columns represent the date of the measurements: "YYYY", "MM", "DD", followed by 4 columns of relevant corresponding data: "qobs", "ckhs", "qceq", "qcol".

How do I convert the first three columns of numeric values into a single column of date type, while keeping each date tied to its corresponding data?

#   YYYY, MM, DD, qobs, ckhs, qceq, qcol
# 1 1981, 1, 1, 7.136, 0, 0, 0
# 2 1981, 1, 2, 6.76, 0, 0, 0
# 3 1981, 1, 3, 10.886, 0, 0, 0
# ...

I looked online and only found solutions using the as.Date function that correspond to a single character string. I’m fairly new to programming and have only used R for a couple of days, so an elementary explanation would be greatly appreciated.

Solution:

A tidyverse solution:

library(vroom)
library(dplyr)
library(lubridate) # a truly wonderful package for this kind of thing

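df <- vroom("path-to-your-file.csv",
            col_types = "iiidddd") # three integer columns, then four doubles

df <-
  mutate(
    df,
    date = make_date(YYYY, MM, DD), # build a Date from the three numeric columns
    .keep = "unused", # drop the columns used for computation
    .before = qobs    # place the new column in front of qobs
  )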

Explanation

vroom::vroom() is a really useful (and really fast!) function for reading plaintext data into R. It guesses the delimiter from the data and is generally pretty easy to implement.
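To try the reading step without the real file, here is a minimal sketch that writes the three sample rows from the question to a temporary file and reads them back. The temporary file and the explicit delim = "," are assumptions made for illustration; with a real file you would pass its path instead.

```r
library(vroom)

# Write the sample rows from the question to a temporary CSV
# (a stand-in for the real file path).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "YYYY,MM,DD,qobs,ckhs,qceq,qcol",
  "1981,1,1,7.136,0,0,0",
  "1981,1,2,6.76,0,0,0",
  "1981,1,3,10.886,0,0,0"
), tmp)

# "iiidddd" is one letter per column: i = integer, d = double.
df <- vroom(tmp, delim = ",", col_types = "iiidddd")

# Inspect the column types that were read.
str(df)
```

Spelling out col_types means the year, month and day columns arrive as integers rather than being guessed.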

dplyr::mutate() is a staple of tidyverse data manipulation. It computes new columns within dataframes, or modifies existing columns by overwriting them with new values. Here, we are computing a new column called date using lubridate::make_date(), which does what it says on the tin.
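As a small illustration of make_date() on its own, the sketch below passes three plain numeric vectors (made-up values matching the sample rows) and gets back one Date per position. This is why it works directly on the YYYY, MM and DD columns inside mutate().

```r
library(lubridate)

# Three numeric vectors of equal length, one element per measurement.
years  <- c(1981, 1981, 1981)
months <- c(1, 1, 1)
days   <- c(1, 2, 3)

# make_date() is vectorised: element i of the result combines
# years[i], months[i] and days[i].
dates <- make_date(year = years, month = months, day = days)

class(dates)  # "Date"
format(dates) # ISO-style strings such as "1981-01-01"
```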

We also specify some of mutate()'s named arguments:

  • .keep = "unused" lets us automatically drop all of the columns we used to calculate our new variable, because we no longer need the YYYY, MM or DD columns
  • .before = qobs places our new date column immediately before qobs, on the left-hand side of our dataframe.
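To see what those two arguments change, this sketch runs mutate() with and without them on a small tibble built by hand from the question's sample rows. The object names with_defaults and tidy are only illustrative.

```r
library(dplyr)
library(lubridate)

# The sample rows from the question, typed in by hand.
df <- tibble(
  YYYY = c(1981L, 1981L, 1981L),
  MM   = c(1L, 1L, 1L),
  DD   = c(1L, 2L, 3L),
  qobs = c(7.136, 6.76, 10.886),
  ckhs = 0, qceq = 0, qcol = 0
)

# Defaults: date is appended at the far right and YYYY, MM, DD are kept.
with_defaults <- mutate(df, date = make_date(YYYY, MM, DD))
names(with_defaults)

# With .keep and .before: the inputs are dropped and date comes first.
tidy <- mutate(
  df,
  date = make_date(YYYY, MM, DD),
  .keep = "unused",
  .before = qobs
)
names(tidy)
```

Each row keeps its own measurements, so the link between a date and its data is preserved.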

Edit: I was previously implementing the convoluted:

paste(YYYY, MM, DD, sep = ",") |>
lubridate::ymd()

Thanks to Adriano for showing me that make_date() exists!
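For readers who would rather not install any packages, the same paste-then-parse idea can be sketched in base R with as.Date(), the function mentioned in the question. This is an alternative to the answer above, not part of it, and the hand-built data frame stands in for the result of read.csv().

```r
# Sample rows from the question; in practice this would come from read.csv().
df <- data.frame(
  YYYY = c(1981, 1981, 1981),
  MM   = c(1, 1, 1),
  DD   = c(1, 2, 3),
  qobs = c(7.136, 6.76, 10.886),
  ckhs = 0, qceq = 0, qcol = 0
)

# paste() is vectorised, so it builds one "YYYY-M-D" string per row;
# as.Date() then parses the whole character vector in one call.
df$date <- as.Date(paste(df$YYYY, df$MM, df$DD, sep = "-"), format = "%Y-%m-%d")

# Keep the date plus the measurement columns, dropping YYYY, MM and DD.
df <- df[, c("date", "qobs", "ckhs", "qceq", "qcol")]

str(df)
```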
