I have a csv datasheet with 7 columns filled with numeric values.
3 of these columns represent the date of the measurements: "YYYY", "MM", "DD", followed by 4 columns of relevant corresponding data: "qobs", "ckhs", "qceq", "qcol".
How do I convert the three first columns filled with numeric values into a date-datatype, while maintaining the dependency of the dates to the corresponding date?
# YYYY, MM, DD, qobs, ckhs, qceq, qcol
# 1 1981, 1, 1, 7.136, 0, 0, 0
# 2 1981, 1, 2, 6.76, 0, 0, 0
# 3 1981, 1, 3, 10.886, 0, 0, 0
# ...
I looked online and only found solutions using the as.Date function that correspond to a single character string. I’m fairly new to programming and have only used R for a couple of days, so an elementary explanation would be greatly appreciated.
>Solution :
A tiydverse solution:
library(vroom)
library(dplyr)
library(lubridate) # a truly wonderful package for this kind of thing
df <- vroom("path-to-your-file.csv"
col_types = "iiidddd")
df <-
mutate(
df,
date = make_date(YYYY, MM, DD)
.keep = "unused", # drop the columns used for computation
.before = qobs
)
Explanation
vroom::vroom() is a really useful (and really fast!) function for reading plaintext data into R. It guesses the delimiter from the data and is generally pretty easy to implement.
dplyr::mutate() is a staple of tidyverse data manipulation. It computes new columns within dataframes, or modifies existing columns by overwriting them with new values. Here, we are computing a new column called date using lubridate::make_date(), which does what it says on the tin.
We also specify some of mutate()‘s named arguments:
.keep = "unused"lets us automatically drop all of the columns we used to calculate our new variable, because we no longer need theYYYY,MMorDDcolumns.before = qobsjust makes our newdatecolumn appear in front ofqobs, on the left-hand-side of our dataframe.
Edit: I was previously implementing the convoluted:
paste(YYYY, MM, DD, sep = ",") |>
lubridate::ymd()
Thanks to Adriano for showing me that make_date() exists!