Say I have this data.table:
library(data.table)
library(lubridate)
df <- data.table(a = c('2022-01-20', '2022-01-21')); df
a
1: 2022-01-20
2: 2022-01-21
lubridate's fast_strptime() parses this character column correctly:
fast_strptime(df$a, "%Y-%m-%d")
[1] "2022-01-20 UTC" "2022-01-21 UTC"
But assigning the result back into df with := fails:
df[, a := fast_strptime(a, "%Y-%m-%d") ]
Error in `[.data.table`(df, , `:=`(a, fast_strptime(a, "%Y-%m-%d"))) :
Supplied 9 items to be assigned to 2 items of column 'a'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Looking forward to any ideas. Thank you.
>Solution :
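fast_strptime() returns a POSIXlt object by default, just like base::strptime(). Internally, a POSIXlt is a list of component vectors (sec, min, hour, ...), so := sees the list's 9 items instead of 2 values: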
str(fast_strptime(df$a, "%Y-%m-%d"))
POSIXlt[1:2], format: "2022-01-20" "2022-01-21"
unclass(fast_strptime(df$a, "%Y-%m-%d"))
$sec
[1] 0 0
$min
[1] 0 0
$hour
[1] 0 0
$mday
[1] 20 21
$mon
[1] 0 0
$year
[1] 122 122
$wday
[1] NA NA
$yday
[1] NA NA
$isdst
[1] -1
attr(,"tzone")
[1] "UTC"
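The difference between the two classes can be checked without lubridate. This is a minimal sketch that assumes base::strptime() as a stand-in for fast_strptime(), since both return POSIXlt by default; the variable names x and y are only for illustration.

```r
# Base R only: strptime() returns POSIXlt, as fast_strptime() does by default
x <- strptime(c("2022-01-20", "2022-01-21"), "%Y-%m-%d", tz = "UTC")

inherits(x, "POSIXlt")   # TRUE
is.list(unclass(x))      # TRUE: one list element per date-time component
names(unclass(x))        # component names; the exact set depends on the R version
length(x)                # 2: the number of date-times represented

# POSIXct stores a single number per date-time, so it fits in a column
y <- as.POSIXct(x)
is.list(unclass(y))      # FALSE
length(unclass(y))       # 2
```

The list of components is what := tries to assign element by element, which is why the item count in the error does not match the number of rows.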
Converting to POSIXct, which stores one number per date-time, makes the assignment work:
df[, a := as.POSIXct(fast_strptime(a, "%Y-%m-%d")) ]
-output
> df
a
<POSc>
1: 2022-01-20
2: 2022-01-21
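fast_strptime() also has an lt argument, so the as.POSIXct() wrapper may not be needed. A minimal sketch, assuming the same two-row table as in the question:

```r
library(data.table)
library(lubridate)

df <- data.table(a = c("2022-01-20", "2022-01-21"))

# lt = FALSE asks fast_strptime() for POSIXct instead of the default POSIXlt
df[, a := fast_strptime(a, "%Y-%m-%d", lt = FALSE)]

# Confirm the column now holds POSIXct values
inherits(df$a, "POSIXct")
```

This keeps the explicit format string while skipping the intermediate POSIXlt object.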
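Instead of converting to POSIXlt and then to POSIXct, we can parse directly to POSIXct with the faster parse_date_time2() (see ?parse_date_time2). The lubridate documentation describes the two functions as follows:
parse_date_time2() is a fast C parser of numeric orders.
fast_strptime() is a fast C parser of numeric formats only that accepts explicit format arguments, just like base::strptime().
The second argument of parse_date_time2() is an orders string; the conventional spelling for this input is "Ymd".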
df[, a := parse_date_time2(a, "%Y-%m-%d") ]
-output
> df
a
<POSc>
1: 2022-01-20
2: 2022-01-21
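If only the calendar date is needed and not a date-time, a date class may be a better fit than POSIXct. The following is a sketch under that assumption, using data.table's as.IDate(), which accepts ISO-formatted strings like the ones in the question:

```r
library(data.table)

df <- data.table(a = c("2022-01-20", "2022-01-21"))

# as.IDate() returns an integer-backed date class that also inherits from Date
df[, a := as.IDate(a)]

class(df$a)
```

Base as.Date(a) would work the same way here; IDate is simply the integer-stored variant that data.table provides.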