I am trying to flag the number of days closest to 0 between an accident and when someone got insurance. 0 represents the same day, a negative number represents insurance prior to the accident, and a positive number represents insurance after the accident.
Data
df <- data.frame(id=1:4, accident1=c(-1,3,4, NA), accident2=c(100, -2, NA, NA), accident3=c(-3,1,2, NA))
> df
id accident1 accident2 accident3
1 1 -1 100 -3
2 2 3 -2 1
3 3 4 NA 2
4 4 NA NA NA
Code:
library(DescTools)
library(dplyr)
df %>%
  rowwise() %>%
  mutate(magic =
    case_when(
      accident1 < 0 | accident2 < 0 | accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm = T)),
      accident1 > 0 | accident2 > 0 | accident3 > 0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm = T)),
      accident1 > 0 & accident2 > 0 & accident3 > 0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm = T)),
      accident1 < 0 & accident2 < 0 & accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm = T)),
      TRUE ~ NA_real_)) %>%
  rowwise() %>%
  # not working
  mutate(magic = case_when(
    (accident1 > 0 | accident2 < 0 | accident3 < 0) & (accident1 > 0 | accident2 > 0 | accident3 > 0) ~
      Closest(as.numeric(unlist(c(accident1, accident2, accident3))), 0, na.rm = T),
    TRUE ~ magic))
Result without last row in data (all NAs):
# A tibble: 3 × 5
# Rowwise:
id accident1 accident2 accident3 magic
<int> <dbl> <dbl> <dbl> <dbl>
1 1 -1 100 -3 -1
2 2 3 -2 1 1
3 3 4 NA 2 2
However, when I try this with NA in last row:
Error: Problem with `mutate()` column `magic`.
ℹ `magic = case_when(...)`.
ℹ `magic` must be size 1, not 0.
ℹ Did you mean: `magic = list(case_when(...))` ?
ℹ The error occurred in row 4.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Problem with `mutate()` column `magic`.
ℹ `magic = case_when(...)`.
ℹ no non-missing arguments to min; returning Inf
ℹ The warning occurred in row 4.
Any advice on how to get the code to work on row 4 with NAs?
>Solution :
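The error comes from the last row. All of its elements are NA and we call Closest with na.rm = TRUE, so Closest returns numeric(0), which has length 0, while mutate() under rowwise() needs a value of size 1 for each row. We can prevent that by indexing with [1] to return the first element, which turns numeric(0) into NA.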
library(DescTools)  # provides Closest()
library(dplyr)
df %>%
  mutate(magic =
    case_when(
      accident1 < 0 | accident2 < 0 | accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm = TRUE)),
      accident1 > 0 | accident2 > 0 | accident3 > 0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm = TRUE)),
      accident1 > 0 & accident2 > 0 & accident3 > 0 ~ as.numeric(pmin(accident1, accident2, accident3, na.rm = TRUE)),
      accident1 < 0 & accident2 < 0 & accident3 < 0 ~ as.numeric(pmax(accident1, accident2, accident3, na.rm = TRUE)),
      TRUE ~ NA_real_)) %>%
  rowwise() %>%
  mutate(magic = case_when(
    (accident1 > 0 | accident2 < 0 | accident3 < 0) & (accident1 > 0 | accident2 > 0 | accident3 > 0) ~
      Closest(as.numeric(unlist(c(accident1, accident2, accident3))), 0, na.rm = TRUE)[1],
    TRUE ~ magic))
-output
# A tibble: 4 × 5
# Rowwise:
id accident1 accident2 accident3 magic
<int> <dbl> <dbl> <dbl> <dbl>
1 1 -1 100 -3 -1
2 2 3 -2 1 1
3 3 4 NA 2 2
4 4 NA NA NA NA
It is easier to understand if we apply Closest on its own to each row:
> apply(df[-1], 1, function(x) Closest(x, 0, na.rm = TRUE))
[[1]]
accident1
-1
[[2]]
accident3
1
[[3]]
accident3
2
[[4]]
named numeric(0) ####
Indexing with [1], as the solution does, turns numeric(0) into NA:
> numeric(0)[1]
[1] NA
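If the only goal is the value closest to 0 in each row, the same idea can be sketched in base R without DescTools. This is an illustrative alternative and not the approach used in the answer above: closest_to_zero is a hypothetical helper name, and it assumes that returning the first value is acceptable when two values are equally close (Closest, as documented, reports all tied values).

```r
# Hypothetical helper (not part of DescTools): return the value of x that is
# closest to `target`, or NA when nothing is left after dropping NAs.
closest_to_zero <- function(x, target = 0) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)  # all-NA row: always return length 1
  x[which.min(abs(x - target))]         # which.min() picks the first on ties
}

df <- data.frame(id = 1:4,
                 accident1 = c(-1, 3, 4, NA),
                 accident2 = c(100, -2, NA, NA),
                 accident3 = c(-3, 1, 2, NA))

# apply() passes each row as a numeric vector; because every call returns
# exactly one value, the result simplifies to a numeric vector.
df$magic <- unname(apply(df[-1], 1, closest_to_zero))
df$magic
```

Because the helper handles the empty case explicitly, there is no zero-length result to index and no warning from min() on the all-NA row.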