The sample dataset is given below:
library(dplyr)
v = data.frame(group = c(1, 1, 2, 3, 3), date = as.Date(c('01-01-2000', '01-01-2001', '01-05-2000', '02-07-2000', '01-01-2008'), "%d-%m-%Y"))
v %>% group_by(group) %>% mutate(difference_day = ifelse(n() == 2,
c(0, diff(date)),
difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days')))
My desired result is:
group | difference_day |
---|---|
1 | 0 |
1 | 365 |
2 | 7915 |
3 | 0 |
3 | 2740 |
In the above code, if a group contains only one row, then difference_day should be `difftime(date, as.Date('31-12-2021', "%d-%m-%Y"), units = 'days')`.
However, the output of the code was:
# A tibble: 5 × 3
# Groups: group [3]
group date difference_day
<dbl> <date> <dbl>
1 1 2000-01-01 0
2 1 2001-01-01 0
3 2 2000-05-01 -7914
4 3 2000-07-02 0
5 3 2008-01-01 0
which was very strange.
Please give me some suggestions, thank you!
>Solution :
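Since you want to return either the first vector or the second vector as a whole, use `if` instead of `ifelse`. (That is, your conditional is external to the vectors, not an element-by-element conditional, where `ifelse` or `if_else` would be more appropriate.) Because `n() == 2` has length 1, `ifelse` returns only the first element of the chosen vector, and `mutate` then recycles that single value across the group, which is why every row in the two-row groups shows 0.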
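As a minimal base R sketch of that difference (the `dates` vector and the `NA` fallback are made up for illustration and are not part of the question's data frame), compare what each form returns when the test is a single logical value:

```r
# Two dates standing in for one two-row group (illustrative values only).
dates <- as.Date(c("2000-01-01", "2001-01-01"))

# ifelse() shapes its result like the test; the test has length 1,
# so only the first element of c(0, diff(dates)) comes back.
from_ifelse <- ifelse(length(dates) == 2, c(0, diff(dates)), NA)

# if/else evaluates the chosen branch and returns it whole.
from_if <- if (length(dates) == 2) c(0, diff(dates)) else NA

print(from_ifelse)  # a single value
print(from_if)      # one value per date
```

Inside `mutate`, the length-1 result from `ifelse` is what gets recycled to every row of the group.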
v %>%
group_by(group) %>%
mutate(d = if (n() == 2L) diff(c(date[1], date)) else difftime(as.Date("2021-12-31"), date, units = "days")) %>%
ungroup()
# # A tibble: 5 × 3
# group date d
# <dbl> <date> <drtn>
# 1 1 2000-01-01 0 days
# 2 1 2001-01-01 366 days
# 3 2 2000-05-01 7914 days
# 4 3 2000-07-02 0 days
# 5 3 2008-01-01 2739 days
There are some differences of +/- 1 from your expected output (for example, 2000 is a leap year, so 2000-01-01 to 2001-01-01 spans 366 days, not 365); I'm not sure if that was a typo or some other intent outside of a traditional diff.
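If it helps to check the day counts independently of dplyr, here is a small base R sketch that subtracts the dates from the sample data directly (the variable names are only for illustration):

```r
# Subtracting two Date objects gives a difftime measured in days.
group1_gap <- as.Date("2001-01-01") - as.Date("2000-01-01")
group2_gap <- as.Date("2021-12-31") - as.Date("2000-05-01")
group3_gap <- as.Date("2008-01-01") - as.Date("2000-07-02")

# Strip the difftime class to see plain day counts.
print(as.numeric(group1_gap))
print(as.numeric(group2_gap))
print(as.numeric(group3_gap))
```

These should agree with the `d` column above rather than with the expected table in the question.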
The return values from both `diff` and `difftime` here are of class "difftime", which prints naturally with "days" appended; they are still numeric enough that arithmetic works on them. If you prefer plain numbers, just wrap the expression with `as.integer(.)` or `as.numeric(.)`.
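A short sketch of that conversion in base R, using one date from the sample data (the variable names are hypothetical):

```r
# A difftime value, as produced in the else branch of the answer.
d <- difftime(as.Date("2021-12-31"), as.Date("2000-05-01"), units = "days")

d_class <- class(d)      # "difftime"
d_plus  <- d + 1         # arithmetic keeps the difftime class
d_num   <- as.numeric(d) # plain double, no units attached
d_int   <- as.integer(d) # plain integer

print(d)
print(d_num)
```

In the pipeline above, the same wrapping would go around the whole `if ... else ...` expression inside `mutate`, so that both branches yield the same plain type.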