I have a dataframe that kinda looks like this:
| studentID | Enrollment_date (class=Date) |
|---|---|
| 56783536 | 2019-02-01 |
| 56783536 | 2019-02-12 |
| 56783406 | NA |
| 56783406 | NA |
| 56783522 | 2019-03-02 |
| 56783522 | NA |
| 56783599 | NA |
| 56783599 | NA |
I’m trying to filter the rows in order to 1) keep the row with the maximum date, 2) keep both duplicated rows when both enrollment dates are NA and 3) when one date is NA and the other isn’t, keep the one row without NA. So it should output this:
| studentID | Enrollment_date (class=Date) |
|---|---|
| 56783536 | 2019-02-12 |
| 56783406 | NA |
| 56783406 | NA |
| 56783522 | 2019-03-02 |
| 56783599 | NA |
| 56783599 | NA |
I tried this:
dup2_ENROLL_2020 <- dup_ENROLL_2020 %>%
group_by(studentID) %>%
filter(Enrollment_date==max(Enrollment_date))
And this:
dup2_ENROLL_2020 <- dup_ENROLL_2020 %>%
group_by(studentID) %>%
slice(which.max(Enrollment_date))
With a bunch of variations of !is.na() and na.omit() in them. None of them worked, somebody help me (sorry for the rookie problem)
>Solution :
Try this:
dat %>%
group_by(studentID) %>%
filter(if (all(is.na(Enrollment_date))) TRUE else Enrollment_date == max(Enrollment_date, na.rm = TRUE)) %>%
ungroup()
# # A tibble: 6 x 2
# studentID Enrollment_date
# <int> <date>
# 1 56783536 2019-02-12
# 2 56783406 NA
# 3 56783406 NA
# 4 56783522 2019-03-02
# 5 56783599 NA
# 6 56783599 NA