filtering duplicated rows with maximum values in a column with NAs

January 21, 2022

I have a dataframe that kinda looks like this:

studentID	Enrollment_date (class=Date)
56783536	2019-02-01
56783536	2019-02-12
56783406	NA
56783406	NA
56783522	2019-03-02
56783522	NA
56783599	NA
56783599	NA

I’m trying to filter the rows in order to 1) keep the row with the maximum date, 2) keep both duplicated rows when both enrollment dates are NA and 3) when one date is NA and the other isn’t, keep the one row without NA. So it should output this:

studentID	Enrollment_date (class=Date)
56783536	2019-02-12
56783406	NA
56783406	NA
56783522	2019-03-02
56783599	NA
56783599	NA

I tried this:

dup2_ENROLL_2020 <- dup_ENROLL_2020 %>%
  group_by(studentID) %>%
  filter(Enrollment_date==max(Enrollment_date))

And this:

dup2_ENROLL_2020 <- dup_ENROLL_2020 %>%
  group_by(studentID) %>%
  slice(which.max(Enrollment_date))

With a bunch of variations of !is.na() and na.omit() in them. None of them worked, somebody help me (sorry for the rookie problem)

>Solution :

Try this:

dat %>%
  group_by(studentID) %>%
  filter(if (all(is.na(Enrollment_date))) TRUE else Enrollment_date == max(Enrollment_date, na.rm = TRUE)) %>%
  ungroup()
# # A tibble: 6 x 2
#   studentID Enrollment_date
#       <int> <date>         
# 1  56783536 2019-02-12     
# 2  56783406 NA             
# 3  56783406 NA             
# 4  56783522 2019-03-02     
# 5  56783599 NA             
# 6  56783599 NA