Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get the latest time within the same date by group in R

I have a dataset

structure(list(country = c("US", "US", "US", "US", "US", "US",
                                        "UK", "UK"), ID = c(11, 11, 11, 11, 11, 11,
                                                                    12, 12), entry = c("2017-04-25 09:01:25", "2017-04-21 09:00:24", "2017-02-19 08:09:57", "2017-01-24 16:55:56", "2016-06-27 14:00:46",
                                                                                                    "2016-06-27 13:55:28", "2016-08-30 10:35:29", "2016-06-28 12:35:39"
                                                                    ), out = c("2017-04-21 09:41:00", "2017-04-21 09:41:00",
                                                                                           "2017-04-21 09:41:00", "2017-04-21 09:41:00", "2016-07-29 11:36:22",
                                                                                           "2016-07-29 11:36:12", NA, NA
                                                                    )), class = "data.frame", row.names = c(NA, -8L))

And the output is something like this

     country ID               entry                 out
1         US 11 2017-04-25 09:01:25 2017-04-21 09:41:00
2         US 11 2017-04-21 09:00:24 2017-04-21 09:41:00
3         US 11 2017-02-19 08:09:57 2017-04-21 09:41:00
4         US 11 2017-01-24 16:55:56 2017-04-21 09:41:00
5         US 11 2016-06-27 14:00:46 2016-07-29 11:36:22
6         US 11 2016-06-27 13:55:28 2016-07-29 11:36:12
7         UK 12 2016-08-30 10:35:29                <NA>
8         UK 12 2016-06-28 12:35:39                <NA>

What I would like to do is to remove duplicate dates for the same group of IDs and if there’s a same date but with a different h:m:s value, then take the latest date of it, so it would look something like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  country    ID entry               out                 unique_out     
   <chr>   <dbl> <chr>               <chr>               <chr>              
 1 US         11 2017-04-25 09:01:25 2017-04-21 09:41:00 2017-04-21 09:41:00
 2 US         11 2017-04-21 09:00:24 2017-04-21 09:41:00 NA                 
 3 US         11 2017-02-19 08:09:57 2017-04-21 09:41:00 NA                 
 4 US         11 2017-01-24 16:55:56 2017-04-21 09:41:00 NA                 
 5 US         11 2016-06-27 14:00:46 2016-07-29 11:36:22 2016-07-29 11:36:22
 6 US         11 2016-06-27 13:55:28 2016-07-29 11:36:12 NA
 7 UK         12 2016-08-30 10:35:29 NA                  NA                 
 8 UK         12 2016-06-28 12:35:39 NA                  NA 

I have used this data %>% group_by(ID) %>% mutate(unique_out=replace(out, duplicated(out), NA))
to filter out the duplicates but cannot figure out how to keep the latest time of the same day within a group of ID.
Many thanks if you could help me with this!

>Solution :

Try this:

dat %>%
  mutate(
    across(c(entry, out), as.POSIXct),
    out_date = as.Date(out)
  ) %>%
  group_by(ID, out_date) %>%
  mutate(unique_out = if_else(row_number() %in% which.max(out), out, out[NA])) %>%
  ungroup()
# # A tibble: 8 x 6
#   country    ID entry               out                 out_date   unique_out         
#   <chr>   <dbl> <dttm>              <dttm>              <date>     <dttm>             
# 1 US         11 2017-04-25 09:01:25 2017-04-21 09:41:00 2017-04-21 2017-04-21 09:41:00
# 2 US         11 2017-04-21 09:00:24 2017-04-21 09:41:00 2017-04-21 NA                 
# 3 US         11 2017-02-19 08:09:57 2017-04-21 09:41:00 2017-04-21 NA                 
# 4 US         11 2017-01-24 16:55:56 2017-04-21 09:41:00 2017-04-21 NA                 
# 5 US         11 2016-06-27 14:00:46 2016-07-29 11:36:22 2016-07-29 2016-07-29 11:36:22
# 6 US         11 2016-06-27 13:55:28 2016-07-29 11:36:12 2016-07-29 NA                 
# 7 UK         12 2016-08-30 10:35:29 NA                  NA         NA                 
# 8 UK         12 2016-06-28 12:35:39 NA                  NA         NA                 

Walk-through:

  • in order to use which.max later, we need number-like timestamps, so we first convert to POSIXt timestamps; this step can be omitted if you need to keep your timestamps as strings, though you’ll need a few more steps (starting with out == max(out) instead of which.max, but then you need to de-duplicate it);
  • in order to get the latest time per day, it will help to group on the date (as well as ID), so we generate a date-only variant of out;
  • group by both date and ID, find which is the first of the max(out) by way of which.max
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading