Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Creating new dataset with only the most recent date in R

I am trying to create a new table where table1 is grouped by ID and only the most recent date is picked from table1$new.

I have the tables below:

1st Table:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ID Status Date
001 1003 2021-02-01
001 1003 2021-02-01
001 1003 2021-02-21
001 1003 2021-04-21
002 1003 2021-05-01
002 1003 2021-05-01
002 1003 2021-05-21
002 1003 2021-07-21
003 1004 2021-05-01
003 1004 2021-05-01
003 1004 2021-05-21
003 1004 2021-07-21

Current Output

ID Status Date
001 1003 2021-02-01
001 1003 2021-02-01
001 1003 2021-02-21
001 1003 2021-04-21
002 1003 2021-05-01
002 1003 2021-05-01
002 1003 2021-05-21
002 1003 2021-07-21

Desired Output:

ID New
001 2021-4-21
002 2021-07-21

Here is my code as of now

desired <- table1 %>% 
group_by(ID) %>% 
subset(Status == '1003', select = c("ID", "Date")) %>% rename(New = Date)

I think my first mistake is in the group_by and I am not sure how to only keep the most recent date. Any help would be appreciated.

>Solution :

I would use dplyr::slice_max() to select the most recent date.

library(tidyverse)
d <- tibble::tribble(
  ~ID, ~Status,        ~Date,
  "001",   1003L, "2021-02-01",
  "001",   1003L, "2021-02-01",
  "001",   1003L, "2021-02-21",
  "001",   1003L, "2021-04-21",
  "002",   1003L, "2021-05-01",
  "002",   1003L, "2021-05-01",
  "002",   1003L, "2021-05-21",
  "002",   1003L, "2021-07-21",
  "003",   1004L, "2021-05-01",
  "003",   1004L, "2021-05-01",
  "003",   1004L, "2021-05-21",
  "003",   1004L, "2021-07-21"
)

res <- d |> 
  mutate(Date = lubridate::ymd(Date)) |> 
  dplyr::filter(Status == 1003) |> 
  group_by(ID) |> 
  slice_max(Date, n = 1) |> 
  ungroup() |> 
  select(ID, New = Date)
res
#> # A tibble: 2 x 2
#>   ID    New       
#>   <chr> <date>    
#> 1 001   2021-04-21
#> 2 002   2021-07-21

Created on 2022-06-15 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading