Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

trying to group_by and then summarize max and min – running into error for unambiguous format

I have addresses that have duplicated information from Kingwood and Humble addresses. I am trying to combine these entries, preserving the the minimum first reported date, and the max last reported dates, using this code:

df <- df %>% group_by(id, street) %>% 
  summarise(firstReportedDate = min(as.Date(firstReportedDate))) %>% 
  summarise(lastReportedDate = max(as.Date(lastReportedDate)))

However, for some reason, id == 1000 is giving me the error:

Error: Problem with `summarise()` column `firstReportedDate`.
i `firstReportedDate = min(as.Date(firstReportedDate))`.
x character string is not in a standard unambiguous format
i The error occurred in group 3: id = "1000", street = "Po Box 203"

Can anyone help me understand this error? Sample of data below:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

dput(df)
structure(list(street = c("2200 Lake Village Dr", "1040 Marina Dr", 
"2200 Lake Village Dr", "1040 Marina Dr", "22302 Rustic Bridge Ln", 
"22302 Rustic Bridge Ln", "1060 Marina Dr", "3211 Laurel Point Ct", 
"Po Box 203", "19703 Highway 59 N", "6714 Dorylee Ln", "3511 Forest Row Dr", 
"3511 Forest Row Dr", "Acorn Ln"), city = c("Kingwood", "Humble", 
"Kingwood", "Kingwood", "Kingwood", "Humble", "Humble", "Kingwood", 
"Humble", "Humble", "Humble", "Kingwood", "Humble", "Humble"), 
    state = c("TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", 
    "TX", "TX", "TX", "TX", "TX", "TX"), zip = c("77339", "77339", 
    "77339", "77339", "77339", "77339", "77339", "77339", "77347", 
    "77338", "77396", "77345", "77345", "77345"), firstReportedDate = c("5/25/2019", 
    "1/1/2015", "9/30/2017", "11/30/2015", "10/18/2017", "6/15/2017", 
    "9/30/2009", "10/12/2002", "9/22/2017", "1/1/2009", "3/5/2004", 
    "4/8/2012", "9/30/2009", "1/1/2009"), lastReportedDate = c("4/1/2022", 
    "1/1/2021", "9/30/2017", "11/30/2015", "4/1/2022", "6/15/2018", 
    "9/30/2009", "3/3/2004", "4/1/2022", "1/1/2011", "3/5/2004", 
    "4/1/2022", "9/30/2009", "1/1/2013"), id = c("357", "357", 
    "357", "357", "359", "359", "359", "359", "1000", "1000", 
    "1000", "1431", "1431", "1431")), row.names = c(NA, -14L), class = c("tbl_df", 
"tbl", "data.frame"))

>Solution :

Embed everything in the same summarise call. Also, you should specify the format of your date in the format argument of as.Date when your data is not in the international date format.

dat %>% 
  mutate(across(ends_with("Date"), as.Date, format = "%m/%d/%Y")) %>% 
  group_by(id, street) %>% 
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))

output

# A tibble: 10 Ă— 4
# Groups:   id [4]
   id    street                 firstReportedDate lastReportedDate
   <chr> <chr>                  <date>            <date>          
 1 1000  19703 Highway 59 N     2009-01-01        2011-01-01      
 2 1000  6714 Dorylee Ln        2004-03-05        2004-03-05      
 3 1000  Po Box 203             2017-09-22        2022-04-01      
 4 1431  3511 Forest Row Dr     2009-09-30        2022-04-01      
 5 1431  Acorn Ln               2009-01-01        2013-01-01      
 6 357   1040 Marina Dr         2015-01-01        2021-01-01      
 7 357   2200 Lake Village Dr   2017-09-30        2022-04-01      
 8 359   1060 Marina Dr         2009-09-30        2009-09-30      
 9 359   22302 Rustic Bridge Ln 2017-06-15        2022-04-01      
10 359   3211 Laurel Point Ct   2002-10-12        2004-03-03      
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading