Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to remove duplicate rows in R based on condition?

I have the following data:

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

 id x
001 0
001 0
001 0
002 0
002 1
003 0
003 1

The nature of the data is such that it is possible for some id to only have x = 0 rows. In the case where x = 1 for a given id, it only occurs once, and that too in the last row for that id. I want to remove duplicate rows for each id, but in case x = 1 for an id, I want to keep only that row.

The desired output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 id x
001 0
002 1
003 1

A tidyverse solution is preferable. Thanks!

>Solution :

Probably slice_max

df %>%
    slice_max(x, by = id) %>%
    distinct()

or (as comments from @r2evans)

df %>%
    slice_max(x, by = id, with_ties = FALSE)

which gives

   id x
1 001 0
2 002 1
3 003 1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading