How to remove duplicate rows in R based on condition?

August 4, 2023

I have the following data:

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

 id x
001 0
001 0
001 0
002 0
002 1
003 0
003 1

The nature of the data is such that some ids may have only x = 0 rows. When x = 1 does occur for a given id, it occurs only once, and always in the last row for that id. I want to remove duplicate rows for each id, but if an id has an x = 1 row, I want to keep only that row.

The desired output:

 id x
001 0
002 1
003 1

A tidyverse solution is preferable. Thanks!

Solution:

You can probably use slice_max:

library(dplyr)  # the `by` argument of slice_max() needs dplyr >= 1.1.0

df %>%
    slice_max(x, by = id) %>%
    distinct()
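
To see why the distinct() step matters here, the sketch below runs the pipeline in two stages on the data from the question. It assumes dplyr 1.1.0 or later; the names ties_kept and deduped are only illustrative. By default slice_max() keeps ties, so an id with only x = 0 rows keeps all of them until distinct() collapses the identical rows.

```r
library(dplyr)

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# slice_max() keeps ties by default (with_ties = TRUE), so all three
# x = 0 rows of id "001" survive this step
ties_kept <- df %>%
    slice_max(x, by = id)
nrow(ties_kept)  # 5

# distinct() then drops the fully identical rows, leaving one row per id
deduped <- ties_kept %>%
    distinct()
nrow(deduped)    # 3
```

Note that distinct() only removes rows that are identical in every column, so this variant relies on the data having no other columns that differ within an id.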

or (as suggested in the comments by @r2evans)

df %>%
    slice_max(x, by = id, with_ties = FALSE)

which gives

   id x
1 001 0
2 002 1
3 003 1
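
The by argument of slice_max() is only available from dplyr 1.1.0 onward. For an older installation, a minimal sketch of the same idea using group_by() might look like the following; this is an assumed equivalent rather than part of the original answer, and the name result is only illustrative.

```r
library(dplyr)

df <- data.frame(id = c("001", "001", "001", "002", "002", "003", "003"),
                 x = c(0, 0, 0, 0, 1, 0, 1))

# Assumption: dplyr < 1.1.0, where grouping has to be done with group_by()
result <- df %>%
    group_by(id) %>%
    # keep the single row with the largest x per id; ties are broken by
    # taking the first row, so ids with only x = 0 keep exactly one row
    slice_max(x, n = 1, with_ties = FALSE) %>%
    ungroup()

result
```

This returns a tibble rather than a plain data frame, with one row per id: x = 1 where such a row exists and x = 0 otherwise.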

data-wrangling

by MR

Published August 04, 2023
