Deleting rows that are duplicated in one column based on value in another column

A similar question was asked here. However, I did not manage to adapt that solution to my particular problem, hence the separate question.

An example dataset:


  id group
1  1   5
2  1 998
3  2   2
4  2   3
5  3 998

I would like to delete all rows that are duplicated in id and where group has value 998.
In this example, only row 2 should be deleted.


I tried something along those lines:

df1 <- df %>%
  subset((unique(by = "id") |  group != 998))

but got

Error in is.factor(x) : Argument "x" is missing, with no default

Thank you in advance.

Solution:

Here is an idea using dplyr: group the data by id, then drop any row whose id appears more than once and whose group value is 998.

library(dplyr)

df %>% 
  group_by(id) %>% 
  # keep a row unless its id is duplicated AND its group is 998
  filter(!(n() > 1 & group == 998))

# A tibble: 4 x 2
# Groups:   id [3]
     id group
  <int> <int>
1     1     5
2     2     2
3     2     3
4     3   998
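For completeness, the same filter can be sketched in base R without dplyr (assuming the data frame is named df with columns id and group, as in the example above):

```r
# Example data, matching the question
df <- data.frame(id = c(1, 1, 2, 2, 3), group = c(5, 998, 2, 3, 998))

# ids that occur more than once in the data
dup <- df$id %in% df$id[duplicated(df$id)]

# drop rows that are duplicated in id and have group == 998
df[!(dup & df$group == 998), ]
```

This keeps row 5 (id 3, group 998) because its id is not duplicated, and removes only row 2, as required.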