Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Identify partial duplicate rows where one row is NA and its duplicate is non-NA

This is my first time posting on here, so please bear with me! In R, i am trying to identify rows that are partially duplicate, based on the following conditions:

  • for one row, col1 must be non-NA. it’s duplicate must be NA
  • col2 can be different, but all other cols must be the same

Example df:

df = read.table(text = "
col1    col2    col3    col4    col5
a   928     0   TRUE    1
b   1028    0   FALSE   1
c   394     1   TRUE    1
NA  239     4   TRUE    0
NA  102     0   TRUE    1", h = T)

you can see that row 1 and 5 fit the conditions i am looking for: non-NA and NA in col1, col2 is not identical but all other cols are identical. So I would call these rows partial duplicates.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I want the resulting output df to have the duplicate rows (e.g. rows 1 and 5) only. desired output would look like:

df = read.table(text = "
col1    col2    col3    col4    col5
a   928     0   TRUE    1
NA  102     0   TRUE    1", h = T)

note this is just an example df and the actual df is much larger.

thanks!

>Solution :

Basically, what I’m doing below is checking if any col1 value is NA, and any is not NA, for each group of identical rows in the third to the last column. Hope this makes sense!

all_other_cols <- colnames(df) |> tail(-2)

dplyr::filter(df, any(is.na(col1)) && any(!is.na(col1)), .by = all_other_cols)

Output:

  col1 col2 col3 col4 col5
1    a  928    0 TRUE    1
2 <NA>  102    0 TRUE    1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading