Delete rows with duplicate values and include NAs as individual values

January 3, 2023

I have a data frame like this:

testdf <- structure(list(POS = c(37, 44, 50, 83), Col1 = c("A", "C", NA, 
"G"), Col2 = c("A", NA, "T", "C")), class = "data.frame", row.names = c(NA, 
-4L))

which looks like this:

     POS  Col1 Col2
[1,] "37" "A"  "A" 
[2,] "44" "C"  NA  
[3,] "50" NA   "T" 
[4,] "83" "G"  "C"

I would like to exclude all rows where Col1 and Col2 hold the same value (here, only row 1). Unfortunately, I do not know how to deal with the NAs. When I try

testdf[testdf$Col1 != testdf$Col2,]

it does not treat NA as a value of its own.

The expected output should be:

     POS  Col1 Col2
[1,] "44" "C"  NA  
[2,] "50" NA   "T" 
[3,] "83" "G"  "C"

I would rather not transform NAs into something else.

testdf %>%
  rowwise %>%
  filter(Col1 != Col2)

This also does not work correctly.

Solution:

You can add is.na() to your filter condition.

You should also handle the case where both columns are NA; I added such a row (POS 99) to your example data. If you want to keep these rows, then:

library(dplyr)

testdf %>%
  filter(is.na(Col1) | is.na(Col2) | Col1 != Col2)

  POS Col1 Col2
1  44    C <NA>
2  50 <NA>    T
3  83    G    C
4  99 <NA> <NA>
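
The original attempt fails because any comparison involving NA evaluates to NA rather than TRUE or FALSE. A minimal base R sketch of that behaviour, using throwaway vectors that mirror Col1 and Col2 (the names `a`, `b`, `cmp` and `keep` are illustrative, not from the question):

```r
# Illustrative vectors mirroring Col1 and Col2 from the question
a <- c("A", "C", NA, "G")
b <- c("A", NA, "T", "C")

# A comparison with NA yields NA, not TRUE or FALSE
cmp <- a != b
print(cmp)

# Treat "either side is NA" as a difference, as in the filter above
keep <- is.na(a) | is.na(b) | a != b
print(keep)
```

filter() keeps only rows where the condition is TRUE, so rows with an NA condition are dropped silently. Base R's `[` instead returns a row of NAs for each NA in the logical index, which is probably why neither attempt in the question gave the expected result.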

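If you want to remove them, combine the two is.na() checks with xor() instead of |, so that the condition is only TRUE when exactly one of the two values is NA or the values differ:

testdf %>%
  filter(xor(is.na(Col1), is.na(Col2)) | Col1 != Col2)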

  POS Col1 Col2
1  44    C <NA>
2  50 <NA>    T
3  83    G    C
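
For readers without dplyr, the same two filters can be sketched in base R. The data below includes the extra all-NA row mentioned above; its values (POS 99, NA in both columns) are inferred from the printed output, and the names `testdf2`, `cond_keep`, `cond_drop`, `keep_both_na` and `drop_both_na` are hypothetical.

```r
# Example data from the question plus the all-NA row
# (POS 99 is inferred from the output shown in the answer)
testdf2 <- data.frame(
  POS  = c(37, 44, 50, 83, 99),
  Col1 = c("A", "C", NA, "G", NA),
  Col2 = c("A", NA, "T", "C", NA),
  stringsAsFactors = FALSE
)

# Variant 1: keep rows where either value is NA or the values differ
cond_keep <- is.na(testdf2$Col1) | is.na(testdf2$Col2) |
  testdf2$Col1 != testdf2$Col2
keep_both_na <- testdf2[cond_keep, ]

# Variant 2: drop rows where both values are NA.
# The both-NA row makes the condition NA; which() discards it so that
# `[` does not return a row of NAs for it.
cond_drop <- xor(is.na(testdf2$Col1), is.na(testdf2$Col2)) |
  testdf2$Col1 != testdf2$Col2
drop_both_na <- testdf2[which(cond_drop), ]

print(keep_both_na)
print(drop_both_na)
```

Wrapping the condition in which() is one way to mimic how filter() treats NA conditions; it is a design choice for this sketch rather than part of the original answer.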

by MR

Published January 03, 2023
