Subsetting on a column with NA returns a whole row of NA. I know there are multiple ways to avoid this; my question is why does this happen at all? For example:
> d <- data.frame(a = 1:3, b = c(NA, 2, 5))
> d[d$b == 2, ]
    a  b
NA NA NA
2   2  2
I would understand if it simply returned row 1 also, but it returns a whole row of NA which never existed in the object I subsetted. This seems strange and unhelpful, and I can’t find an explanation of why this behavior exists (again, I know how to prevent it).
>Solution :
It is unintuitive indeed, but if you check d$b == 2 you see that:
> d$b == 2
#[1]    NA  TRUE FALSE
And when a row index is NA, R returns a row filled with NA in that position:
> d[c(NA, 2), ]
#    a  b
#NA NA NA
#2   2  2
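d[d$b == 2, ] cannot return the first row, because that would require the first value of d$b == 2 to be TRUE, and here it is NA. An NA index means "unknown whether to select this row", so R neither keeps nor drops the row and returns a placeholder row of NA instead.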
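The same rule can be seen on a plain vector, which may make the data frame behavior less surprising. This is only a small sketch: x is a made-up vector for illustration, and which() is shown as just one of the usual ways to drop the NA from the index.

```r
d <- data.frame(a = 1:3, b = c(NA, 2, 5))

# Comparing anything with NA gives NA ("unknown"), not FALSE
NA == 2
# [1] NA

# Vectors behave the same way: an NA index yields an NA element
x <- c(10, 20, 30)  # hypothetical example vector
x[c(NA, TRUE, FALSE)]
# [1] NA 20

# which() keeps only the positions that are TRUE, so the NA is dropped
which(d$b == 2)
# [1] 2
d[which(d$b == 2), ]
```

So the row of NA is the data frame version of the NA element you get from x: one result per index entry, with NA standing in where the index itself is unknown.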