Get the number of unique IDs for which a variable is completely NA

I want to figure out how many unique NR have all C_P values NA.

DT <- structure(list(
  NR = c(10001111, 10001111, 10001113, 10001114, 10001115),
  C_P = c("8851", "NA", "8873", "NA", "NA"),
  B_LAND = c("NL", "NL", "NL", "NL", "NL")
), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

         NR  C_P B_LAND
1: 10001111 8851     NL
2: 10001111   NA     NL
3: 10001113 8873     NL
4: 10001114   NA     NL
5: 10001115   NA     NL

I am struggling to get the syntax right. I attempted:

DT[, .(uniqueNR_without_C_P = uniqueN(is.na(C_P)), by = NR]

The desired output is 2, since there are two unique NR values (10001114 and 10001115) for which there is no C_P.

Solution:

Usually you could do:

DT[, all(is.na(C_P)), NR][, sum(V1)]

But since your data contains no real NA values, only the character string "NA", you can do something like:

is_string.NA = function(x) x == "NA"
DT[, all(is_string.NA(C_P)), NR][, sum(V1)]

Alternatively:

uniqueN(DT$NR)  - uniqueN(DT[!is_string.NA(C_P)]$NR)
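
Another option, assuming it is acceptable to work on a modified copy of the data, is to convert the string "NA" into a real missing value first, so that the usual is.na() approach applies. This is only a minimal sketch; the names DT2 and n_all_na are introduced here for illustration and are not part of the original question:

```r
library(data.table)

# Same data as in the question; "NA" here is a character string, not a missing value
DT <- data.table(
  NR     = c(10001111, 10001111, 10001113, 10001114, 10001115),
  C_P    = c("8851", "NA", "8873", "NA", "NA"),
  B_LAND = "NL"
)

# Work on a copy so the original table is left untouched (DT2 is a hypothetical name)
DT2 <- copy(DT)

# Replace the literal string "NA" with a real missing value
DT2[C_P == "NA", C_P := NA_character_]

# One TRUE/FALSE per NR (TRUE if every C_P in the group is NA), then count the TRUEs
n_all_na <- DT2[, all(is.na(C_P)), by = NR][, sum(V1)]
n_all_na
```

For the example data this gives 2. If the table is read from a file, fread() has an na.strings argument that can perform the same conversion at import time.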