Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split values of a data.table column and select the lowest number or NA if all are "NA"

I have the following data table in R:

library(data.table)
dt <- data.table(x = c(NA, NA, NA, NA, NA, NA, NA, "1,2", "NA,3", "NA,NA,NA",
                        "NA,NA,NA,NA", "NA,NA", "NA,4,NA,1", "NA,NA,NA", "NA,NA"))
              x
 1:        <NA>
 2:        <NA>
 3:        <NA>
 4:        <NA>
 5:        <NA>
 6:        <NA>
 7:        <NA>
 8:         1,2
 9:        NA,3
10:    NA,NA,NA
11: NA,NA,NA,NA
12:       NA,NA
13:   NA,4,NA,1
14:    NA,NA,NA
15:       NA,NA

Now I would like to transfer the column to numeric with the following conditions:

  • If the entry is already NA, keep it as NA
  • If all entries separated by commas are "NA", the field should be NA
  • If "NA" and numbers are mixed, the field should contain a single number with the lowest value
  • If all entries are numbers, select the single number with the lowest value

The result should look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     x
 1: NA
 2: NA
 3: NA
 4: NA
 5: NA
 6: NA
 7: NA
 8:  1
 9:  3
10: NA
11: NA
12: NA
13:  1
14: NA
15: NA

>Solution :

You can achieve this by splitting the column values on the comma, converting the resulting strings to numbers, and then applying the min function.

For this, I’ve wrote custom function like this

# Function to compute the desired transformation
selectMinorNA <- function(x) {
  # If the entry is already NA, keep it as NA
  if (is.na(x))
    return(NA_integer_)
  # If all entries separated by commas are "NA", the field should be NA
  # I've added inf for the purpose of extracting the min value 
  nums <- as.numeric(gsub("NA", "Inf", unlist(strsplit(x, ","))))
  # If "NA" and numbers are mixed, the field should contain a single number with the lowest value
  if (all(is.infinite(nums)))
    return(NA_integer_)
  # If all entries are numbers, select the single number with the lowest value
  return(min(nums, na.rm = TRUE))
}

Applying this function to your data frame returns this

dt[, x := sapply(x, selectMinorNA)]
> dt
     x
 1: NA
 2: NA
 3: NA
 4: NA
 5: NA
 6: NA
 7: NA
 8:  1
 9:  3
10: NA
11: NA
12: NA
13:  1
14: NA
15: NA
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading