Split values of a data.table column and select the lowest number or NA if all are "NA"

I have the following data table in R:

library(data.table)
dt <- data.table(x = c(NA, NA, NA, NA, NA, NA, NA, "1,2", "NA,3", "NA,NA,NA",
                        "NA,NA,NA,NA", "NA,NA", "NA,4,NA,1", "NA,NA,NA", "NA,NA"))
              x
 1:        <NA>
 2:        <NA>
 3:        <NA>
 4:        <NA>
 5:        <NA>
 6:        <NA>
 7:        <NA>
 8:         1,2
 9:        NA,3
10:    NA,NA,NA
11: NA,NA,NA,NA
12:       NA,NA
13:   NA,4,NA,1
14:    NA,NA,NA
15:       NA,NA

Now I would like to transfer the column to numeric with the following conditions:

  • If the entry is already NA, keep it as NA
  • If all entries separated by commas are "NA", the field should be NA
  • If "NA" and numbers are mixed, the field should contain a single number with the lowest value
  • If all entries are numbers, select the single number with the lowest value

The result should look like this:

     x
 1: NA
 2: NA
 3: NA
 4: NA
 5: NA
 6: NA
 7: NA
 8:  1
 9:  3
10: NA
11: NA
12: NA
13:  1
14: NA
15: NA

>Solution :

You can achieve this by splitting the column values on the comma, converting the resulting strings to numbers, and then applying the min function.

For this, I’ve wrote custom function like this

# Function to compute the desired transformation
selectMinorNA <- function(x) {
  # If the entry is already NA, keep it as NA
  if (is.na(x))
    return(NA_integer_)
  # If all entries separated by commas are "NA", the field should be NA
  # I've added inf for the purpose of extracting the min value 
  nums <- as.numeric(gsub("NA", "Inf", unlist(strsplit(x, ","))))
  # If "NA" and numbers are mixed, the field should contain a single number with the lowest value
  if (all(is.infinite(nums)))
    return(NA_integer_)
  # If all entries are numbers, select the single number with the lowest value
  return(min(nums, na.rm = TRUE))
}

Applying this function to your data frame returns this

dt[, x := sapply(x, selectMinorNA)]
> dt
     x
 1: NA
 2: NA
 3: NA
 4: NA
 5: NA
 6: NA
 7: NA
 8:  1
 9:  3
10: NA
11: NA
12: NA
13:  1
14: NA
15: NA

Leave a Reply