I have the following data table in R:
library(data.table)
dt <- data.table(x = c(NA, NA, NA, NA, NA, NA, NA, "1,2", "NA,3", "NA,NA,NA",
"NA,NA,NA,NA", "NA,NA", "NA,4,NA,1", "NA,NA,NA", "NA,NA"))
x
1: <NA>
2: <NA>
3: <NA>
4: <NA>
5: <NA>
6: <NA>
7: <NA>
8: 1,2
9: NA,3
10: NA,NA,NA
11: NA,NA,NA,NA
12: NA,NA
13: NA,4,NA,1
14: NA,NA,NA
15: NA,NA
Now I would like to transfer the column to numeric with the following conditions:
- If the entry is already NA, keep it as NA
- If all entries separated by commas are "NA", the field should be NA
- If "NA" and numbers are mixed, the field should contain a single number with the lowest value
- If all entries are numbers, select the single number with the lowest value
The result should look like this:
x
1: NA
2: NA
3: NA
4: NA
5: NA
6: NA
7: NA
8: 1
9: 3
10: NA
11: NA
12: NA
13: 1
14: NA
15: NA
>Solution :
You can achieve this by splitting the column values on the comma, converting the resulting strings to numbers, and then applying the min
function.
For this, I’ve wrote custom function like this
# Function to compute the desired transformation
selectMinorNA <- function(x) {
# If the entry is already NA, keep it as NA
if (is.na(x))
return(NA_integer_)
# If all entries separated by commas are "NA", the field should be NA
# I've added inf for the purpose of extracting the min value
nums <- as.numeric(gsub("NA", "Inf", unlist(strsplit(x, ","))))
# If "NA" and numbers are mixed, the field should contain a single number with the lowest value
if (all(is.infinite(nums)))
return(NA_integer_)
# If all entries are numbers, select the single number with the lowest value
return(min(nums, na.rm = TRUE))
}
Applying this function to your data frame returns this
dt[, x := sapply(x, selectMinorNA)]
> dt
x
1: NA
2: NA
3: NA
4: NA
5: NA
6: NA
7: NA
8: 1
9: 3
10: NA
11: NA
12: NA
13: 1
14: NA
15: NA