I would like to replace entries in a data frame elementwise whenever they do not belong to a set of valid entries. I cannot match on the invalid values directly, because the other (string) entries in the data frame are not known ahead of time.
If I assign NA to the subset df[!(df %in% valid_entries)], every entry in the data frame is replaced with NA, rather than only the elements that satisfy the condition (which is the behaviour I would expect from a matrix rather than a data.frame).
How can I achieve the desired behaviour with my data.frame, ideally using only base R?
set.seed(123)
N <- 100
valid_entries <- c("GOOD", "BAD")
df <- data.frame(A = sample(valid_entries, N, TRUE, c(0.4, 0.6)),
                 B = sample(valid_entries, N, TRUE, c(0.7, 0.3)))
df[2, 2] <- "Missing"
df[3, 1] <- "NotAvailable"
head(df)
# %in% does not work here: every entry is replaced with NA
df[!(df %in% valid_entries)] <- NA
head(df, n = 4)
#    A  B
# 1 NA NA
# 2 NA NA
# 3 NA NA
# 4 NA NA
>Solution :
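df %in% valid_entries treats the data frame as a list of columns, so it returns one value per column rather than one per cell, and the negated result selects every column. Build the mask column by column instead, for example with apply: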
df[apply(df, 2, \(x) !x %in% valid_entries)] <- NA
Output:
> head(df)
     A    B
1  BAD GOOD
2 GOOD <NA>
3 <NA> GOOD
4 GOOD  BAD
5 GOOD GOOD
6  BAD  BAD
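A quick way to see why the original attempt fails is to compare the two masks directly. The sketch below uses a small stand-in data frame with made-up values (not the sampled data from the question), and the names col_mask and cell_mask are only illustrative.

```r
# Small stand-in data frame (hypothetical values, not the question's sample)
valid_entries <- c("GOOD", "BAD")
df <- data.frame(A = c("GOOD", "NotAvailable", "BAD"),
                 B = c("BAD", "GOOD", "Missing"))

# %in% sees the data frame as a list of 2 columns, so it yields 2 values
col_mask <- !(df %in% valid_entries)
length(col_mask)   # one element per column

# Testing each column separately yields one value per cell
cell_mask <- sapply(df, function(x) !(x %in% valid_entries))
dim(cell_mask)     # same shape as df
```

Indexing with col_mask selects whole columns, which is why every entry is overwritten, whereas a logical matrix such as cell_mask selects individual cells.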
Note: \(x) is shorthand for function(x) when writing anonymous functions, available since R 4.1.0.
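If an older R version is in use, or you would rather not go through a matrix, a column-wise variant with lapply may also work. This is a sketch on a small made-up data frame, not the sampled data from the question; it relies only on base R's lapply and replace.

```r
# Small stand-in data frame (hypothetical values, not the question's sample)
valid_entries <- c("GOOD", "BAD")
df <- data.frame(A = c("GOOD", "NotAvailable", "BAD"),
                 B = c("BAD", "GOOD", "Missing"))

# Replace invalid entries one column at a time;
# assigning into df[] keeps the data.frame structure and column names
df[] <- lapply(df, function(x) replace(x, !(x %in% valid_entries), NA))
df
```

Because each column is handled on its own, this form spells out function(x) in full and so does not depend on the \(x) shorthand.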