I have a data frame in which I would like to go through all columns whose names end with _qc and, wherever the value is "4", set the value in the corresponding column without the _qc suffix to NA.
For example, if the value in chla_adjusted_qc is "4", then the value of chla_adjusted in the same row should be set to NA.
library(tidyverse)
df <- tibble(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)
df
#> # A tibble: 2 × 4
#> chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#> <dbl> <chr> <dbl> <chr>
#> 1 100 4 0.1 2
#> 2 2 1 9999 4
The desired output would be
tibble(
  chla_adjusted = c(NA, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, NA),
  bbp_adjusted_qc = c("2", "4")
)
#> # A tibble: 2 × 4
#> chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#> <dbl> <chr> <dbl> <chr>
#> 1 NA 4 0.1 2
#> 2 2 1 NA 4
So far, I have managed to grab the current column name and derive the name of the corresponding column in which I want to set the value to NA.
df |>
  mutate(across(ends_with("_qc"), \(var) {
    # If var is chla_adjusted_qc, then let's modify the value in chla_adjusted
    col <- str_remove(cur_column(), "_qc")
    # if (var == "4") {
    #   # What to do here?
    # }
  }))
#> # A tibble: 2 × 4
#> chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#> <dbl> <chr> <dbl> <chr>
#> 1 100 chla_adjusted 0.1 bbp_adjusted
#> 2 2 chla_adjusted 9999 bbp_adjusted
Thank you.
Created on 2022-12-20 with reprex v2.0.2
Solution:
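df %>%
  mutate(across(ends_with("_qc"),
    # Take the matching column without "_qc" and set it to NA where the flag is "4"
    ~ replace(cur_data()[[ sub("_qc$", "", cur_column()) ]], . == "4", NA),
    # Name each result after the unsuffixed column so it overwrites that column
    .names = "{sub('_qc$', '', .col)}"))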
# # A tibble: 2 × 4
# chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
# <dbl> <chr> <dbl> <chr>
# 1 NA 4 0.1 2
# 2 2 1 NA 4
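The answer above relies on cur_data(), which, as far as I know, is deprecated since dplyr 1.1.0 in favor of pick(). If that is a concern, the same idea can be sketched in base R with a plain loop over the flag columns. This is only a minimal sketch under the assumptions of the question: flags are stored as character, "4" marks a bad value, and every *_qc column has a partner column with the same name minus the suffix. The names qc_cols, qc and target are made up for illustration.

```r
# Same example data as in the question (a base data.frame here; a tibble works the same way)
df <- data.frame(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)

# All flag columns, i.e. names ending in "_qc"
qc_cols <- grep("_qc$", names(df), value = TRUE)

for (qc in qc_cols) {
  # Name of the measurement column that belongs to this flag column
  target <- sub("_qc$", "", qc)
  # Skip flag columns without a partner column (an assumption about what you want)
  if (target %in% names(df)) {
    # which() drops NA flags, so rows with a missing flag are left untouched
    df[[target]][which(df[[qc]] == "4")] <- NA
  }
}

df
```

After the loop, chla_adjusted is c(NA, 2) and bbp_adjusted is c(0.1, NA), while the _qc columns are unchanged.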