I need to remove specific text/values from all cells it is present within a dataset. This should be the equivalent of the ‘Find and Replace All’ tool within Excel. This cannot be simply removing/omitting all ‘NA’ values as several entries are NA and need to remain such, thus meaning it needs to be removing "NA |". This also cannot be limited to just the columns listed below and must be applied to the entire data frame, as there are 20+ additional columns to the right of V6.
Input:
V1 V2 V3 V4 V5 V6
NA|TR1000 NA|chr1 NA|1000 1200 A +
TR1000 chr1 1100 NA|1200 B +
NA|TR1000 NA|chr1 NA|1000 NA C +
TR2000 NA|chr2 2000 NA D +
TR2000 NA|chr2 2100 2500 E +
TR3000 NA 3000 3500 F +
NA|TR3000 chr3 NA|3000 3500 F +
I Need:
V1 V2 V3 V4 V5 V6
TR1000 chr1 1000 1200 A +
TR1000 chr1 1100 1200 B +
TR1000 chr1 1000 NA C +
TR2000 chr2 2000 NA D +
TR2000 chr2 2100 2500 E +
TR3000 NA 3000 3500 F +
TR3000 chr3 3000 3500 F +
>Solution :
This should do it:
df[] = lapply(df, sub, pattern = "NA|", replacement = "", fixed = TRUE)
Demo:
df = read.table(text = 'V1 V2 V3 V4 V5 V6
NA|TR1000 NA|chr1 NA|1000 1200 A +
TR1000 chr1 1100 NA|1200 B +
NA|TR1000 NA|chr1 NA|1000 NA C +
TR2000 NA|chr2 2000 NA D +
TR2000 NA|chr2 2100 2500 E +
TR3000 NA 3000 3500 F +
NA|TR3000 chr3 NA|3000 3500 F + ', header = T)
df[] = lapply(df, sub, pattern = "NA|", replacement = "", fixed = TRUE)
df
# V1 V2 V3 V4 V5 V6
# 1 TR1000 chr1 1000 1200 A +
# 2 TR1000 chr1 1100 1200 B +
# 3 TR1000 chr1 1000 <NA> C +
# 4 TR2000 chr2 2000 <NA> D +
# 5 TR2000 chr2 2100 2500 E +
# 6 TR3000 <NA> 3000 3500 F +
# 7 TR3000 chr3 3000 3500 F +
If you like using dplyr, here’s a method there:
df %>%
mutate(across(everything(), sub, pattern = "NA|", replacement = "", fixed = TRUE))