Home In R, how can I validate whether values are valid according to an external table?

Questions

In R, how can I validate whether values are valid according to an external table?

June 10, 2022

I have a reference table of acceptable column values, represented as good_df below. I want to use it to find out which values in the previously unseen test_df are not among the list of acceptable ones.

library(tidyverse)
good_df <- tribble(
  ~column, ~value,
  "col1", "a",
  "col1", "b",
  "col1", "c",
  "col1", "d",
  "col1", "e",
  "col2", "A",
  "col2", "B",
  "col2", "C"
)

set.seed(1)
test_df <- tibble(
  col1 = sample(letters[1:3], 4, T),
  col2 = c(sample(LETTERS[1:3], 3, T), "D"),  # The D is invalid.
  col3 = rnorm(4)
)
test_df
#> # A tibble: 4 × 3
#>   col1  col2    col3
#>   <chr> <chr>  <dbl>
#> 1 a     A      0.330
#> 2 c     C     -0.820
#> 3 a     C      0.487
#> 4 b     D      0.738

^{Created on 2022-06-09 by the reprex package (v2.0.1)}

My thought is to use pivot_longer to convert test_df to a format that matches good_df, and then use setdiff to see what remains after the valid rows are removed, which must be the invalid ones. Something along the lines of this below, producing expected_output.

long_test_df <- pivot_longer(test_df, "MAGIC HAPPENS HERE") 

long_test_df %>% 
  select(column, value) %>% 
  setdiff(good_df)

expected_output <- tribble(
  ~column, ~value,
  "col2", "D",
)

I can’t seem to get pivot_longer to work as I’m intending it. Maybe it’s not designed to? I also thought of using the validate package, but I don’t see anything in the documentation about explicitly listing allowed or forbidden values in creating the rules, unless I missed something.

Also of course, in practice I have many more acceptable column/value pairs and many more columns in my data sets to validate.

I am open to alternatives to my approach and to alternative forms of storing the set of allowed column values. How may I achieve this? (Bonus points for pretty pipes and fewer named variables!)

Edit – Solution:

A modified version of Jonathan’s answer gives the fully general solution. (Also, I still prefer setdiff to anti_join here, although that’s just a personal preference.)

test_df %>% 
  pivot_longer(cols = any_of(good_df$column), names_to = "column", values_to = "value") %>% 
  select(column, value) %>% 
  anti_join(good_df)

>Solution :

Here is the code for pivot_longer, the missing values can be obtained with anti_join:

library(tidyverse)

test_df %>% 
  pivot_longer(c(col1, col2), names_to = "column", values_to = "value") %>% 
  anti_join(good_df)

tidyr

byMR

Published June 10, 2022

Add a comment

How to spread for loop target list across multiple lines

byMR

June 10, 2022

Questions

Identify the columns which contain zero and output its location

byMR

June 10, 2022

Questions

QTextEdit alignment fails on pasting text

byMR

June 10, 2022

Questions

How do I call a function onclick, while also changing the value of the text outside of the function

byMR

June 10, 2022

Questions

add text at the end of specific line

byMR

June 10, 2022

Questions

Pulling numbers between white space from email body into google sheets

byMR

June 10, 2022

In R, how can I validate whether values are valid according to an external table?

MEDevel.com: Open-source for Healthcare and Education

Edit – Solution:

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How to spread for loop target list across multiple lines

Identify the columns which contain zero and output its location

QTextEdit alignment fails on pasting text

How do I call a function onclick, while also changing the value of the text outside of the function

add text at the end of specific line

Pulling numbers between white space from email body into google sheets

Keep Up to Date with the Most Important News

In R, how can I validate whether values are valid according to an external table?

MEDevel.com: Open-source for Healthcare and Education

Edit – Solution:

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How to spread for loop target list across multiple lines

Identify the columns which contain zero and output its location

QTextEdit alignment fails on pasting text

How do I call a function onclick, while also changing the value of the text outside of the function

add text at the end of specific line

Pulling numbers between white space from email body into google sheets

Discover more from Dev solutions