Unable to write a correct user-defined function in dplyr for outlier treatment in R

I am trying to write a function to fix outliers in variables, but I get an error when I write it in dplyr form.

fn_outlier_fix <- function(x, df){
  x = enquo(x)
  
  Q1 = df %>% pull(!!x) %>% quantile(0.25) %>% unname()
  Q3 = df %>% pull(!!x) %>% quantile(0.75) %>% unname()
  IQR = Q3 - Q1
  UC = Q3 + (1.5 * IQR)
  LC = Q3 - (1.5 * IQR)
  
  df <- df %>% 
    mutate(!!x := if_else(x > UC,UC,!!x),
           !!x := if_else(x < LC,LC,!!x))
}
library(dplyr)

df_test <- tribble(
  ~sales, ~var1, ~var2,
  22, 230.1,  37.8,
  10, 44.5,  39.3,
  9,  17.2,  45.9,
  19, 151.5,  41.3,
  13, 180.8,  10.8,
  7,  8.7,    48.9,
  12, 57.5,   32.8,
  13, 120.2,  19.6,
  5,  8.6,    2.1,
  11, 199.8,  2.6)
fn_outlier_fix(x = var1, df = df_test)

Error:

Error in `mutate()`:
! Problem while computing `var1 = if_else(x > UC, UC, var1)`.
Caused by error in `if_else()`:
! Base operators are not defined for quosures. Do you need to unquote the quosure?

# Bad: myquosure > rhs

# Good: !!myquosure > rhs
Backtrace:
 1. global fn_outlier_fix(x = var1, df = df_test)
 9. rlang:::Ops.quosure(x, UC)

I don’t know why it is so much more complicated to write functions with dplyr in R than in Python.
I managed to write the function in the form below, which works, but I would still like to get the code above working for my own understanding. I appreciate any help.

By contrast, the base R version below works:

fn_outlier_fix <- function(x){
  
  Q1 = quantile(x, 0.25)
  Q3 = quantile(x, 0.75)
  IQR = Q3 - Q1
  UC = Q3 + (1.5 * IQR)
  LC = Q3 - (1.5 * IQR)
  
  x[x > UC] <- UC
  x[x < LC] <- LC
  
  x <- x
}

Solution:

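You were nearly there; you just forgot to unquote x inside the if_else() calls. After enquo(), x is a quosure, so x > UC compares the quosure itself rather than the column, while !!x > UC compares the column. This function works: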

fn_outlier_fix <- function(x, df){
  x = enquo(x)
  
  Q1 = df %>% pull(!!x) %>% quantile(0.25) %>% unname()
  Q3 = df %>% pull(!!x) %>% quantile(0.75) %>% unname()
  IQR = Q3 - Q1
  UC = Q3 + (1.5 * IQR)
  LC = Q1 - (1.5 * IQR)  # lower fence is measured from Q1 (the question's code used Q3 here)
  
  df <- df %>% 
    mutate(!!x := if_else(!!x > UC,UC,!!x),
           !!x := if_else(!!x < LC,LC,!!x))
  
  df
}

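Writing functions around dplyr is more involved because dplyr uses non-standard evaluation (NSE) to look up column names: a bare name such as var1 is treated as an expression to be evaluated inside the data frame, not as an ordinary R variable. The "Programming with dplyr" vignette covers this in full.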
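
To make the quoting step concrete, here is a minimal sketch of what enquo() captures and what !! does with it. The helper name peek_column and the toy data frame are hypothetical, introduced only for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: shows what enquo() captures and how !! injects it.
peek_column <- function(x, df) {
  x <- enquo(x)  # capture the caller's expression instead of evaluating it

  list(
    is_quosure = is_quosure(x),  # x is now a quosure, not a numeric vector
    label      = as_label(x),    # text of the captured expression
    # !! injects the captured expression into the dplyr call
    col_max    = df %>% summarise(m = max(!!x)) %>% pull(m)
  )
}

toy <- tibble(var1 = c(1, 5, 3))  # made-up data
res <- peek_column(var1, toy)
```

Because x holds an expression object rather than column values, a direct comparison such as x > UC has nothing numeric to compare, which is what the error message in the question reports.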

The recommended way to handle NSE in dplyr has since changed again. Current best practice is the embrace operator {{ }}, which replaces the enquo() and !! pair:

fn_outlier_fix_2 <- function(x, df){
  
  Q1 = df %>% pull({{x}}) %>% quantile(0.25) %>% unname()
  Q3 = df %>% pull({{x}}) %>% quantile(0.75) %>% unname()
  IQR = Q3 - Q1
  UC = Q3 + (1.5 * IQR)
  LC = Q1 - (1.5 * IQR)  # lower fence is measured from Q1 (the question's code used Q3 here)
  
  df <- df %>% 
    mutate({{x}} := if_else({{x}} > UC,UC,{{x}}),
           {{x}} := if_else({{x}} < LC,LC,{{x}}))
  
  df
}
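
As one possible alternative design, the sketch below keeps the fence arithmetic in a plain vector function and lets mutate() and across() supply the columns, so the helper itself needs no tidy evaluation. The name cap_outliers, the demo data, the na.rm = TRUE choice, and the use of pmin()/pmax() in place of if_else() are assumptions of this sketch, not part of the original answer. The lower fence is computed from Q1.

```r
library(dplyr)

# Hypothetical vector helper: cap values outside the 1.5 * IQR fences.
cap_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))  # assumption: ignore NAs
  iqr <- q[2] - q[1]
  upper <- q[2] + 1.5 * iqr
  lower <- q[1] - 1.5 * iqr
  pmin(pmax(x, lower), upper)  # clamp every value into [lower, upper]
}

# Made-up data: column `a` has one low and one high outlier, `b` has none.
df_demo <- tibble(
  a = c(-100, 1, 2, 3, 4, 5, 100),
  b = c(10, 11, 12, 13, 14, 15, 16)
)

# Apply the same treatment to several columns at once.
df_capped <- df_demo %>%
  mutate(across(c(a, b), cap_outliers))
```

The same helper also works on a single column, for example mutate(df_test, var1 = cap_outliers(var1)), which is close to how the base R version from the question would be used inside dplyr.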