How to use a custom function with Apache Arrow in R?

I am trying to learn Apache Arrow with R, but I cannot find out how to write a
user-defined function that works with Arrow.

library(arrow)
#> See arrow_info() for available features
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

My function simply returns the average of a vector:

f1 <- function(x) {
  
  x <- Array$create(x)
  
  res <- mean(x, na.rm = TRUE)
  
  return(as.vector(res))
}

When I use my f1 function inside mutate(), I get the warning below, and the
data is pulled into R before the computation runs.

ds <- arrow_table(head(mtcars, 6))

ds %>% 
  mutate(mpg2 = f1(mpg)) %>% 
  collect()
#> Warning: Expression f1(mpg) not supported in Arrow; pulling data into R
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg2
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 20.5
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 20.5
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 20.5
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 20.5
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 20.5
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 20.5

Is there any way to use a custom function within Arrow in R?

Created on 2022-03-18 by the reprex package (v2.0.1)

>Solution :

That appears to be the documented behaviour:

If you try to call a function which does not have arrow mapping, the data will be pulled back into R, and you will see a warning message.

This makes some sense if you think about it: the Arrow 'backend' does not contain an embedded R interpreter, so we probably cannot expect to send arbitrary R functions down to it.
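If the goal is only to avoid the fallback to R, one option is to build the result from functions that Arrow can translate. The sketch below is an illustration, not part of the arrow documentation quoted above. It assumes an arrow release in which summarise() with mean() is translated, and in which mutate() accepts a length-one R value as a literal. The names mpg_mean and res are made up for the example.

```r
library(arrow)
library(dplyr)

ds <- arrow_table(head(mtcars, 6))

# Step 1: aggregate inside Arrow. Only the one-row result is
# collected into R, not the whole table.
mpg_mean <- ds %>%
  summarise(m = mean(mpg, na.rm = TRUE)) %>%
  collect() %>%
  pull(m)

# Step 2: attach the value as a new column. A length-one R value
# is assumed to be passed to Arrow as a literal, so f1() is never
# called and no custom R function has to be translated.
res <- ds %>%
  mutate(mpg2 = mpg_mean) %>%
  collect()
```

This splits the work into two queries instead of one, which is the price of keeping every expression within the set of functions Arrow has a mapping for.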
