How to turn a formula into variables for use with the fastLm function in R

I am trying to use RcppArmadillo::fastLm instead of lm for performance reasons.
Here is my call to lm:

test_dt = structure(list(A= c(168.08, 166.65, 167.52, 167.16, 165.77, 
167.65, 169.84, 170.45, 171.29, 173.15, 174.12, 174.45, 174.18, 
172.92, 174.5, 173.94, 172.61, 168.74, 167.28, 167.12), `B` = c(1801.599976, 
1783, 1795.099976, 1788.699951, 1763.599976, 1793, 1816.400024, 
1827.400024, 1830.199951, 1847.599976, 1863.199951, 1867.900024, 
1866.099976, 1853.599976, 1869.699951, 1861, 1851.199951, 1806, 
1783.5, 1784.099976)), row.names = c(NA, -20L), class = c("data.table", 
"data.frame"))

coef(lm(A ~ B + 0,data = test_dt))[1]

> 0.0934728 

Since lm spends most of its time interpreting the formula, I do not want to use a formula. Instead, I want to turn it into something like:

RcppArmadillo::fastLm(X = test_dt$B + 0, y = test_dt$A)

but I am not sure how to add + 0 as shown in the formula.

I have tried the following:

library(data.table)
dt = copy(test_dt)
dt[, C := 0]
coef(RcppArmadillo::fastLm(X = dt[,2:3], y = dt[,1]))[[1]]

But this gives an error:

Error in fastLm.default(X = dt[, 2:3], y = dt[, 1]) : 
  (list) object cannot be coerced to type 'double'

Can someone show me the right way to turn the formula A ~ B + 0 into the variables X and y for use in the fastLm function?

Here are the performance results.

microbenchmark::microbenchmark(
  formula = coef(lm(A ~ B, test_dt)),
  no_formula = with(test_dt, coef(fastLm(cbind(1, B), A))),
  times = 100)
Unit: microseconds
       expr      min        lq      mean   median        uq      max neval cld
    formula 1168.819 1185.3455 1208.6552 1204.581 1217.8985 1618.640   100   b
 no_formula  209.126  219.3785  243.3773  225.395  234.2235 1746.299   100  a 

Solution:

The first argument of the default method of fastLm is the model matrix. To fit an intercept, the matrix must include a column of 1s; without that column, the model has no intercept. So A ~ B + 0 corresponds to passing B alone, not to adding a column of zeros.

These give the same answer using no intercept:

library(RcppArmadillo)  # provides fastLm

coef(lm(A ~ B + 0, test_dt))[1]
with(test_dt, coef(fastLm(B, A)))

and these give the same answer using an intercept:

coef(lm(A ~ B, test_dt))
with(test_dt, coef(fastLm(cbind(1, B), A)))
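
The error in the question appears to come from passing data.table subsets (which are lists) where fastLm expects a numeric matrix and a numeric vector. Below is a minimal sketch of building X and y by hand. It assumes a plain data.frame called df holding the first five rows of test_dt. The names df, X0, X1 and fit are illustrative, and the lm.fit fallback is only there so the sketch still runs without RcppArmadillo installed.

```r
# Toy data: the first five rows of the question's test_dt, as a plain data.frame
df <- data.frame(
  A = c(168.08, 166.65, 167.52, 167.16, 165.77),
  B = c(1801.599976, 1783, 1795.099976, 1788.699951, 1763.599976)
)

y  <- df$A                           # numeric vector, not a one-column table
X0 <- as.matrix(df["B"])             # model matrix for A ~ B + 0 (no column of 1s)
X1 <- cbind(`(Intercept)` = 1, X0)   # model matrix for A ~ B (column of 1s added)

# Illustrative helper: use fastLm when available, otherwise base R's lm.fit
fit <- function(X, y) {
  if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
    coef(RcppArmadillo::fastLm(X, y))
  } else {
    lm.fit(X, y)$coefficients
  }
}

b0 <- fit(X0, y)   # slope only, no intercept
b1 <- fit(X1, y)   # intercept and slope
```

With a data.table, the same idea would be as.matrix(dt[, .(B)]) for X and dt$A (or dt[[1]]) for y, since dt[, 1] returns a one-column data.table rather than a vector.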