Error when using nls for positive coefficient constraint

July 27, 2022

I'm trying to run a regression with all coefficients constrained to be non-negative. To do this, I am using the nls function with a lower bound of zero. However, I get this error:

"Error in nls(formula = y ~ . - 1, data = X, start = low, lower = low, :
parameters without starting value in ‘data’: ."

I believe everything is set up correctly: I set a lower and an upper bound on every coefficient, so I am not sure what is wrong.

Attempt 1:

library(magrittr)
X <- data.frame(
    x1 = seq(10),
    x2 = seq(10),
    x3 = seq(10),
    x4 = seq(10),
    x5 = seq(10),
    y = seq(10)
)

low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <-  dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)


fit1 <- nls(formula = y ~ . -1 , data = X,
    start = low,
    lower = low,
    upper = up,
    algorithm = "port"
)

Attempt 2:
Here I try to set the formula manually but then I get a new error:
"Error in qr(.swts * gr) :
dims [product 5] do not match the length of object [10]"

library(magrittr)
X <- data.frame(
    x1 = seq(10),
    x2 = seq(10),
    x3 = seq(10),
    x4 = seq(10),
    x5 = seq(10),
    y = seq(10)
)

n <- X %>% dplyr::select( -y ) %>% names %>% paste0( collapse = " + " )
f <- "y ~ %s -1" %>% sprintf( n ) %>% as.formula

low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <-  dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)


fit1 <- nls(formula = f , data = X,
    start = low,
    lower = low,
    upper = up,
    algorithm = "port"
)

How can I fix this? Thanks!

>Solution :

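1) There are several problems here:

- nls does not use the same formula notation as lm. The parameters must be written out explicitly in the formula, and the dot is not expanded to the remaining columns (nls reads it as a parameter name, hence the first error). This is fixed below.
- The example does not have identifiable parameters: x1 through x5 are identical columns, so the coefficients are not unique and the calculation will fail. Below we change the example.
- Although starting values of 0 seem to work here, in general numeric optimization with constraints tends to work better if the starting values are in the interior of the feasible region.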
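To see why the original example cannot have unique coefficients, one can check the rank of its predictor matrix. This is a minimal sketch using base R only; the names X0, M and rank_x are introduced here for illustration and do not appear in the question.

```r
# The question's data: every predictor is the same column, seq(10)
X0 <- data.frame(
    x1 = seq(10), x2 = seq(10), x3 = seq(10),
    x4 = seq(10), x5 = seq(10), y = seq(10)
)

# Predictor matrix without the response column
M <- as.matrix(X0[setdiff(names(X0), "y")])

# Numerical rank from the QR decomposition: 5 columns, but only
# one linearly independent direction, so b1..b5 are not identifiable
rank_x <- qr(M)$rank
rank_x
```

A rank lower than the number of columns means that infinitely many coefficient vectors give the same fitted values, which is why the answer switches to independently generated columns.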

Using the above we have

set.seed(123)
X <- data.frame(
    x1 = rnorm(10),
    x2 = rnorm(10),
    x3 = rnorm(10),
    x4 = rnorm(10),
    x5 = rnorm(10),
    y = rnorm(10)
)

fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
nls(fo, X, start = st, lower = numeric(5), algorithm = "port")

giving:

Nonlinear regression model
  model: y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
   data: X
    b1     b2     b3     b4     b5 
0.0000 0.1222 0.0000 0.2338 0.1457 
 residual sum-of-squares: 6.477

Algorithm "port", convergence message: relative convergence (4)
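Since the question builds its formula from the column names, the nls-style formula can be built the same way. The following is a sketch, assuming the response column is called y and every other column is a predictor; the names xn, bn, rhs and fit are introduced here for illustration.

```r
set.seed(123)
X <- data.frame(
    x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
    x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10)
)

xn <- setdiff(names(X), "y")        # predictor column names
bn <- paste0("b", seq_along(xn))    # one parameter name per predictor

# "b1 * x1 + b2 * x2 + ..." with explicit parameters, as nls expects
rhs <- paste(bn, xn, sep = " * ", collapse = " + ")
fo <- as.formula(paste("y ~", rhs))

# Start inside the feasible region (1 rather than the bound 0)
st <- setNames(rep(1, length(bn)), bn)

fit <- nls(fo, X, start = st,
    lower = rep(0, length(bn)),     # non-negativity constraint
    algorithm = "port"              # bounds require the "port" algorithm
)
coef(fit)
```

With this approach the number of predictors can change without the formula, starting values, or bounds having to be edited by hand.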

2) The nnls (non-negative least squares) package can do this directly. We use X defined in (1).

library(nnls)
nnls(as.matrix(X[-6]), X$y)

giving the following

Nonnegative least squares model
x estimates: 0 0.1221646 0 0.2337857 0.1457373 
residual sum-of-squares: 6.477
reason terminated: The solution has been computed sucessfully.
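The printed nnls result does not label the estimates with the column names. A small sketch of attaching them is shown here, assuming the nnls package is installed (the code is skipped otherwise); the names A, fit_nn and est are introduced for illustration.

```r
set.seed(123)
X <- data.frame(
    x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
    x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10)
)

# Predictor matrix, selected by name rather than by column position
A <- as.matrix(X[setdiff(names(X), "y")])

if (requireNamespace("nnls", quietly = TRUE)) {
    fit_nn <- nnls::nnls(A, X$y)

    # The x component holds the estimates; label them by predictor
    est <- setNames(fit_nn$x, colnames(A))
    print(est)

    # The deviance component holds the residual sum-of-squares
    print(fit_nn$deviance)
}
```

Selecting the predictors by name avoids depending on y being the sixth column, as X[-6] does.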