I’m trying to run a regression that constrains all coefficients to be at least zero. To do this, I am using the nls function. However, I get an error:
"Error in nls(formula = y ~ . - 1, data = X, start = low, lower = low, :
parameters without starting value in ‘data’: ."
I believe everything is correct here: I set a lower and an upper bound on every coefficient, so I am not sure what is wrong.
Attempt 1:
library(magrittr)
X <- data.frame(
x1 = seq(10),
x2 = seq(10),
x3 = seq(10),
x4 = seq(10),
x5 = seq(10),
y = seq(10)
)
low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <- dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)
fit1 <- nls(formula = y ~ . -1 , data = X,
start = low,
lower = low,
upper = up,
algorithm = "port"
)
Attempt 2:
Here I try to set the formula manually but then I get a new error:
"Error in qr(.swts * gr) :
dims [product 5] do not match the length of object [10]"
library(magrittr)
X <- data.frame(
x1 = seq(10),
x2 = seq(10),
x3 = seq(10),
x4 = seq(10),
x5 = seq(10),
y = seq(10)
)
n <- X %>% dplyr::select( -y ) %>% names %>% paste0( collapse = " + " )
f <- "y ~ %s -1" %>% sprintf( n ) %>% as.formula
low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <- dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)
fit1 <- nls(formula = f , data = X,
start = low,
lower = low,
upper = up,
algorithm = "port"
)
How can I fix this? Thanks!
>Solution :
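1) There are several problems here:
- nls does not use the same formula notation as lm. In nls the right-hand side is an ordinary arithmetic expression in which every parameter must appear explicitly, so the dot is not expanded into column names and x1 + x2 is a literal sum rather than two model terms. The formula below writes the parameters out.
- the example does not have identifiable parameters: x1 through x5 are identical columns, so the coefficients are not unique and the calculation will fail. Below we change the example to use independent random predictors.
- although 0 starting values seem to work here, in general numerical optimization with constraints tends to work better if the starting values are in the interior of the feasible region, so we start from 1.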
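If writing the formula out by hand is impractical for many predictors, it can be assembled from the column names. This is only a sketch in base R: the parameter names b1, b2, ... and the helper variables preds, pars, rhs and lo are illustrative choices, not anything nls requires.

```r
# Predictor names; with a data frame X these could come from
# setdiff(names(X), "y") instead of being typed in.
preds <- c("x1", "x2", "x3", "x4", "x5")

# One explicitly named parameter per predictor.
pars <- paste0("b", seq_along(preds))

# Build "b1 * x1 + b2 * x2 + ..." and turn it into a formula for nls.
rhs <- paste(pars, preds, sep = " * ", collapse = " + ")
fo  <- as.formula(paste("y ~", rhs))

# Starting values inside the feasible region, and zero lower bounds.
st <- setNames(rep(1, length(pars)), pars)
lo <- rep(0, length(pars))

print(fo)
```

These objects can then be passed on as nls(fo, X, start = st, lower = lo, algorithm = "port").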
Using the above we have
set.seed(123)
X <- data.frame(
x1 = rnorm(10),
x2 = rnorm(10),
x3 = rnorm(10),
x4 = rnorm(10),
x5 = rnorm(10),
y = rnorm(10)
)
fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
nls(fo, X, start = st, lower = numeric(5), algorithm = "port")
giving:
Nonlinear regression model
model: y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
data: X
b1 b2 b3 b4 b5
0.0000 0.1222 0.0000 0.2338 0.1457
residual sum-of-squares: 6.477
Algorithm "port", convergence message: relative convergence (4)
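To work with the estimates rather than just print the fit, the result can be stored and queried with coef() and deviance(). The sketch below repeats the setup from (1); the names fit, est and on_bound, and the 1e-8 tolerance used to flag coefficients sitting on the lower bound, are illustrative assumptions.

```r
set.seed(123)
X <- data.frame(
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10),
  x4 = rnorm(10),
  x5 = rnorm(10),
  y  = rnorm(10)
)

fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)

# Bounds are only honoured by the "port" algorithm.
fit <- nls(fo, X, start = st, lower = numeric(5), algorithm = "port")

est <- coef(fit)                      # named vector of estimates
on_bound <- names(est)[est < 1e-8]    # coefficients held at the zero bound

print(round(est, 4))
print(on_bound)
print(deviance(fit))                  # residual sum of squares
```

Coefficients listed in on_bound are the ones the constraint is actively holding at zero; an unconstrained fit would presumably make them negative.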
2) The nnls (non-negative least squares) package can do this directly. We use X defined in (1).
library(nnls)
nnls(as.matrix(X[-6]), X$y)  # X[-6] drops the response column y
giving the following
Nonnegative least squares model
x estimates: 0 0.1221646 0 0.2337857 0.1457373
residual sum-of-squares: 6.477
reason terminated: The solution has been computed sucessfully.
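The nnls estimates come back unnamed, so it can help to attach the column names. A small sketch, assuming the nnls package is installed (the requireNamespace guard just skips the fit otherwise); A, fit_nnls and cf are illustrative names.

```r
set.seed(123)
X <- data.frame(
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10),
  x4 = rnorm(10),
  x5 = rnorm(10),
  y  = rnorm(10)
)

# Design matrix: every column except the response, selected by name.
A <- as.matrix(X[names(X) != "y"])

if (requireNamespace("nnls", quietly = TRUE)) {
  fit_nnls <- nnls::nnls(A, X$y)

  # The x component holds the estimates in column order.
  cf <- setNames(fit_nnls$x, colnames(A))
  print(round(cf, 4))
  print(fit_nnls$deviance)  # residual sum of squares
}
```

Selecting columns by name rather than by position (X[-6]) keeps the call correct if the column order of X changes.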