I have a data sheet with 40 data columns (40 different nutrients), with additional columns for plot numbers and factors. I would like to automatically loop through each column name and produce a linear model and summary for each. The data columns begin at column 10.
for(i in 10:ncol(df)) { # for-loop over columns
mod2<-aov(i~block+tillage*residue+Error(subblock),data=df)
summary(mod2)
}
This currently produces the following error:
Error in model.frame.default(formula = i ~ subblock, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'subblock')
Variable lengths are consistent so I imagine I am looping incorrectly.
The data looks similar to below (with more categorical columns at the start), with the nutrient columns beginning at column 10.
block | tillage | residue | subblock | nutrient 1 | nutrient 2 | etc. |
---|---|---|---|---|---|---|
b1 | NT | NR | s1 | 0.5 | 0.6 | ... |
>Solution :
In general it is helpful to post a sample of your data using dput(). In the absence of that, I am going to use the built-in dataset mtcars to show how you can do this with formula():
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# Select every column except the response
desired_columns <- names(mtcars)[names(mtcars) != "mpg"]
for (column in desired_columns) {
  # Build e.g. "mpg ~ cyl" as a string, then convert it to a formula
  this_formula <- formula(paste("mpg ~", column))
  print(summary(lm(this_formula, data = mtcars)))
}
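This prints the summary of lm(mpg ~ var) for each var in the data. The key is the paste() call, which builds the expression as a string; formula() then turns that string into a formula object. Note also that summary() has to be wrapped in print() inside a for loop, because results are not auto-printed there. Hopefully you can see how this can be applied to your data.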
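Applied to the layout in the question, the same idea might look like the sketch below. The original loop fails because i in aov(i ~ ...) is the loop's integer index (a vector of length 1), not a column of df, which is why R reports that variable lengths differ. Looping over column names and building the formula from each name avoids that. The data frame here is a made-up stand-in: the column names follow the question, but the values, the "CT" and "R" factor levels, the nesting of subblock, and the first_nutrient_col position are assumptions chosen only so the example runs.

```r
# Hypothetical stand-in for the asker's data; values are random.
set.seed(1)
df <- expand.grid(
  tillage = c("NT", "CT"),   # "CT" is an invented second level
  residue = c("NR", "R"),    # "R" is an invented second level
  block   = c("b1", "b2", "b3")
)
# Assumed nesting: one subblock per block x tillage combination.
df$subblock  <- interaction(df$block, df$tillage)
df$nutrient1 <- rnorm(nrow(df))
df$nutrient2 <- rnorm(nrow(df))

# Nutrient columns start at column 5 in this toy data (10 in the question).
first_nutrient_col <- 5
nutrient_cols <- names(df)[first_nutrient_col:ncol(df)]

results <- list()
for (col in nutrient_cols) {
  # Backticks keep the formula valid for names with spaces, e.g. "nutrient 1".
  f <- as.formula(
    sprintf("`%s` ~ block + tillage * residue + Error(subblock)", col)
  )
  fit <- aov(f, data = df)
  results[[col]] <- summary(fit)  # keep each summary for later use
  cat("\n==", col, "==\n")
  print(results[[col]])           # explicit print is required inside a loop
}
```

Storing the summaries in a named list, rather than only printing them, makes it possible to look up a single nutrient afterwards, for example results[["nutrient1"]].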