Is there a way to loop through column names (not numbers) in R for linear models?

I have a data sheet with 40 data columns (40 different nutrients), with additional columns for plot numbers and factors. I would like to automatically loop through each column name and produce a linear model and summary for each. The data columns begin at column 10.

for(i in 10:ncol(df)) {       # for-loop over columns
  mod2<-aov(i~block+tillage*residue+Error(subblock),data=df)
  summary(mod2)
}

This currently produces the error:

Error in model.frame.default(formula = i ~ subblock, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'subblock')

The variable lengths are consistent, so I imagine I am looping incorrectly.

The data looks similar to below (with more categorical columns at the start), with the nutrient columns beginning at column 10.

block  tillage  residue  subblock  nutrient 1  nutrient 2  etc.
b1     NT       NR       s1        0.5         0.6

>Solution :

In general, it is helpful to post a sample of your data using dput(). In the absence of that, I am going to use the built-in dataset mtcars to show how you can do what you want with formula():

head(mtcars)

#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

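# Select every column except the response, mpg
desired_columns <- names(mtcars)[names(mtcars) != "mpg"]

for (column in desired_columns) {
    # Build the formula from a string, e.g. "mpg ~ cyl"
    this_formula <- formula(paste("mpg ~", column))
    # print() is needed because results are not auto-printed inside a loop
    print(summary(lm(this_formula, data = mtcars)))
}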

This will print the summary of lm(mpg ~ var) for each var in the data. The key is the paste() call, which builds the expression as a string; formula() then turns that string into a formula object. Hopefully you can see how this can be applied to your data.
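As a rough sketch of how the same idea might carry over to the model in the question: in the original loop, i is just the integer column index, so the formula compares a single number against full-length columns, which is why the lengths differ. The data frame below is made up purely for illustration (balanced design, random values, two nutrient columns starting at column 5 rather than 10), and it assumes the nutrient columns have syntactic R names; names containing spaces such as "nutrient 1" would need cleaning first, for example with make.names().

```r
# Hypothetical stand-in for the real data: four design columns, two nutrients
set.seed(1)
df <- expand.grid(
  subblock = c("s1", "s2"),
  tillage  = c("NT", "CT"),
  residue  = c("NR", "R"),
  block    = c("b1", "b2", "b3")
)
df$nutrient1 <- rnorm(nrow(df))
df$nutrient2 <- rnorm(nrow(df))

# Loop over column names; with the real data this would be names(df)[10:ncol(df)]
nutrient_cols <- names(df)[5:ncol(df)]

models <- list()
for (nutrient in nutrient_cols) {
  # Paste the column name into the left-hand side of the model formula
  this_formula <- formula(
    paste(nutrient, "~ block + tillage * residue + Error(subblock)")
  )
  models[[nutrient]] <- aov(this_formula, data = df)

  # Label each summary and print it explicitly (no auto-printing in a loop)
  cat("\n====", nutrient, "====\n")
  print(summary(models[[nutrient]]))
}
```

Storing each fit in the models list under its column name is optional, but it lets you pull up a single model later, e.g. summary(models[["nutrient1"]]), without rerunning the loop.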
