Let’s say I’m working with the iris dataset in R:
data(iris)
summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. : 4.300 Min. : 2.000 Min. : 1.000 Min. : 0.100
1st Qu.: 5.100 1st Qu.: 2.800 1st Qu.: 1.600 1st Qu.: 0.300
Median : 5.800 Median : 3.000 Median : 4.350 Median : 1.300
Mean : 5.843 Mean : 3.057 Mean : 3.758 Mean : 1.199
3rd Qu.: 6.400 3rd Qu.: 3.300 3rd Qu.: 5.100 3rd Qu.: 1.800
Max. : 7.900 Max. : 4.400 Max. : 6.900 Max. : 2.500
Species
setosa : 50
versicolor: 50
virginica : 50
I want to perform a linear regression where Petal.Length is the dependent variable, and Sepal.Length is the independent variable. How can I, in R, perform this regression for each Species category at once, getting values of P, R² and F for each test?
>Solution :
Use by.
by(iris, iris$Species, \(x) summary(lm(Petal.Length ~ Sepal.Length, x)))
# iris$Species: setosa
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.40856 -0.08027 -0.00856 0.11708 0.46512
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.80305 0.34388 2.335 0.0238 *
# Sepal.Length 0.13163 0.06853 1.921 0.0607 .
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1691 on 48 degrees of freedom
# Multiple R-squared: 0.07138, Adjusted R-squared: 0.05204
# F-statistic: 3.69 on 1 and 48 DF, p-value: 0.0607
#
# ---------------------------------------------------------
# iris$Species: versicolor
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.68611 -0.22827 -0.04123 0.19458 0.79607
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.18512 0.51421 0.360 0.72
# Sepal.Length 0.68647 0.08631 7.954 2.59e-10 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3118 on 48 degrees of freedom
# Multiple R-squared: 0.5686, Adjusted R-squared: 0.5596
# F-statistic: 63.26 on 1 and 48 DF, p-value: 2.586e-10
#
# ---------------------------------------------------------
# iris$Species: virginica
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.68603 -0.21104 0.06399 0.18901 0.66402
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.61047 0.41711 1.464 0.15
# Sepal.Length 0.75008 0.06303 11.901 6.3e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2805 on 48 degrees of freedom
# Multiple R-squared: 0.7469, Adjusted R-squared: 0.7416
# F-statistic: 141.6 on 1 and 48 DF, p-value: 6.298e-16