Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Running ANOVAs across columns within a dataframe

I have the following data frame:

df<- structure(list(Group.1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L), BLC = c(10.9890294366989, 7.31930466605672, 
13.6185172644819, 2.7266530334015, 3.53565114908662, 7.20597804412166, 
2.78164116828929, 7.59371098030222, 14.7343839844163, 2.9806259314456, 
5.07619453154234, 6.88503820786366, 12.2882487654356, 1.40646976090014, 
19.307342679726, 15.0249870253821, 4.34581364475618, 5.03491248395278, 
7.94957003082448, 6.84343434343434, 11.2622383086214, 11.1839711729262, 
4.7669094503789, 3.09762397594833, 8.10311438552811, 0, 0, 0, 
0, 0), BLG = c(53.2196490874651, 23.9543988057977, 46.2826752583327, 
34.9096813849679, 27.0376341826749, 49.2472186166963, 93.0631982759938, 
46.1366527251764, 57.6460095990237, 36.5835422650789, 56.2627854701592, 
30.1133129448127, 22.2997436361558, 28.9793549481134, 37.6201098690056, 
59.8627031558285, 34.7171109184231, 48.9414623325316, 31.5061417556072, 
21.2521546878513, 70.8263794749462, 24.930952093699, 39.307162693975, 
28.9144148451338, 42.9157121339545, 3.94736842105263, 3.94736842105263, 
3.94736842105263, 3.94736842105263, 3.94736842105263), LMB = c(75.2718187185061, 
42.707035200077, 31.37371428004, 24.9294274297168, 21.619318105277, 
19.8056309622509, 62.5665072062847, 30.2395840472535, 36.2246969501391, 
16.053874321678, 73.325176836826, 32.1599744373439, 33.8619234899393, 
39.1278597999347, 29.242123346214, 50.3372863653836, 23.3365756853847, 
61.7018803213189, 18.2047745554517, 40.1231815267265, 36.2849916132823, 
35.4393881210482, 41.6277079218274, 27.5840809362335, 14.5766262544513, 
19.7368421052632, 19.7368421052632, 19.7368421052632, 19.7368421052632, 
19.7368421052632), RSF = c(7.7061355134565, 6.57544180671257, 
21.6485001821173, 14.3568910671585, 3.53565114908662, 2.89876366994815, 
10.1680661480383, 17.3890884998598, 6.45810108311722, 2.95766439639045, 
13.7591229373968, 21.1086837581149, 3.65965233302836, 26.151881306845, 
9.17122695497959, 16.6545469585419, 8.26685329264933, 9.3745854643381, 
1.4903129657228, 23.6018678125026, 7.04954072403232, 11.1546894959865, 
20.5987856222152, 8.10190710702138, 4.41849566570698, 3.94736842105263, 
3.94736842105263, 3.94736842105263, 3.94736842105263, 3.94736842105263
), GSF = c(0, 0, 0, 2.51341949455157, 8.34660636193077, 3.23362974939369, 
2.85602204934611, 3.23362974939369, 3.63636363636364, 3.9344262295082, 
0, 0, 3.9344262295082, 0, 1.46520146520147, 0, 0, 0, 3.9344262295082, 
3.63636363636364, 1.46342316809685, 0.879120879120879, 0, 0, 
2.36065573770492, 0, 0, 0, 0, 0), CCF = c(0, 0, 0, 3.14465408805032, 
0, 0, 0, 0, 0, 0, 0, 0, 1.24223602484472, 0, 0, 0, 0, 0, 1.24223602484472, 
0, 1.88679245283019, 0, 0, 0, 0, 0, 0, 0, 0, 0), design = c("random", 
"random", "random", "random", "random", "random", "random", "random", 
"random", "random", "strat", "strat", "strat", "strat", "strat", 
"strat", "strat", "strat", "strat", "strat", "hybrid", "hybrid", 
"hybrid", "hybrid", "hybrid", "hybrid", "hybrid", "hybrid", "hybrid", 
"hybrid")), row.names = c(NA, -30L), class = "data.frame")

I need to run an ANOVA for each column grouped by the design variable. So for example, I’m looking to do an ANOVA for species BLG, testing for a difference in the values between each design. I can do that for a single species via filtering, but I need to do the same thing for every other column as well, and many more data frames of similar format. Then doing the same thing but for a post-hoc test where there are differences found between designs for each species.

I’m guessing there is a way to do this with something like map() or lapply(). My original thought was to make a model for each column such as model<- lm(BLG ~ design, data=df), and use this format in the map function to do this for each other column, then proceed with a similar method for the ANOVA test, but I am getting stuck (basically at the very beginning).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This is where I am currently at:

test<- df %>% 
  names() %>% 
  paste('design ~', .) %>% 
  map(~lm(as.formula(.x), data=df))

Resulting in the following error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y'
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion

My guess is that the issue is something similar to example 2 from this link, but I’m not sure what would go there if not the "design" column.

Any help or resources anyone has in mind that may be useful would be greatly appreciated.

Thanks for reading.

>Solution :

The design is character class. Should the formula be reversed? Based on the ?lm documentation

Models for lm are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response

purrr::map(names(df)[2:7], ~ lm(reformulate('design', response = .x), data = df))

-output

[[1]]

Call:
lm(formula = reformulate("design", response = .x), data = df)

Coefficients:
 (Intercept)  designrandom   designstrat  
       3.841         3.507         4.575  


[[2]]

Call:
lm(formula = reformulate("design", response = .x), data = df)

Coefficients:
 (Intercept)  designrandom   designstrat  
       22.66         24.14         14.49  


[[3]]

Call:
lm(formula = reformulate("design", response = .x), data = df)

Coefficients:
 (Intercept)  designrandom   designstrat  
       25.42         10.66         14.72  


[[4]]

Call:
lm(formula = reformulate("design", response = .x), data = df)

Coefficients:
 (Intercept)  designrandom   designstrat  
       7.106         2.263         6.218  


[[5]]

Call:
lm(formula = reformulate("design", response = .x), data = df)

Coefficients:
 (Intercept)  designrandom   designstrat  
      0.4703        2.3051        0.8267  


[[6]]

Call:
lm(formula = reformulate("design", response = .x), data = df)

Coefficients:
 (Intercept)  designrandom   designstrat  
     0.18868       0.12579       0.05977  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading