How to summarize several independent variables at once in R?

For example, suppose the data look like this:

Cultivar=rep(c("CV1","CV2"),each=12)
Nitrogen=rep(rep(c("N0","N1","N2","N3"), each=3),2)
Block=rep(c("I","II","III"),8)
Yield=c(99,109,89,115,142,133,121,157,142,125,150,139,82,104,99,117,
        125,127,145,154,154,151,166,175)
Protein=c(25,35,45,55,44,33,21,57,42,25,50,39,72,14,79,71,25,27,45,54,47,51,66,75)
dataA=data.frame(Cultivar,Nitrogen,Block,Yield,Protein)

I’d like to summarize the yield and protein data, so I used the code below.

library(plyr)
# Summarize Yield by Cultivar and Nitrogen
dataB=ddply(dataA, c("Cultivar","Nitrogen"), summarise, mean=mean(Yield),
            sd=sd(Yield), n=length(Yield), se=sd/sqrt(n))
# Repeat the same summary for Protein
dataC=ddply(dataA, c("Cultivar","Nitrogen"), summarise, mean=mean(Protein),
            sd=sd(Protein), n=length(Protein), se=sd/sqrt(n))
# Copy the Protein mean and standard error into the Yield summary
dataB$Protein=dataC$mean
dataB$Protein_se=dataC$se
dataB

  Cultivar Nitrogen mean        sd n        se  Protein Protein_se
1      CV1       N0   99 10.000000 3  5.773503 35.00000   5.773503
2      CV1       N1  130 13.747727 3  7.937254 44.00000   6.350853
3      CV1       N2  140 18.083141 3 10.440307 40.00000  10.440307
4      CV1       N3  138 12.529964 3  7.234178 38.00000   7.234178
5      CV2       N0   95 11.532563 3  6.658328 55.00000  20.599353
6      CV2       N1  123  5.291503 3  3.055050 41.00000  15.011107
7      CV2       N2  151  5.196152 3  3.000000 48.66667   2.728451
8      CV2       N3  164 12.124356 3  7.000000 64.00000   7.000000

But I believe there must be much simpler code to summarize several variables at once.

Could you let me know how to do that?

Many thanks,

Solution:

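You could use dplyr::summarize() with across() over the desired columns, specify the groups with .by (available from dplyr 1.1.0), and supply all the summary statistics you want as a named list. Each output column is then named after the variable and the list entry, for example Yield_Mean: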

library(dplyr)

dataA %>%
  summarize(across(Yield:Protein, 
                   .fns = list(Mean = mean, 
                               SD = sd, 
                               n = length,
                               se = ~ sd(.x)/sqrt(length(.x)))), 
            .by = c("Cultivar", "Nitrogen"))

Output:

  Cultivar Nitrogen Yield_Mean  Yield_SD Yield_n  Yield_se Protein_Mean Protein_SD Protein_n Protein_se
1      CV1       N0         99 10.000000       3  5.773503     35.00000  10.000000         3   5.773503
2      CV1       N1        130 13.747727       3  7.937254     44.00000  11.000000         3   6.350853
3      CV1       N2        140 18.083141       3 10.440307     40.00000  18.083141         3  10.440307
4      CV1       N3        138 12.529964       3  7.234178     38.00000  12.529964         3   7.234178
5      CV2       N0         95 11.532563       3  6.658328     55.00000  35.679126         3  20.599353
6      CV2       N1        123  5.291503       3  3.055050     41.00000  26.000000         3  15.011107
7      CV2       N2        151  5.196152       3  3.000000     48.66667   4.725816         3   2.728451
8      CV2       N3        164 12.124356       3  7.000000     64.00000  12.124356         3   7.000000
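
If your installed dplyr is older than 1.1.0, the .by argument is not available. A minimal sketch of the same summary using group_by() instead is shown below. The helper se() and the object name summary_tbl are assumptions introduced here for illustration and are not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
dataA <- data.frame(
  Cultivar = rep(c("CV1", "CV2"), each = 12),
  Nitrogen = rep(rep(c("N0", "N1", "N2", "N3"), each = 3), 2),
  Block    = rep(c("I", "II", "III"), 8),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139,
               82, 104, 99, 117, 125, 127, 145, 154, 154, 151, 166, 175),
  Protein  = c(25, 35, 45, 55, 44, 33, 21, 57, 42, 25, 50, 39,
               72, 14, 79, 71, 25, 27, 45, 54, 47, 51, 66, 75)
)

# Hypothetical helper (not in the original answer): standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

summary_tbl <- dataA %>%
  group_by(Cultivar, Nitrogen) %>%
  summarise(
    across(c(Yield, Protein),
           list(mean = mean, sd = sd, n = length, se = se),
           .names = "{.col}_{.fn}"),  # e.g. Yield_mean, Protein_se
    .groups = "drop"                  # return an ungrouped result
  )

summary_tbl
```

Defining se() once keeps the list of statistics short, and adding another response variable only requires extending the c(Yield, Protein) selection.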