Add sample size to data frame after aggregating using R

I have a data frame with plot numbers and independently taken data for 4 test subjects, as shown below:

data <- data.frame(
  plot     = c(101, 101, 101, 101, 101, 101, 101, 101,
               102, 102, 102, 102, 102, 102, 102, 102),
  subject1 = c(3, 4, 2, 3, 6, 5, 4, 2,
               3, 6, 2, 2, 3, 2, 5, 2),
  subject2 = c(2, 3, 2, 1, 5, 2, 23, 2,
               5, 2, 3, 2, 1, 2, 5, 4),
  subject3 = c(3, 2, 1, 2, 52, 5, 2, 2,
               5, 2, 2, 3, 2, 2, 2, 2),
  subject4 = c(2, 2, 2, 2, 23, 3, 2, 21,
               5, 5, 3, 2, 1, 4, 2, 3)
)

My next task is to aggregate the data to find the mean score of each subject within each plot, so I did the following:

library(dplyr)
library(tibble)

# Aggregate by mean
mean <- aggregate(data, by = list(data$plot), mean)

# Drop the extra grouping column that aggregate() adds
mean <- select(mean, -Group.1)

# Add a placeholder column for the next part of the question
mean <- mean %>%
  add_column(sample_size = "sample_size")

What I need to do is create a column with the sample size (the number of rows) for each plot. For instance, the number of occurrences of "101" in this dataset is 8, so I need that value listed at the end of my aggregated data frame. It would look like:

mean_data <- data.frame(plot=c(101, 102),
                        subject1=c(3.625, 3.125),
                        subject2=c(5, 3),
                        subject3=c(8.625, 2.5),
                        subject4=c(7.125, 3.125),
                        sample_size=c(8, 8))
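
As a quick check of the expected sample sizes, the occurrences of each plot number can be counted with base R's table(). This is only a minimal sketch: plot_id is a hypothetical name that rebuilds the plot column from the data above (eight rows each for 101 and 102).

```r
# Hypothetical stand-in for data$plot: eight rows per plot, as in the data above
plot_id <- rep(c(101, 102), each = 8)

# Count how many times each plot number occurs
counts <- table(plot_id)
print(counts)
```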

How can I do this?

>Solution :

With across() inside summarise(), we can compute the mean of every subject column and, in the same call, count the rows in each group with n(), after grouping by ‘plot’:

library(dplyr)
data %>% 
 group_by(plot) %>% 
 summarise(across(everything(), mean), sample_size = n())

Output:

# A tibble: 2 × 6
   plot subject1 subject2 subject3 subject4 sample_size
  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>       <int>
1   101     3.62        5     8.62     7.12           8
2   102     3.12        3     2.5      3.12           8
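
If you would rather stay close to the aggregate() approach from the question, a base R variant might look like the sketch below. It is not part of the original answer. The small toy data frame and the name means are assumptions made for illustration; the same two steps should apply to the full data.

```r
# Toy data (hypothetical): 3 rows for plot 101, 2 rows for plot 102
toy <- data.frame(
  plot     = c(101, 101, 101, 102, 102),
  subject1 = c(2, 4, 6, 1, 3),
  subject2 = c(1, 1, 4, 5, 7)
)

# Mean of every other column within each plot (formula interface keeps the
# grouping column named "plot", so no Group.1 column needs to be dropped)
means <- aggregate(. ~ plot, data = toy, FUN = mean)

# Count the rows per plot and look them up by plot number
n_per_plot <- table(toy$plot)
means$sample_size <- as.vector(n_per_plot[as.character(means$plot)])

print(means)
```

Looking the counts up by plot number, rather than relying on row order, keeps the counts aligned with the right plot even if the two results are sorted differently.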