Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create new data.frame consisting of average value of every two rows

I have data frame from an experiment similar to the following and would like to reduce the number of rows by half by collapsing every two rows of the data according to the "Attribute" value. Ultimately, for each participant, I want to create a new value for each of the five Rating columns that represents the average of each rating for each Attribute.

For example, for Participant_ID = 1, rows 1 and 2 would be collapsed because they both have the Attribute X, rows 3 and 4 would be collapsed because they both have the Attribute Y, rows 5 and 6 would be collapsed because they both have the Attribute Z, and so on for the other Participants.

# Example data
Participant_ID <- c(rep(1, 6), rep(2, 6), rep(3, 6))
Gender <- c(rep("M", 6), rep("F", 6), rep("M", 6))
Group <- c(rep("A", 6), rep("B", 6), rep("C", 6))
Attribute <- rep(c(rep("X", 2), rep("Y", 2), rep("Z", 2)), 3)
Rating_1 <- sample(1:5, 18, replace = TRUE)
Rating_2 <- sample(1:5, 18, replace = TRUE)
Rating_3 <- sample(1:5, 18, replace = TRUE)
Rating_4 <- sample(1:5, 18, replace = TRUE)
Rating_5 <- sample(1:5, 18, replace = TRUE)

dat <- data.frame(factor(Participant_ID), factor(Gender), factor(Group), 
factor(Attribute), Rating_1, Rating_2, Rating_3, Rating_4, Rating_5)

Can someone recommend a method for doing this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We group by the factor columns, and create a run-length-id on the ‘Attribute.’ column so that its index increase at a different Attribute and then summarise across the integer columns to return the mean

library(dplyr)
library(data.table)
dat %>%
   group_by(across(where(is.factor)), grp = rleid(factor.Attribute.)) %>% 
  summarise(across(where(is.integer), mean), .groups = 'drop')

-output

# A tibble: 9 × 10
  factor.Participant_ID. factor.Gender. factor.Group. factor.Attribute.   grp Rating_1 Rating_2 Rating_3 Rating_4 Rating_5
  <fct>                  <fct>          <fct>         <fct>             <int>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 1                      M              A             X                     1      5        4        1.5      2.5      1  
2 1                      M              A             Y                     2      3.5      1.5      2        4.5      3  
3 1                      M              A             Z                     3      2        4        2.5      2        2  
4 2                      F              B             X                     4      2        2.5      1.5      2.5      2.5
5 2                      F              B             Y                     5      1.5      5        4        2        3.5
6 2                      F              B             Z                     6      2        1.5      2        1        2.5
7 3                      M              C             X                     7      3        2        3.5      1.5      3  
8 3                      M              C             Y                     8      3        4        3.5      1.5      3.5
9 3                      M              C             Z                     9      4.5      4.5      4.5      2.5      5  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading