Split dataframe into multiple dataframes by grouping columns in R

December 21, 2023

I have a dataframe of expression data where genes are rows and columns are samples. I also have a dataframe containing metadata for each sample in the expression dataframe. In reality my expr dataframe has 30,000+ rows and 100+ columns. However, below is an example with smaller data.

expr <- data.frame(sample1 = c(1,2,2,0,0), 
                   sample2 = c(5,2,4,4,0), 
                   sample3 = c(1,2,1,0,1), 
                   sample4 = c(6,5,6,6,7), 
                   sample5 = c(0,0,0,1,1))
rownames(expr) <- paste0("gene",1:5)
meta <- data.frame(sample = paste0("sample",1:5),
                   treatment = c("control","control",
                                 "treatment1", 
                                 "treatment2", "treatment2"))

I want to find the mean for each gene per treatment. In the examples I’ve seen with split() or group_by(), people group by a column that is already present in the data.frame. However, I have a separate dataframe (meta) that defines the grouping for the columns of another dataframe (expr).
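For comparison, split() does work here if it is applied to meta rather than to expr: splitting the sample names by treatment gives the groups of columns, and rowMeans() then averages each group per gene. This is a base-R sketch, not part of the original answer, and it assumes the sample names in meta match the column names of expr exactly. The names samples_by_trt and res are only illustrative.

```r
# Example data copied from the question
expr <- data.frame(sample1 = c(1,2,2,0,0),
                   sample2 = c(5,2,4,4,0),
                   sample3 = c(1,2,1,0,1),
                   sample4 = c(6,5,6,6,7),
                   sample5 = c(0,0,0,1,1))
rownames(expr) <- paste0("gene", 1:5)
meta <- data.frame(sample = paste0("sample", 1:5),
                   treatment = c("control", "control",
                                 "treatment1",
                                 "treatment2", "treatment2"))

# Group the sample names by treatment; as.character() guards against
# factor columns, which would otherwise index expr by integer code
samples_by_trt <- split(as.character(meta$sample), meta$treatment)

# For each treatment, average the matching expr columns per gene;
# drop = FALSE keeps a data frame even when a group has only one sample
res <- sapply(samples_by_trt,
              function(s) rowMeans(expr[, s, drop = FALSE]))

# sapply() returns a genes x treatments matrix; convert to a data.frame
res <- as.data.frame(res)
res
```

Because it never reshapes to long format, this keeps the data in its original wide layout, which may be convenient for a 30,000 x 100 matrix.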

I would like my output to be a dataframe with genes as rows, treatment as columns, and values as the mean.

#        control   treatment1   treatment2
#  gene1  mean        mean         mean
#  gene2  mean        mean         mean

Solution:

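Something like this: turn the gene row names into a column, reshape expr to long format, join meta to attach each sample's treatment, average per gene and treatment, then pivot back to wide. It’s not entirely clear what you want to group by in the summarize() step, but you can adjust that easily. Note that the .by argument of summarize() requires dplyr 1.1.0 or later, and the native pipe |> requires R 4.1 or later.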

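library(dplyr)  # mutate(), left_join(), summarize()
library(tidyr)  # pivot_longer(), pivot_wider()

expr |>
  # keep the gene IDs, which are stored only as row names
  mutate(gene = row.names(expr)) |>
  # one row per gene/sample pair; expression values go to "value"
  pivot_longer(-gene, names_to = "sample") |>
  # attach each sample's treatment from meta
  left_join(meta, by = "sample") |>
  # mean expression per gene within each treatment
  summarize(mean = mean(value), .by = c(gene, treatment)) |>
  # back to wide: genes as rows, treatments as columns
  pivot_wider(names_from = treatment, values_from = mean)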
# # A tibble: 5 × 4
#   gene  control treatment1 treatment2
#   <chr>   <dbl>      <dbl>      <dbl>
# 1 gene1       3          1        3  
# 2 gene2       2          2        2.5
# 3 gene3       3          1        3  
# 4 gene4       2          0        3.5
# 5 gene5       0          1        4
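The result above is a tibble with gene as an ordinary column. Since the question asks for a dataframe with genes as rows, one option is to move that column into row names with tibble::column_to_rownames(). A minimal sketch that repeats the example data so it runs on its own; the names means and means_df are only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)  # column_to_rownames()

# Example data copied from the question
expr <- data.frame(sample1 = c(1,2,2,0,0),
                   sample2 = c(5,2,4,4,0),
                   sample3 = c(1,2,1,0,1),
                   sample4 = c(6,5,6,6,7),
                   sample5 = c(0,0,0,1,1))
rownames(expr) <- paste0("gene", 1:5)
meta <- data.frame(sample = paste0("sample", 1:5),
                   treatment = c("control", "control",
                                 "treatment1",
                                 "treatment2", "treatment2"))

# Same pipeline as the answer
means <- expr |>
  mutate(gene = row.names(expr)) |>
  pivot_longer(-gene, names_to = "sample") |>
  left_join(meta, by = "sample") |>
  summarize(mean = mean(value), .by = c(gene, treatment)) |>
  pivot_wider(names_from = treatment, values_from = mean)

# Convert to a plain data.frame and use the gene column as row names
means_df <- column_to_rownames(as.data.frame(means), "gene")
means_df
```

Tibbles discourage row names, which is why the conversion to a plain data.frame happens first.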