How to remove columns that contain all the same value

I have presence/absence data (1/0) for various genes (columns) across samples that each belong to one of two categories. I am running Fisher's exact test (fisher.test) for each gene, but I get an error whenever a gene is present (1) in all samples or absent (0) from all samples. How can I remove or skip these columns, or have fisher.test skip these genes and keep going?

Here is my sample data:

mydata <- data.frame(sampleID = c("A", "B", "C", "D", "E", "F", "G"),
                     category = c("high", "low", "high", "high", "low", "high", "low"),
                     Gene1 = c(1, 1, 0, 0, 0, 1, 1),
                     Gene2 = c(0, 1, 1, 1, 1, 1, 0),
                     Gene3 = c(0, 0, 0, 1, 1, 1, 1),
                     Gene4 = c(1, 1, 1, 1, 1, 1, 1))

Here is the loop code that someone helped me design, which applies the fisher.test to each gene:

library(dplyr)
library(tidyr)
library(broom)

mydata %>%
  select(-sampleID) %>%
  pivot_longer(cols = -category, names_to = "gene") %>%
  group_by(gene) %>%
  summarise(fisher_test = list(tidy(fisher.test(table(category, value))))) %>%
  unnest(fisher_test) %>%
  mutate(odds_ratio = exp(estimate)) %>% 
  select(-method, -alternative)

The error message I get when it encounters a gene that is present or absent from all samples:

Caused by error in `fisher.test()`:
! 'x' must have at least 2 rows and columns
Run `rlang::last_error()` to see where the error occurred.

Where can I insert this step into the loop above?

Note: It is not feasible to omit the genes manually, as there are hundreds of them.

Solution:

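We could drop the constant columns inside select() before reshaping, so that fisher.test never sees a gene with a single distinct value. Note that tidy() comes from broom, which has to be loaded as well, and that the estimate returned by fisher.test is already the conditional maximum-likelihood estimate of the odds ratio, so the exp(estimate) step is left out here.

library(dplyr)
library(tidyr)
library(broom)

mydata %>%
  # drop numeric columns holding a single distinct value, then the ID column
  select(!where(~ is.numeric(.x) && n_distinct(.x) == 1), -sampleID) %>%
  pivot_longer(cols = -category, names_to = "gene") %>%
  group_by(gene) %>%
  summarise(fisher_test = list(tidy(fisher.test(table(category, value))))) %>%
  unnest(fisher_test) %>%
  select(-method, -alternative)

-output

# A tibble: 3 × 5
  gene  estimate p.value conf.low conf.high
  <chr>    <dbl>   <dbl>    <dbl>     <dbl>
1 Gene1    1.81        1  0.0469      176.
2 Gene2    0.707       1  0.00640      78.2
3 Gene3    1.81        1  0.0469      176.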
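
The question also asks whether fisher.test can simply skip the problem genes and keep going. One possible way, shown here as a minimal base R sketch rather than a definitive solution, is to wrap the call in tryCatch() and return NA when the test errors. The helper name safe_fisher_p is hypothetical and not part of any package, and the data frame is copied from the question.

```r
# Sample data copied from the question
mydata <- data.frame(
  sampleID = c("A", "B", "C", "D", "E", "F", "G"),
  category = c("high", "low", "high", "high", "low", "high", "low"),
  Gene1 = c(1, 1, 0, 0, 0, 1, 1),
  Gene2 = c(0, 1, 1, 1, 1, 1, 0),
  Gene3 = c(0, 0, 0, 1, 1, 1, 1),
  Gene4 = c(1, 1, 1, 1, 1, 1, 1)
)

# Hypothetical helper: return the p-value, or NA if fisher.test() errors
# (for example, when the gene has only one distinct value)
safe_fisher_p <- function(value, category) {
  tryCatch(
    fisher.test(table(category, value))$p.value,
    error = function(e) NA_real_
  )
}

# Every column except the ID and the grouping column is treated as a gene
gene_cols <- setdiff(names(mydata), c("sampleID", "category"))

p_values <- vapply(
  gene_cols,
  function(g) safe_fisher_p(mydata[[g]], mydata$category),
  numeric(1)
)

# Constant genes such as Gene4 come back as NA and can be dropped afterwards
p_values[!is.na(p_values)]
```

Compared with filtering the columns up front, this keeps a record of which genes could not be tested, at the cost of also hiding any other error fisher.test might raise.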