Adding a new column to a dataframe for 3 condition cases

I have a dataframe like this:

geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001
ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416
ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003
ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552
ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004
ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872
ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879

I want to add a column named threshold such that:

rows with df$log2FoldChange > 0 & df$padj < 0.05 are labeled up
rows with df$log2FoldChange < 0 & df$padj < 0.05 are labeled down
anything else is labeled NS

So for the above table, output should look like this:

geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj    threshold
ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001   down
ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416 NS
ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003   down
ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552 NS
ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774 NS
ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004  up
ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872 NS
ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879 NS

I tried this, but it does not do what I want:

dat <- mutate(dat,threshold=if_else(dat$padj <= 0.05 & dat$log2FoldChange > 0,"up","NS"))
dat <- mutate(dat,threshold=if_else(dat$padj <= 0.05 & dat$log2FoldChange < 0,"down","NS"))

Solution:

One option is to use case_when() from the dplyr package, which assigns "up", "down", or the fallback "NS" in a single step. Conditions are checked in order and the first one that matches wins, e.g.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- read.table(text = "geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001
ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416
ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003
ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552
ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004
ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872
ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879",
header = TRUE)

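# Inside mutate(), columns can be referenced by bare name (no df$ prefix needed).
# The final TRUE ~ "NS" catches every row not matched by an earlier condition.
dat <- mutate(df, threshold = case_when(padj <= 0.05 & log2FoldChange > 0 ~ "up",
                                        padj <= 0.05 & log2FoldChange < 0 ~ "down",
                                        TRUE ~ "NS"))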
dat
#>               geneID  baseMean log2FoldChange     lfcSE       stat     pvalue
#> 1 ENSG00000000003.14 2700.7913    -0.34546678 0.2023895 -1.7069405 0.08783312
#> 2 ENSG00000000419.12 1571.1433    -0.34825874 0.1508075 -2.3092930 0.02092733
#> 3 ENSG00000000457.13  526.2282    -0.05125021 0.1804821 -0.2839628 0.77643886
#> 4 ENSG00000000460.16 1108.1387    -0.07853864 0.1678596 -0.4678829 0.63986832
#> 5 ENSG00000001036.13 2662.1320     0.12141941 0.1752099  0.6929940 0.48831330
#> 6 ENSG00000001084.10 1325.4473     0.89000000 0.1548754 -0.4232898 0.67208385
#> 7 ENSG00000001167.14 1829.8287    -0.22174968 0.1531004 -1.4483938 0.14750694
#> 8 ENSG00000001460.17  641.7583    -0.25241938 0.1836026 -1.3748141 0.16918909
#>        padj threshold
#> 1 0.0010000      down
#> 2 0.1204784        NS
#> 3 0.0030000      down
#> 4 0.8273296        NS
#> 5 0.7288428        NS
#> 6 0.0004000        up
#> 7 0.3864469        NS
#> 8 0.4178169        NS

Created on 2023-03-07 with reprex v2.0.2
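
For comparison, the attempt in the question fails because the second mutate() call recomputes threshold from scratch, so every row labeled "up" by the first call is reset to "NS". If you would rather stay with if_else(), a nested call is one way around that. The following is only a minimal sketch: the toy data frame and its values are made up for illustration, and the NA in padj is an assumption (the example data in the question contains no missing values) included to show how the missing argument behaves.

```r
library(dplyr)

# Hypothetical toy data, not taken from the question; one padj is NA on purpose
toy <- data.frame(
  geneID         = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(0.89, -0.35, 0.12, -0.25),
  padj           = c(0.0004, 0.001, 0.73, NA)
)

# Nested if_else(): the inner call supplies the value for rows where the
# outer condition is FALSE, so "up" and "down" are assigned in one pass.
# missing = "NS" labels rows whose condition evaluates to NA.
toy <- mutate(
  toy,
  threshold = if_else(
    padj <= 0.05 & log2FoldChange > 0,
    "up",
    if_else(padj <= 0.05 & log2FoldChange < 0, "down", "NS", missing = "NS"),
    missing = "NS"
  )
)

toy
```

With case_when() no extra argument is needed for this, since a row whose conditions evaluate to NA simply falls through to the final TRUE ~ "NS" branch. The nesting also gets harder to read as more cases are added, which is the main reason to prefer case_when() here.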
