How to find max and min values regardless of being positive or negative using R

I have a data frame that looks like this:

Genes   intA    Chr_intA    Chr_intB    direction_1 direction_2 distance
GeneA   P53 chr19   chr8    -   -   -423
GeneA   P53 chr19   chr8    -   -   -3467567
GeneA   P53 chr19   chr8    -   -   10452
GeneB   P53 chr19   chr8    -   -   -2884
GeneB   P53 chr19   chr8    -   -   -40

I want to group by the columns Genes and intA and then keep only the row with the maximum or minimum value (by magnitude, regardless of sign) in the last column, distance.

The desired output for the maximum (absolute) distance values is:

Genes   intA    Chr_intA    Chr_intB    direction_1 direction_2 distance
GeneA   P53 chr19   chr8    -   -   -3467567
GeneB   P53 chr19   chr8    -   -   -2884

And the desired output for the minimum (absolute) distance values is:

Genes   intA    Chr_intA    Chr_intB    direction_1 direction_2 distance
GeneA   P53 chr19   chr8    -   -   -423
GeneB   P53 chr19   chr8    -   -   -40

I tried the approach below, but it has two problems: it turns negative values into positive ones, and it changes the shape of the final output, because only the grouping columns and distance are kept. How can I fix these two things? Thanks.

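library(dplyr)

# Attempt: summarise() collapses each group to one row and drops the other columns,
# and abs() discards the sign. Separate result names avoid overwriting df.
df_max <- df %>% group_by(Genes, intA) %>% summarise(distance = max(abs(distance)))
df_min <- df %>% group_by(Genes, intA) %>% summarise(distance = min(abs(distance)))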

Solution:

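You can use slice_max and slice_min (available in dplyr 1.0.0 and later) to keep the n rows with the highest or lowest values in each group (n defaults to 1). Since you want the largest or smallest distance regardless of sign, order the rows by abs(distance). The absolute value is only used for ranking, so the original signed values and all other columns are kept. Here dat is the data frame from the question.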

dat %>% 
  group_by(Genes, intA) %>%
  slice_max(abs(distance))

#  Genes intA  Chr_intA Chr_intB direction_1 direction_2 distance
#  <chr> <chr> <chr>    <chr>    <chr>       <chr>          <int>
#1 GeneA P53   chr19    chr8     -           -           -3467567
#2 GeneB P53   chr19    chr8     -           -              -2884
  
dat %>% 
  group_by(Genes, intA) %>%
  slice_min(abs(distance))

#  Genes intA  Chr_intA Chr_intB direction_1 direction_2 distance
#  <chr> <chr> <chr>    <chr>    <chr>       <chr>          <int>
#1 GeneA P53   chr19    chr8     -           -               -423
#2 GeneB P53   chr19    chr8     -           -                -40
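
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the example table by hand (typing distance as integer is an assumption), and it sets with_ties = FALSE, which is my addition rather than part of the answer above: by default slice_max and slice_min return every tied row, so a group containing both -40 and 40 would yield two rows. The filter() variant at the end is an alternative that should also work on dplyr versions older than 1.0.0; it keeps all tied rows.

```r
library(dplyr)

# Rebuild the example table from the question (the name `dat` follows the answer above)
dat <- data.frame(
  Genes       = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  intA        = "P53",
  Chr_intA    = "chr19",
  Chr_intB    = "chr8",
  direction_1 = "-",
  direction_2 = "-",
  distance    = c(-423L, -3467567L, 10452L, -2884L, -40L)
)

# Row with the largest absolute distance per group; the sign is kept
# because abs() is only used for ordering, not to overwrite the column
max_rows <- dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

# Row with the smallest absolute distance per group
min_rows <- dat %>%
  group_by(Genes, intA) %>%
  slice_min(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

# Alternative with filter(); unlike with_ties = FALSE, this keeps all tied rows
max_rows_filter <- dat %>%
  group_by(Genes, intA) %>%
  filter(abs(distance) == max(abs(distance))) %>%
  ungroup()

max_rows$distance  # -3467567 -2884
min_rows$distance  # -423 -40
```

Whether to keep or drop ties depends on what a duplicate magnitude means in your data, so treat with_ties = FALSE as one possible choice rather than the required one.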