Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find matching elements based on multiple arguments in R

I have a large data frame that looks like this. I want to find which genes match the others based on an overlap between the start and end positions.

library(tidyverse)

data <- data.frame(group=c(1,1,1,2,2,2),
                     genes=c("A","B","C","D","E","F"), 
                     start=c(1000,2000,3000,800,400,2000),
                     end=c(1500,2500,3500,1200,500,10000))

data
#>   group genes start   end
#> 1     1     A  1000  1500
#> 2     1     B  2000  2500
#> 3     1     C  3000  3500
#> 4     2     D   800  1200
#> 5     2     E   400   500
#> 6     2     F  2000 10000

Created on 2022-12-05 with reprex v2.0.2

I want something like this.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data
#>   group genes start   end   match
#> 1     1     A  1000  1500    A-D
#> 2     1     B  2000  2500    B-F
#> 3     1     C  3000  3500    C-F
#> 4     2     D   800  1200    A-D
#> 5     2     E   400   500    NA
#> 6     2     F  2000 10000    F-C-B

I am a bit lost on how to start.
Any help is appreciated

>Solution :

With devel version of dplyr, we can use

library(dplyr)
library(stringr)
by <- join_by(overlaps(x$start, x$end, y$start, y$end))
full_join(data, data, by) %>% 
  group_by(genes= genes.x) %>% 
  summarise(match = if(n() ==1) NA_character_ else 
      str_c(genes.y, collapse = '-')) %>%
 left_join(data, .)

-output

  group genes start   end match
1     1     A  1000  1500   A-D
2     1     B  2000  2500   B-F
3     1     C  3000  3500   C-F
4     2     D   800  1200   A-D
5     2     E   400   500  <NA>
6     2     F  2000 10000 B-C-F
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading