Let’s say I have a data frame of position values:
> head(jap["POS"])
        POS
1    836924
2    922009
3   1036959
4 141607615
5 164000000
6 118528028
[...]
And a data frame of intervals (row 1 holds each start, row 2 each end):
> genes_of_interest
       MGAM        SI      TREH    SLC2A2  SLC2A5   SLC5A1  TAS1R3       LCT
1 141607613 164696686 118528026 170714137 9095166 32439248 1266660 136545420
2 141806547 164796284 118550359 170744539 9148537 32509016 1270694 136594754
For every position in the first data frame, I want to check whether it falls inside any of the intervals in the second data frame.
So in this case, I should get
FALSE FALSE FALSE TRUE FALSE TRUE
since 141607615 belongs to the first interval (MGAM) and 118528028 belongs to the third interval (TREH).
Do you have any idea how to do this?
Thanks in advance.
>Solution :
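We can use sapply to loop over the columns of genes_of_interest and test every position in jap against that column's interval. This gives a logical matrix with one row per position and one column per interval. Wrapping it in an outer apply(..., 1, any) then tells us whether each position falls inside at least one interval. Alternatively, we can replace the outer apply with as.logical(rowSums()); both give the same output.
Note that the between function comes from the dplyr package and is inclusive at both ends (left <= x <= right).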
library(dplyr)
apply(sapply(1:ncol(genes_of_interest), \(x) between(jap$POS, genes_of_interest[1, x], genes_of_interest[2, x])), 1, any)
# or
as.logical(rowSums(sapply(1:ncol(genes_of_interest), \(x) between(jap$POS, genes_of_interest[1, x], genes_of_interest[2, x]))))
Output
[1] FALSE FALSE FALSE TRUE FALSE TRUE
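If you would rather not load dplyr, the same check can be sketched in base R with outer(). This is only an illustration under some assumptions: the data frames are rebuilt by hand from the values shown in the question (first six positions only), and the names starts, ends, inside and in_any are made up for this example.

```r
# Example data rebuilt from the question (assumption: only the six rows shown)
jap <- data.frame(
  POS = c(836924, 922009, 1036959, 141607615, 164000000, 118528028)
)
genes_of_interest <- data.frame(
  MGAM   = c(141607613, 141806547),
  SI     = c(164696686, 164796284),
  TREH   = c(118528026, 118550359),
  SLC2A2 = c(170714137, 170744539),
  SLC2A5 = c(9095166, 9148537),
  SLC5A1 = c(32439248, 32509016),
  TAS1R3 = c(1266660, 1270694),
  LCT    = c(136545420, 136594754)
)

# Row 1 holds the interval starts, row 2 the interval ends
starts <- unlist(genes_of_interest[1, ])
ends   <- unlist(genes_of_interest[2, ])

# outer() compares every position with every bound, giving a
# positions x intervals logical matrix (inclusive at both ends)
inside <- outer(jap$POS, starts, ">=") & outer(jap$POS, ends, "<=")

# TRUE where a position falls inside at least one interval
in_any <- rowSums(inside) > 0
in_any
```

With the sample data, only the 4th and 6th positions come out TRUE, matching the output above. The inside matrix keeps the gene names as column names, so it can also be used to see which interval a position matched.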