I have a data frame with the following structure:
pos<- c(67,125,158,195,235,458,499,526,785,912,999,1525)
v_1<-c("j","c","v","r","s","q","r","r","s","t","u","v")
v_2<-c("c","t","v","r","s","q","r","w","c","c","o","v")
v_3<-c("z","c","v","r","s","q","r","w","c","b","p","v")
v_4<-c("x","w","z","z","s","q","r","w","c","o","t","v")
data<-as.data.frame(cbind(pos,v_1,v_2,v_3,v_4))
In this dataframe it is possible to find the same letters among the different columns in consecutive rows. I need to obtain a separate data frame with the values of the variable "pos" for these shared letters in continuos rows, as can bee seen in the figure:

In this figure even all the columsn have the same letter in pos 1525 Is not importat since is not in a consecutive row. What I need to get is the dataframe in the right.
>Solution :
Solution using tidyr and dplyr:
- After pivoting to long, use
dplyr::add_count()to find repeated values within eachpos; - Within each
v, find consecutive rows with repeated values, defined as: >1 repeat and >1 repeat in either preceding or following row; - Create a column containing
posfor consecutive rows andNAotherwise; - Take the minimum and maximum to get
startandendfor eachv.
library(tidyr)
library(dplyr)
data %>%
pivot_longer(!pos, names_to = "v") %>%
add_count(pos, value) %>%
group_by(v) %>%
mutate(consec = ifelse(
n > 1 & (lag(n) > 1 | lead(n) > 1),
pos,
NA
)) %>%
summarize(
start = min(consec, na.rm = TRUE),
end = max(consec, na.rm = TRUE)
)
# A tibble: 4 × 3
v start end
<chr> <chr> <chr>
1 v_1 125 499
2 v_2 158 785
3 v_3 125 785
4 v_4 235 785
Note, not sure if/how you want to handle if there is more than one set of consecutive rows, so this solution doesn’t address that.