Determine regions around elements in a vector in R

Say we have the following indicator vector:

library(dplyr)
df <- tibble(row = 1:20,
             indicator = rep(c(rep(0, 5), 1, rep(0, 4)), 2))
df

     row indicator
   <int>     <dbl>
 1     1         0
 2     2         0
 3     3         0
 4     4         0
 5     5         0
 6     6         1
 7     7         0
 8     8         0
 9     9         0
10    10         0
11    11         0
12    12         0
13    13         0
14    14         0
15    15         0
16    16         1
17    17         0
18    18         0
19    19         0
20    20         0


How can I easily create a column that marks a region around each 1 in the indicator column? For example, for three “regions” of size N = 1, 3, and 5, the desired output is:

     row indicator region_n1 region_n3 region_n5
   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1         0         0         0         0
 2     2         0         0         0         0
 3     3         0         0         0         0
 4     4         0         0         0         1
 5     5         0         0         1         1
 6     6         1         1         1         1
 7     7         0         0         1         1
 8     8         0         0         0         1
 9     9         0         0         0         0
10    10         0         0         0         0
11    11         0         0         0         0
12    12         0         0         0         0
13    13         0         0         0         0
14    14         0         0         0         1
15    15         0         0         1         1
16    16         1         1         1         1
17    17         0         0         1         1
18    18         0         0         0         1
19    19         0         0         0         0
20    20         0         0         0         0

I can code this up by sorting when the indicator variable contains only one “1”, but I struggle when there are multiple “1”s. Any help is greatly appreciated, thanks.

Solution:

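Use a user-defined function built on dplyr's lag() and lead(). It assumes n is odd; where two regions overlap, the overlapping rows sum to more than 1: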

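# x: 0/1 indicator vector; n: odd region width (1, 3, 5, ...)
get_region_n <- function(x, n) {
  half <- (n - 1) / 2   # rows to mark on each side of a 1
  new_x <- x
  # seq_len(0) is empty, so n = 1 returns x unchanged
  for (i in seq_len(half)) {
    # add copies of x shifted i rows down and i rows up, padded with 0
    new_x <- new_x + lag(x, n = i, default = 0) + lead(x, n = i, default = 0)
  }
  new_x
}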
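
To see why adding shifted copies marks a region, here is a minimal sketch on a short made-up vector (not the data from the question). lag() moves each 1 one row down, lead() moves it one row up, and default = 0 pads the ends:

```r
library(dplyr)

x <- c(0, 0, 1, 0, 0)                      # illustrative vector, not from the question

shifted_down <- lag(x, n = 1, default = 0)   # 0 0 0 1 0
shifted_up   <- lead(x, n = 1, default = 0)  # 0 1 0 0 0

# Summing the original and both shifts gives a region of width 3 around the 1
region_n3 <- x + shifted_down + shifted_up
region_n3                                    # 0 1 1 1 0
```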

# df is the tibble from the question
df %>%
  mutate(region_n1 = get_region_n(indicator, 1),
         region_n3 = get_region_n(indicator, 3),
         region_n5 = get_region_n(indicator, 5))

     row indicator region_n1 region_n3 region_n5
   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1         0         0         0         0
 2     2         0         0         0         0
 3     3         0         0         0         0
 4     4         0         0         0         1
 5     5         0         0         1         1
 6     6         1         1         1         1
 7     7         0         0         1         1
 8     8         0         0         0         1
 9     9         0         0         0         0
10    10         0         0         0         0
11    11         0         0         0         0
12    12         0         0         0         0
13    13         0         0         0         0
14    14         0         0         0         1
15    15         0         0         1         1
16    16         1         1         1         1
17    17         0         0         1         1
18    18         0         0         0         1
19    19         0         0         0         0
20    20         0         0         0         0
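
If two 1s sit closer together than the region width, the summed shifts above exceed 1 where the regions overlap. The following is only a sketch of one way to handle that, assuming a strict 0/1 flag is wanted; get_region_flag is a hypothetical name, not part of the original answer, and it also rejects even widths because they cannot be centred on a row:

```r
library(dplyr)

# Hypothetical variant: same lag/lead idea, result clamped to 0/1
get_region_flag <- function(x, n) {
  stopifnot(n >= 1, n %% 2 == 1)   # width must be a positive odd number
  half <- (n - 1) / 2
  hits <- x
  for (i in seq_len(half)) {
    hits <- hits + dplyr::lag(x, n = i, default = 0) +
      dplyr::lead(x, n = i, default = 0)
  }
  as.numeric(hits > 0)             # any hit counts once, overlaps included
}

# Made-up example: 1s at rows 3 and 5, so the n = 5 regions overlap
x <- c(0, 0, 1, 0, 1, 0, 0, 0)
flag_n5 <- get_region_flag(x, 5)
flag_n5                            # 1 1 1 1 1 1 1 0
```

On the data from the question the regions never overlap, so this variant should give the same columns as get_region_n.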
