Say we have the following indicator vector:
library(dplyr)
df <- tibble(row = 1:20,
             indicator = rep(c(rep(0, 5), 1, rep(0, 4)), 2))
df
     row indicator
   <int>     <dbl>
 1     1         0
 2     2         0
 3     3         0
 4     4         0
 5     5         0
 6     6         1
 7     7         0
 8     8         0
 9     9         0
10    10         0
11    11         0
12    12         0
13    13         0
14    14         0
15    15         0
16    16         1
17    17         0
18    18         0
19    19         0
20    20         0
How can I easily create a column that marks a region around each 1 in the indicator column? For example, if I want three “regions” of size N = 1, 3, and 5, the desired output should look like:
     row indicator region_n1 region_n3 region_n5
   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1         0         0         0         0
 2     2         0         0         0         0
 3     3         0         0         0         0
 4     4         0         0         0         1
 5     5         0         0         1         1
 6     6         1         1         1         1
 7     7         0         0         1         1
 8     8         0         0         0         1
 9     9         0         0         0         0
10    10         0         0         0         0
11    11         0         0         0         0
12    12         0         0         0         0
13    13         0         0         0         0
14    14         0         0         0         1
15    15         0         0         1         1
16    16         1         1         1         1
17    17         0         0         1         1
18    18         0         0         0         1
19    19         0         0         0         0
20    20         0         0         0         0
I can code this up when there is only one “1” in the indicator variable by sorting, but struggle when there are multiple “1s.” Any help is greatly appreciated, thanks.
Solution:
Using a user-defined function with lag and lead:
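# Mark a window of n rows centred on each 1 in x (n is assumed to be odd).
get_region_n <- function(x, n) {
  half <- (n - 1) / 2   # number of rows to extend on each side
  new_x <- x
  # seq_len(0) is empty, so n = 1 returns x unchanged
  for (i in seq_len(half)) {
    new_x <- new_x +
      dplyr::lag(x, n = i, default = 0) +
      dplyr::lead(x, n = i, default = 0)
  }
  pmin(new_x, 1)        # cap at 1 where neighbouring windows overlap
}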
df %>%
  mutate(region_n1 = get_region_n(indicator, 1),
         region_n3 = get_region_n(indicator, 3),
         region_n5 = get_region_n(indicator, 5))
     row indicator region_n1 region_n3 region_n5
   <int>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1         0         0         0         0
 2     2         0         0         0         0
 3     3         0         0         0         0
 4     4         0         0         0         1
 5     5         0         0         1         1
 6     6         1         1         1         1
 7     7         0         0         1         1
 8     8         0         0         0         1
 9     9         0         0         0         0
10    10         0         0         0         0
11    11         0         0         0         0
12    12         0         0         0         0
13    13         0         0         0         0
14    14         0         0         0         1
15    15         0         0         1         1
16    16         1         1         1         1
17    17         0         0         1         1
18    18         0         0         0         1
19    19         0         0         0         0
20    20         0         0         0         0
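A possible alternative, sketched here with base R only: flag a row when its distance to the nearest 1 is at most (n - 1) / 2. The helper name get_region_dist is hypothetical and not part of the answer above, and n is assumed to be odd. Because it tests a distance rather than summing shifted copies, the result stays 0/1 even if two regions overlap.

```r
# Hypothetical helper (an illustration, not the answer's method):
# returns 1 for positions within (n - 1) / 2 rows of any 1 in x, else 0.
get_region_dist <- function(x, n) {
  half <- (n - 1) / 2                 # assumes n is odd
  hits <- which(x == 1)               # positions of the indicator
  if (length(hits) == 0) {
    return(rep(0, length(x)))         # no indicator, so no region
  }
  # |position - hit| for every position/hit pair, then the row-wise minimum
  # gives each position's distance to its nearest 1
  dist <- apply(abs(outer(seq_along(x), hits, "-")), 1, min)
  as.numeric(dist <= half)
}

# Example input: the indicator vector from the question
x <- rep(c(rep(0, 5), 1, rep(0, 4)), 2)
region_n3 <- get_region_dist(x, 3)
```

It can be used inside mutate() in the same way as get_region_n, for example mutate(region_n3 = get_region_dist(indicator, 3)). Note that outer() builds a length(x) by length(hits) matrix, so this sketch may be a poor fit for very long vectors with many 1s.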