Counting consecutive rowwise duplicates

April 13, 2022

I am currently working with data which has a structure like the following

ID	Var1	Var2	Var3	Var4	Var5	Var6
1	1001	1001	1001	2002	2002	2002
2	2002	1001	1001	1001	NA	9999
3	1001	NA	1001	1001	9999	1234
4	3003	3003	3003	NA	2002	2002
5	NA	NA	NA	1001	1001	1001
6	3003	3003	3003	3003	3003	3003

df <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                 Var1 = c(1001, 2002, 1001, 3003,   NA, 3003),
                 Var2 = c(1001, 1001,   NA, 3003,   NA, 3003),
                 Var3 = c(1001, 1001, 1001, 3003,   NA, 3003),
                 Var4 = c(2002, 1001, 1001,   NA, 1001, 3003),
                 Var5 = c(2002,   NA, 9999, 2002, 1001, 3003),
                 Var6 = c(2002, 9999, 1234, 2002, 1001, 3003))

My goal is to count—for each row—how many times a unique value has been duplicated consecutively at least n times. Let’s say n = 3. So, if "1001" repeats rowwise in at least 3 adjacent cells (could be more), then that counts as one duplicate. If both "1001" and "2002" have been repeated in a row at least three times each consecutively, then that would be two duplicates, etc. The following would be the intended result:

ID	Var1	Var2	Var3	Var4	Var5	Var6	Num_3ConsecutiveDuplications
1	1001	1001	1001	2002	2002	2002	2
2	2002	1001	1001	1001	NA	9999	1
3	1001	NA	1001	1001	9999	1234	0
4	3003	3003	3003	NA	2002	2002	1
5	NA	NA	NA	1001	1001	1001	1
6	3003	3003	3003	3003	3003	3003	1

Since in the actual data there are a significant range of possible values each variable can take on, and that there are over 40 variables, counting this by conditioning on each possible value and range of columns seems infeasible/inefficient. Any suggestions would be greatly appreciated.

>Solution :

You can use rle():

library(dplyr)

df %>%
  rowwise() %>%
  mutate(x = sum(rle(across(-ID))$lengths >= 3))

# A tibble: 6 x 8
# Rowwise: 
     ID  Var1  Var2  Var3  Var4  Var5  Var6     x
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1     1  1001  1001  1001  2002  2002  2002     2
2     2  2002  1001  1001  1001    NA  9999     1
3     3  1001    NA  1001  1001  9999  1234     0
4     4  3003  3003  3003    NA  2002  2002     1
5     5    NA    NA    NA  1001  1001  1001     1
6     6  3003  3003  3003  3003  3003  3003     1