I have a rather short question.
Here are some example vectors to reproduce the problem:
a <- c(14,26,38,64,96,127,152,152,152,152,152,152)
b <- c(4,7,9,13,13,13,13,13,13,13,13,13,13,13)
c <- c(62,297,297,297,297,297,297,297,297,297,297,297)
In each vector, a certain value is repeated from some point until the end. I need the index at which this final repeated value first appears.
In this case the output would be 7, 4, 2: in a, 152 starts at the 7th position; in b, 13 starts at the 4th position; and in c, 297 starts at the 2nd position.
I hope this is clear.
Does anybody have a hint on how to get this automatically?
>Solution :
You could use rle() to compute the run-length encoding of the vector, drop the final run, and sum the lengths of the remaining runs. Adding 1 gives the index where the final run starts:
get_index <- \(x) sum(head(rle(x)$lengths, -1)) + 1
sapply(list(a, b, c), get_index)
# [1] 7 4 2
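To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea on the vector a from the question. The intermediate names runs, leading and first_index are only illustrative and do not appear in the answer:

```r
# Example vector from the question
a <- c(14, 26, 38, 64, 96, 127, 152, 152, 152, 152, 152, 152)

runs <- rle(a)                      # list with runs$lengths and runs$values
leading <- head(runs$lengths, -1)   # lengths of every run before the final one
first_index <- sum(leading) + 1     # first position of the final run

runs$lengths
# [1] 1 1 1 1 1 1 6
first_index
# [1] 7
```

The six leading runs each have length 1, so the final run of 152 begins at position 6 + 1 = 7.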
Rcpp solution
If your vectors are really long and the last value is only repeated towards the end, you don’t need to check the length of every run, so the above will be inefficient. It’s better to start from the end of the vector and work backwards until you find a different value:
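Rcpp::cppFunction('
int get_index2(NumericVector x) {
  int n = x.size();
  if (n == 0) return NA_INTEGER; // empty input: no last value to compare against
  double last_value = x[n - 1];
  // scan backwards until an element differs from the last value
  for (int i = n - 2; i >= 0; --i) {
    if (x[i] != last_value) {
      return i + 2; // +1 as it is next element; +1 for 1-indexing
    }
  }
  return 1; // all elements are the same
}
')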
sapply(list(a,b,c), get_index2)
# [1] 7 4 2
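If compiling with Rcpp is not an option, the same backwards scan can be sketched in base R. This is only an illustration: get_index3 is a hypothetical name, returning NA for an empty vector is an assumption, and the loop assumes the vector contains no NA values. An interpreted loop will likely be slower than the compiled version, but it still stops as soon as it finds a different value:

```r
# Example vectors from the question
a <- c(14, 26, 38, 64, 96, 127, 152, 152, 152, 152, 152, 152)
b <- c(4, 7, 9, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13)
c <- c(62, 297, 297, 297, 297, 297, 297, 297, 297, 297, 297, 297)

# Hypothetical helper (the name is illustrative, not from the answer above)
get_index3 <- function(x) {
  n <- length(x)
  if (n == 0L) return(NA_integer_)  # assumption: empty input yields NA
  i <- n - 1L
  # walk backwards while the elements still equal the last value
  while (i >= 1L && x[i] == x[n]) {
    i <- i - 1L
  }
  i + 1L  # position right after the first differing element (1 if none)
}

sapply(list(a, b, c), get_index3)
# [1] 7 4 2
```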