I’m trying to write a function that takes a vector and subsets it in several steps:
1. Throws away any unwanted values.
2. Removes duplicates.
3. Returns the indexes, in the original vector, of the elements that remain after steps (1) and (2).
For example, provided with the following input vector:
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
and
throw_away_val <- "cat"
I want my function get_index(x = vec_animals, throw_away = throw_away_val) to return:
# [1] 1 6 # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")
Another example
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003
Return:
# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).
My initial attempt
The following function returns the indexes of the wanted values but doesn’t account for duplicates:
get_index <- function(x, throw_away) {
which(x != throw_away)
}
which returns indexes into the original vec_animals, for example:
get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7
If we use this output to subset vec_animals we get:
vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog" "dog" "dog" "dog" "dolphin" "dolphin"
You might suggest post-processing this output, for example:
vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog" "dolphin"
But no, I need get_index() to return the correct indexes right away (in this case 1 and 6).
EDIT
A related procedure, which returns the index of the first occurrence of each distinct value, is provided by bit64:
library(bit64)
vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8
Or more generally
which(!duplicated(vec_num))
#> [1] 1 2 4 8
Such solutions would have been great if I had not also needed to throw away unwanted values.
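To make the gap concrete, here is a small sketch that applies the same idiom to vec_animals from above. It only uses base R; the point is that the index of the unwanted value ("cat", position 5) is still returned.

```r
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

# First occurrence of every distinct value, including the unwanted "cat"
first_occurrences <- which(!duplicated(vec_animals))
first_occurrences
#> [1] 1 5 6
```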
>Solution :
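Combine both conditions into a single logical vector before calling which(), so that an element is kept only if it is a first occurrence and is not the unwanted value: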
get_index <- function(x, throw_away) {
  # TRUE only for first occurrences that differ from the unwanted value
  which(!duplicated(x) & x != throw_away)
}
get_index(vec_animals, "cat")
#> [1] 1 6
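If more than one value may need to be thrown away, a possible variant swaps `!=` for `%in%`. This is only a sketch: the name `get_index_multi` is hypothetical and not part of the answer above, and it assumes `throw_away` is a vector of values of the same type as `x`.

```r
# Hypothetical variant: `throw_away` may contain several unwanted values
get_index_multi <- function(x, throw_away) {
  # Keep first occurrences whose value is not among the unwanted ones
  which(!duplicated(x) & !(x %in% throw_away))
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_index_multi(vec_animals, "cat")
#> [1] 1 6
get_index_multi(vec_years, 2003)
#> [1] 4 6
get_index_multi(vec_animals, c("cat", "dog"))
#> [1] 6
```

One difference to be aware of: with `!=`, an NA in `x` yields NA in the logical vector and which() drops it, whereas with `%in%` the first NA is treated as an ordinary value and its index is returned.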