Writing a function that takes a vector as input, throws away unwanted values, de-duplicates, and returns the corresponding indexes of the original vector

I’m trying to write a function that takes in a vector and subsets it according to several steps:

  1. Throws away any unwanted values.
  2. Removes duplicates.
  3. Returns the indexes, in the original vector, of the elements that remain after steps (1) and (2).

For example, provided with the following input vector:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

and

throw_away_val <- "cat"

I want my function get_index(vec_animals, throw_away_val) to return:

# [1] 1 6   # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")

Another example

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003

Return:

# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).

My initial attempt

The following function returns indexes but doesn’t account for duplicates:

get_index <- function(x, throw_away) {
  which(x != throw_away)
}

It returns the indexes of every element of vec_animals that is not the unwanted value:

get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7

If we use this output to subset vec_animals we get:

vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog"     "dog"     "dog"     "dog"     "dolphin" "dolphin"

One could suggest de-duplicating this output afterwards, for example:

vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog"     "dolphin"

But no, I need get_index() to return the correct indexes right away (in this case 1 and 6).


EDIT


A related approach, which returns the indexes of the first occurrence of each value, is provided by unipos() from the bit64 package:

library(bit64)

vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8

Or more generally

which(!duplicated(vec_num))
#> [1] 1 2 4 8

Such solutions would have been great if I had not also needed to throw away unwanted values.

Solution:

Try:

get_index <- function(x, throw_away) {
  # keep first occurrences only, and drop the unwanted value
  which(!duplicated(x) & x != throw_away)
}

get_index(vec_animals, "cat")
#> [1] 1 6
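
The answer combines two logical masks over the original vector, so the positions that `which()` reports are already indexes into the input. As a minimal sketch of that idea, the variant below names each mask and swaps `!=` for `%in%` so that several unwanted values can be passed at once. The name `get_index_multi` is hypothetical and not part of the original answer.

```r
# Hypothetical variant of get_index(); the name is an assumption for illustration.
get_index_multi <- function(x, throw_away) {
  keep_first <- !duplicated(x)        # TRUE at the first occurrence of each value
  keep_value <- !(x %in% throw_away)  # TRUE where the value is not unwanted
  which(keep_first & keep_value)      # positions in the original vector
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_index_multi(vec_animals, "cat")
#> [1] 1 6
get_index_multi(vec_years, 2003)
#> [1] 4 6
get_index_multi(vec_animals, c("cat", "dog"))
#> [1] 6
```

One difference to be aware of: with `!=`, an `NA` in `x` yields `NA` in the mask and is silently dropped by `which()`, whereas with `%in%` an `NA` counts as a value to keep unless `NA` is itself listed in `throw_away`.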