Number of different elements up to this point

by MR

December 6, 2023

I’ve got a relatively simple problem (I think) and I want to solve it in a fast and efficient way.

I want to count the number of different elements in a vector up to each point in this vector.

For example, in a vector like this

vec <- c("a", "b", "c", "a", "a", "c", "d", "a")

I want to get the following vector of equal size as a result:
[1 2 3 3 3 3 4 4]

I could solve this of course with a for loop in combination with cumsum():

vec <- c("a", "b", "c", "a", "a", "c", "d", "a")
# res[i] is TRUE if vec[i] has not occurred earlier in vec
res <- logical(length(vec))
for (i in seq_along(vec)) {
  res[i] <- !(vec[i] %in% vec[seq_len(i - 1)])
}
cumsum(res)
#[1] 1 2 3 3 3 3 4 4

However, I am dealing with vectors that have several million elements and a for-loop approach takes forever for such a relatively simple problem.

My intuition is that this should be solvable much faster and more cleverly. Do you have any ideas? Thank you!

(In case you’re interested: I need this for a vocabulary growth curve analysis where we want to know at each point in the text how many different words, i.e. types, have been observed so far.)

Solution:

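Use cumsum() on the negated result of duplicated(). duplicated(vec) is TRUE for every element that has already occurred earlier in the vector, so !duplicated(vec) is TRUE exactly at each first occurrence, and the running sum of those gives the number of distinct elements seen so far: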

cumsum(!duplicated(vec))
#[1] 1 2 3 3 3 3 4 4
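
To see why this works, it can help to look at the intermediate logical vector. The sketch below wraps the one-liner in a small helper (the name cum_distinct is made up here for illustration, it is not a base R function) and prints both steps for the example vector:

```r
# Hypothetical helper name; it only wraps the one-liner above.
cum_distinct <- function(x) {
  # duplicated(x) is TRUE where x[i] already occurred earlier,
  # so !duplicated(x) is TRUE exactly at first occurrences.
  cumsum(!duplicated(x))
}

vec <- c("a", "b", "c", "a", "a", "c", "d", "a")

duplicated(vec)
#[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE

cum_distinct(vec)
#[1] 1 2 3 3 3 3 4 4
```

Unlike the loop, this makes a single pass with no repeated scans of the prefix, and it also handles an empty vector without special-casing (the result is then an empty integer vector).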

And another approach with match:

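# start with an all-FALSE logical vector
uni <- logical(length(vec))
# match() returns the position of the first occurrence of each unique value
uni[match(unique(vec), vec)] <- TRUE
cumsum(uni)
#[1] 1 2 3 3 3 3 4 4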
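
For the vocabulary growth use case mentioned in the question, the same idea could be applied to a token vector. The following is only a sketch: the sample text and the very naive tokenization (lowercase, split on anything that is not a-z) are assumptions made for illustration, and a real analysis would likely use a proper tokenizer.

```r
# Assumed sample text, not taken from the question
text <- "The cat sat on the mat and the dog sat too"

# Naive tokenization: lowercase, then split on runs of non-letters
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]  # drop empty strings, if any

# One row per token position: tokens read so far vs. types seen so far
growth <- data.frame(
  tokens_seen = seq_along(tokens),
  types_seen  = cumsum(!duplicated(tokens))
)

growth$types_seen
#[1] 1 2 3 4 4 5 6 6 7 7 8
```

Plotting types_seen against tokens_seen, for example with plot(growth, type = "l"), then gives the growth curve.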
