Home How to identify the elements that do not match a given sequence in an array?

Questions

How to identify the elements that do not match a given sequence in an array?

June 24, 2023

Consider the given sequence sequence = [1, 1, 0].

An example of an array that follows the sequence throughout is [1, 1, 0, 1, 1, 0, 1, 1, 0].

I have an array that follows the sequence throughout except with a few outliers. An example of such an array is [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]. In this array, the elements that are outliers are at the 6th and 16th indices. If these outliers are removed, the array would follow the sequence throughout.

I want to write a python code that prints the indices of the outliers in the array. The array is assumed to have only 0s and 1s.

The code should look like below:

sequence = [1, 1, 0]
array_with_outliers = [...] //given
//code to identify outliers
print(indices_of_outliers)

My Failed Attempts

I tried to break the array_with_outliers into chunks of 3 elements. I then iterated through the chunks to see if they matched the sequence. If chunk n didn’t match the sequence, I popped the elements upto index 3(n-1) in the array_with_outliers and chunked the array_with_outliers again, carrying out the same process recursively till the array_with_outliers was empty. The code was too inefficient, however, and my laptop wasn’t able to run it.

I also tried to iterate the array_with_outliers and see if each element matched its expected value based on the sequence. I computed the expected value based on the index and the number of outliers identified so far. However, the problem was that I was individually comparing each element to its expected value, and not the chunk as a whole, which created overlapping problems.

>Solution :

You can do this in linear time by just iterating through the array, skipping matching chunks, and accumulating non-matching indices one at a time until you find the next matching chunk.

def find_outliers(needle: list[int], haystack: list[int]) -> list[int]:
    """Find indices in haystack where
    the sequence of repeated needles is broken."""
    outliers: list[int] = []
    i = 0
    while i < len(haystack):
        if haystack[i:i+len(needle)] == needle:
            i += len(needle)
        else:
            outliers.append(i)
            i += 1
    return outliers

assert not find_outliers(
    [1, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1, 0]
)

assert find_outliers(
    [1, 1, 0],
    [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
) == [6, 16]