Comparing and indexing a Series of arrays with length > 1

The title sounds more complicated than the problem really is. Given the data

import numpy as np
import pandas as pd

data = [
    np.array(['x'], dtype='object'),
    np.array(['y'], dtype='object'),
    np.array(['z'], dtype='object'),
    np.array(['x', 'z', 'y'], dtype='object'),
    np.array(['y', 'x'], dtype='object'),
]

s = pd.Series(data)

I would like to retrieve the elements of s where s == np.array(['x']). The obvious way

c = np.array(['x'])
s[s==c]

does not work: the comparison raises a ValueError, complaining that "('Lengths must match to compare', (5,), (1,))". I also tried

s[s=='x']

which only works if every element of s contains exactly one element itself.

Is there a way to retrieve all elements of s that are equal to c, without converting the elements to strings?

Solution:

Use a list comprehension with numpy.array_equal:

c = np.array(['x'])

out = s[[np.array_equal(a, c) for a in s]]
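
As a rough illustration of why numpy.array_equal fits here better than ==, the sketch below compares one of the longer arrays from the question against c. The variable names a, elementwise and whole are only for illustration and are not part of the original answer.

```python
import numpy as np

c = np.array(['x'])
a = np.array(['x', 'z', 'y'], dtype='object')

# == broadcasts c against a and compares element by element,
# so the result is a boolean array, not a single answer.
elementwise = a == c
print(elementwise.tolist())  # [True, False, False]

# array_equal checks that the shapes match and that all values are equal,
# and returns a single bool for the whole array.
whole = np.array_equal(a, c)
print(whole)  # False
```

Because array_equal yields exactly one bool per element of s, the list comprehension produces a boolean mask of the same length as s, which is what the indexing needs.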

Alternatively, use a partial function if you need to do this repeatedly (for the shorter syntax):

from functools import partial
eq_c = partial(np.array_equal, c)

out = s[map(eq_c, s)]

Output:

0    [x]
dtype: object
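
For convenience, here is a self-contained sketch that puts the data from the question and the list-comprehension approach together. The intermediate name mask is an assumption added for readability; everything else follows the snippets above.

```python
import numpy as np
import pandas as pd

data = [
    np.array(['x'], dtype='object'),
    np.array(['y'], dtype='object'),
    np.array(['z'], dtype='object'),
    np.array(['x', 'z', 'y'], dtype='object'),
    np.array(['y', 'x'], dtype='object'),
]
s = pd.Series(data)
c = np.array(['x'])

# One bool per element of s: True only where the whole array equals c.
mask = [np.array_equal(a, c) for a in s]

# A list of bools with the same length as s acts as a boolean mask.
out = s[mask]
print(out)
```

Only the first element matches, since the arrays at positions 3 and 4 contain 'x' but differ from c in shape.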
