How to count the occurrences of a column's value in a column of lists?

Advertisements

Consider the following dataframe:

    column_of_lists   scalar_col
0   [100, 200, 300]       100
1   [100, 200, 200]       200
2   [300, 500]            300
3   [100, 100]            200

The desired output would be a Series, representing how many times the scalar value of scalar_col appears inside the list column.

So, in our case:

1 # 100 appears once in its respective list
2 # 200 appears twice in its respective list
1 # ...
0

I have tried something along the lines of:

df['column_of_lists'].apply(lambda x: x.count(df['scalar_col'])

and I get it that it won’t work because I am asking it to count a Series instead of a single value.

Any help would be welcome!

>Solution :

Use list comprehension:

df['new'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
print (df)
   column_of_lists  scalar_col  new
0  [100, 200, 300]         100    1
1  [100, 200, 200]         200    2
2       [300, 500]         300    1
3       [100, 100]         200    0

If performance not important use DataFrame.apply with axis=1:

df["new"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)

#40k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [145]: %timeit df["new1"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
572 ms ± 99.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [146]: %timeit df['new2'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
22.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Leave a ReplyCancel reply