Is there a vectorization method to match a df column to a dictionary

I have the following (abbreviated) dictionary:

_d = {

and the following dataframe:

df = 
1   "C0015672"
2   "C0015634"
3   "C0011579"
4   "C0030193"
5   "C0031193"
6   "C0030193"

I want to match up the column df["cui"], such that if the value of df["cui"] is in any of the value lists in then dictionary then I want a new column "symptom" based on the dictionary key, otherwise it would remain null.

This is the desired output:

df = 
        cui           symptom
    1   C0015672      fatigue 
    2   C0015634      NaN
    3   C0011579      depression 
    4   C0030193      pain
    5   C0031193      NaN
    6   C0030193      pain

I can do this by iterating over each row in the dataframe, but since there are 10s of millions of rows, it’s super slow. I’m looking for a way to vectorize this.

>Solution :

You can iterate over the dictionary first, then map the dictionary to cui column

d = {v: k
    for k, lst in _d.items()
    for v in lst}

df['symptom'] = df['cui'].map(d)

Leave a Reply