Add new Pandas Column based on whether list items in a column are inside a list

I have a list, "list_of_sendlistsA", with sample items like this:

list_of_sendlistsA = ["VIP","Officials","2021Customers","2020Customers"]

I also have a dataframe that contains email addresses and the send lists assigned to each of them. (Note: for d and e below, some values in "listsB" are not in "list_of_sendlistsA".)

email   listsB
a@a.com VIP,Officials
b@b.com Officials
c@c.com 
d@d.com Non-factor
e@e.com Officials,Resigned

What I want to do is add a column "on_list" that says whether the email address has any value in "listsB" that matches an item in "list_of_sendlistsA":

email   listsB              on_list
a@a.com VIP,Officials       Yes
b@b.com Officials           Yes
c@c.com                     No
d@d.com Non-factor          No
e@e.com Officials,Resigned  Yes

I tried df.listsB = df.listsB.str.split(',') to turn the "listsB" column into Python lists,

and then df['onlist'] = df.apply(lambda k: any(x in df['sendlist'] for x in k), axis = 1)

to return True whenever the two lists intersect for an email address, but I couldn't get what I wanted (every value of df['onlist'] was False).

May I know if someone could help me?

Thanks so much!

>Solution :

Use set.isdisjoint, which returns True when the two collections share no values, so its result can be passed to numpy.where:

import numpy as np

list_of_sendlistsA = ["VIP","Officials","2021Customers","2020Customers"]
# mask is True when a row shares no value with list_of_sendlistsA
mask = df.listsB.fillna('').str.split(',').map(set(list_of_sendlistsA).isdisjoint)
df['onlist'] = np.where(mask, 'No', 'Yes')

print (df)
     email              listsB onlist
0  a@a.com       VIP,Officials    Yes
1  b@b.com           Officials    Yes
2  c@c.com                 NaN     No
3  d@d.com          Non-factor     No
4  e@e.com  Officials,Resigned    Yes
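
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the sample rows in the question, and the assumption that the empty cell for c@c.com is NaN is taken from the printed output; the variable name send_set is only illustrative.

```python
import numpy as np
import pandas as pd

list_of_sendlistsA = ["VIP", "Officials", "2021Customers", "2020Customers"]

# Sample data rebuilt from the question; the missing entry is assumed to be NaN.
df = pd.DataFrame({
    "email": ["a@a.com", "b@b.com", "c@c.com", "d@d.com", "e@e.com"],
    "listsB": ["VIP,Officials", "Officials", np.nan, "Non-factor", "Officials,Resigned"],
})

# Build the set once so each row is checked against the same object.
send_set = set(list_of_sendlistsA)

# fillna('') turns NaN into an empty string, which splits to [''] and is
# disjoint from the send lists, so the row ends up as 'No'.
mask = df["listsB"].fillna("").str.split(",").map(send_set.isdisjoint)
df["onlist"] = np.where(mask, "No", "Yes")

print(df)
```

Building the set once up front keeps the per-row check to a single isdisjoint call, which accepts any iterable, including the lists produced by str.split.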

Your solution should be changed to:

df['listsB'] = df.listsB.str.split(',')
mask = df['listsB'].fillna('').apply(lambda k: any(x in list_of_sendlistsA for x in k))
df['onlist'] = np.where(mask, 'Yes', 'No')
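
One pitfall that may be relevant to the original attempt, which tested membership against a dataframe column rather than the plain Python list: the in operator on a pandas Series checks the index labels, not the values. The small Series below is a hypothetical stand-in for such a column, not something taken from the question.

```python
import pandas as pd

# Hypothetical Series standing in for a column of send-list names.
send = pd.Series(["VIP", "Officials"])

# `in` on a Series looks at the index labels (0 and 1 here).
label_hit = 0 in send

# "VIP" is a value, not an index label, so this check fails.
value_hit = "VIP" in send

# Converting to a plain list makes `in` check the values instead.
list_hit = "VIP" in send.tolist()

print(label_hit, value_hit, list_hit)
```

Testing against list_of_sendlistsA directly, as in the corrected code above, avoids this because a Python list checks its values.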