Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas apply condition on a column that contains list

I want to create a new column based on a condition that have to be applied on a list. Here’s a reproducible example:

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "BRAND": [[], ["LVH"], ["FER", "MER", "POR"], ["WDC", "AUD", "LVH"], ["AST"]]
    }
)
print(df)
   ID            BRAND
0   1               []
1   2            [LVH]
2   3  [FER, MER, POR]
3   4  [WDC, AUD, LVH]
4   5            [AST]

As one can see, each object in the BRAND column is a list that can contain one or more elements (the list can also be empty as for the row where ID = 1).

Now, given the following target list target_list = ["LVH", "WDC"], my goal is to create a new column based on the following rule: if at least one element of target_list (i.e. either LVH or WDC) is present in the BRAND column value, then assign a flag equal to Y in a new column (otherwise assign N). The resulting DataFrame for the above example should look as follows:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   ID            BRAND FLAG
0   1               []    N
1   2            [LVH]    Y
2   3  [FER, MER, POR]    N
3   4  [WDC, AUD, LVH]    Y
4   5            [AST]    N

>Solution :

Option 1

Seems to be a bit faster on a larger set than Option 2 below:

df['FLAG'] = df.BRAND.explode().isin(target_list).groupby(level=0, sort=False)\
    .any().map({True:'Y',False:'N'})

print(df)

   ID            BRAND FLAG
0   1               []    N
1   2            [LVH]    Y
2   3  [FER, MER, POR]    N
3   4  [WDC, AUD, LVH]    Y
4   5            [AST]    N

Explanation:

  • Use Series.explode to "[t]ransform each element of a list-like to a row".
  • Check for matches with Series.isin, and get True or False.
  • We now have a series with duplicate rows, so use Series.groupby to isolate the groups, apply any, and get a pd.Series back with booleans in the correct shape.
  • Finally, use Series.map to turn False and True into "N" and "Y" respectively.

Option 2:

Basically same performance as the answer by @AnoushiravanR

df['FLAG'] = df.BRAND.apply(lambda x: 'Y' if len(set(x) & set(target_list)) 
                            else 'N')

print(df)

   ID            BRAND FLAG
0   1               []    N
1   2            [LVH]    Y
2   3  [FER, MER, POR]    N
3   4  [WDC, AUD, LVH]    Y
4   5            [AST]    N

Explanation: set(list_a) & set(list_b) being a shorthand for set_a.intersection(set_b), which we pass to len(). If len(...) == 0, this will result in False.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading