Assign a category from a list according to how the strings in a pandas DataFrame column begin

Let’s take an example.

I have a list of known categories:

L_known_categories = ["Orange","Green","Red","Black & White"]

No string in that list is a substring of any other string in the list.
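
This constraint matters for any solution built on a regex alternation anchored at the start of the string: Python's re module tries the alternatives from left to right and keeps the first one that matches, so a category that is a prefix of another could shadow it. A minimal sketch with a hypothetical list ("Black" is not one of the real categories) shows the effect, and how sorting longest-first would avoid it:

```python
import re

# Hypothetical category list in which one entry ("Black") is a prefix of
# another ("Black & White"); the question's real list rules this out.
categories = ["Black", "Black & White"]
text = "Black & White glasses"

# re tries the alternatives left to right and stops at the first match,
# so the shorter prefix wins here.
naive = re.match("|".join(map(re.escape, categories)), text).group(0)

# Sorting longest-first makes the alternation prefer the longer category.
longest_first = sorted(categories, key=len, reverse=True)
fixed = re.match("|".join(map(re.escape, longest_first)), text).group(0)

print(naive)  # Black
print(fixed)  # Black & White
```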

And a DataFrame:

import pandas as pd

df = pd.DataFrame({"Items": ["green apple", "blue bottle", "RED APPLE", "Green paper",
                             "Black & White glasses", "An orange fruit"]})

                   Items
0            green apple
1            blue bottle
2              RED APPLE
3            Green paper
4  Black & White glasses
5        An orange fruit

I would like to add a column Category to this DataFrame. If the string in column Items starts with a string from L_known_categories, ignoring case, the category is that string. If no such string is found, the category is the string in column Items.

I could use a for loop, but it is not efficient on my real DataFrame, which is large. How could I do this?
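
For reference, the for-loop approach could look like the following minimal sketch (the helper name `categorize` is hypothetical). It visits every row in Python, which is what the question wants to avoid on a large DataFrame, but it states the rule precisely and can serve as a check for a vectorized result on a sample of rows. It compares with `str.casefold()` so the match ignores case.

```python
def categorize(item, categories):
    """Return the first category that `item` starts with, ignoring case;
    otherwise return `item` unchanged."""
    folded = item.casefold()
    for category in categories:
        if folded.startswith(category.casefold()):
            return category
    return item


L_known_categories = ["Orange", "Green", "Red", "Black & White"]
items = ["green apple", "blue bottle", "RED APPLE", "Green paper",
         "Black & White glasses", "An orange fruit"]

result = [categorize(item, L_known_categories) for item in items]
print(result)
# ['Green', 'blue bottle', 'Red', 'Green', 'Black & White', 'An orange fruit']
```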

Expected output:

                   Items         Category
0            green apple            Green
1            blue bottle      blue bottle
2              RED APPLE              Red
3            Green paper            Green
4  Black & White glasses    Black & White
5        An orange fruit  An orange fruit

Solution:

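You can use a regular expression with pandas.Series.str.extract. Title-casing the items makes their capitalization match that of the categories (this relies on every known category being in title case, as "Black & White" is). The ^ anchor goes outside the group so that it applies to every alternative; written as (^Orange|Green|...) it would anchor only the first one. re.escape guards against category names that contain regex metacharacters. Rows with no match come back as NaN and are filled with the original Items value:

>>> import re
>>> pattern = '^(' + '|'.join(map(re.escape, L_known_categories)) + ')'
>>> df['Category'] = (
...     df['Items'].str.title()
...     .str.extract(pattern, expand=False)
...     .fillna(df['Items'])
... )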

>>> df
                   Items         Category
0            green apple            Green
1            blue bottle      blue bottle
2              RED APPLE              Red
3            Green paper            Green
4  Black & White glasses    Black & White
5        An orange fruit  An orange fruit
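
The title-casing step works here because every known category is already in title case. A sketch of a variant that does not depend on that, assuming the same L_known_categories and df: match case-insensitively with `flags=re.IGNORECASE`, then map the matched text back to the canonical spelling through a lowercase lookup dictionary. The names `ordered`, `pattern`, `matched` and `canonical` are illustrative only. Sorting the categories longest-first also keeps the result correct if the no-substring rule were ever relaxed.

```python
import re

import pandas as pd

L_known_categories = ["Orange", "Green", "Red", "Black & White"]
df = pd.DataFrame({"Items": ["green apple", "blue bottle", "RED APPLE", "Green paper",
                             "Black & White glasses", "An orange fruit"]})

# Longest categories first, so a category that is a prefix of another
# can never shadow the longer one in the alternation.
ordered = sorted(L_known_categories, key=len, reverse=True)

# Anchor the whole group at the start and escape regex metacharacters.
pattern = "^(" + "|".join(map(re.escape, ordered)) + ")"

# Match case-insensitively on the original text (no str.title() needed);
# rows without a match become NaN.
matched = df["Items"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Map the matched text, in whatever case it appeared, back to the canonical
# spelling, then fall back to the original Items value.
canonical = {c.lower(): c for c in L_known_categories}
df["Category"] = matched.str.lower().map(canonical).fillna(df["Items"])

print(df)
```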