Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter rows matched string patterns (specific string + integer + another specific string) in Pandas

Given a DataFrame df:

   id                                        items
0   1  CRB Spot Index: Grease: First 2 Months: YoY
1   2                     CRB Spot Index: Textiles
2   3                    CRB Spot Index: Livestock
3   4      Korea: Export Value: First 10 Days: YoY
4   5      Korea: Export Value: First 20 Days: YoY
5   6      Korea: Import Value: First 20 Days: YoY
6   7      Korea: Import Value: First 10 Days: YoY
7   8    Vietnam: Import Value: First 2 Weeks: MoM
8   9    Vietnam: Import Value: First 3 Weeks: MoM

I want to filter items which match patterns: First + integer + Days or First + integer + Weeks, then create a new column to indicate them:

   id                                        items indicator
0   1  CRB Spot Index: Grease: First 2 Months: YoY       NaN
1   2                     CRB Spot Index: Textiles       NaN
2   3                    CRB Spot Index: Livestock       NaN
3   4      Korea: Export Value: First 10 Days: YoY         Y
4   5      Korea: Export Value: First 20 Days: YoY         Y
5   6      Korea: Import Value: First 20 Days: YoY         Y
6   7      Korea: Import Value: First 10 Days: YoY         Y
7   8    Vietnam: Import Value: First 2 Weeks: MoM         Y
8   9    Vietnam: Import Value: First 3 Weeks: MoM         Y

How could I do that using Pandas?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I use df.loc[(df['items'].str.contains('First|Days', na=False)) & (df['items'].str.contains('First|Weeks', na=False)), 'indicators']='Y', it doesn’t generate an expected result as shown above, also maybe there are other more concise answers.

>Solution :

Check regex for match First, then \s+ for space(s), then \d+ for integers, \s+ for again space(s) and last Days|Weeks for Days or Weeks:

df.loc[df['items'].str.contains('First\s+\d+\s+Days|Weeks', na=False), 'indicators']='Y'
print (df)
   id                                        items indicators
0   1  CRB Spot Index: Grease: First 2 Months: YoY        NaN
1   2                     CRB Spot Index: Textiles        NaN
2   3                    CRB Spot Index: Livestock        NaN
3   4      Korea: Export Value: First 10 Days: YoY          Y
4   5      Korea: Export Value: First 20 Days: YoY          Y
5   6      Korea: Import Value: First 20 Days: YoY          Y
6   7      Korea: Import Value: First 10 Days: YoY          Y
7   8    Vietnam: Import Value: First 2 Weeks: MoM          Y
8   9    Vietnam: Import Value: First 3 Weeks: MoM          Y
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading