Filtering a pandas DataFrame when the data contains two parts

I have a pandas DataFrame and want to filter it down to all the rows that meet a certain criterion in the “Title” column.
The rows I want to keep are all rows that contain the format “(Axx)” (where xx are two digits).
The data in the “Title” column doesn’t consist only of “(Axx)”; it looks like this:

“some_string (Axx)”
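The “(Axx)” format described above can be written as a regular expression. A minimal sketch using Python's built-in re module, with hypothetical sample titles: the parentheses must be escaped, because an unescaped (A\d{2}) is a capture group and only requires the bare text A plus two digits.

```python
import re

# Hypothetical sample titles, not from the original data
titles = ["some_string (A12)", "some_string A12", "some_string (A1)", "(A34) trailing"]

# Escaped parentheses: literal "(", "A", two digits, ")" at the end of the string
escaped = re.compile(r"\(A\d{2}\)$")
# Unescaped parentheses form a capture group, so only "A" + two digits is required
unescaped = re.compile(r"(A\d{2})")

escaped_hits = [t for t in titles if escaped.search(t)]
unescaped_hits = [t for t in titles if unescaped.search(t)]

print(escaped_hits)    # ['some_string (A12)']
print(unescaped_hits)  # ['some_string (A12)', 'some_string A12', '(A34) trailing']
```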

I've been playing around with different methods but can't seem to get it.
I think the closest I've gotten is:

df.filter(regex=r'(D\d{2})', axis=0)

but it's not correct, as the entries aren't being filtered.
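One reason an attempt like this returns nothing: DataFrame.filter selects by labels (the index when axis=0, the column names when axis=1), not by the values stored in the cells, so the regex is tested against the default integer labels 0, 1, 2, ... A minimal sketch with hypothetical data illustrating the difference:

```python
import pandas as pd

# Hypothetical data; the default RangeIndex labels are 0, 1, 2
df = pd.DataFrame({"Title": ["aaa (D71)", "bbb", "(D89)"]})

# filter() with axis=0 tests the regex against the index labels, not the cell values
by_label = df.filter(regex=r"\(D\d{2}\)", axis=0)
print(len(by_label))  # 0

# With the titles used as the index, filter() now sees them (illustration only)
by_index = df.set_index("Title", drop=False).filter(regex=r"\(D\d{2}\)", axis=0)
print(by_index["Title"].tolist())  # ['aaa (D71)', '(D89)']
```

Moving the titles into the index only demonstrates what filter() matches; filtering on cell values is normally done with a boolean mask instead.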


Solution:

Use Series.str.contains with escaped parentheses \( and \), plus $ to anchor the match at the end of the string, and filter with boolean indexing:

import pandas as pd

df = pd.DataFrame({'Title':['(D89)','aaa (D71)','(D5)','(D78) aa','D72']})
print(df)
       Title
0      (D89)
1  aaa (D71)
2       (D5)
3   (D78) aa
4        D72

# keep rows whose Title ends with a literal "(D" + two digits + ")"
df1 = df[df['Title'].str.contains(r'\(D\d{2}\)$')]
print(df1)
       Title
0      (D89)
1  aaa (D71)
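If the Title column can contain missing values (an assumption; the question does not say), str.contains returns NaN for those rows, and depending on the pandas version and column dtype, boolean indexing with such a mask can raise an error. Passing na=False treats missing titles as non-matches. A minimal sketch with hypothetical data in the question's “some_string (Axx)” form:

```python
import numpy as np
import pandas as pd

# Hypothetical data in the "some_string (Axx)" form, with one missing title
df = pd.DataFrame({"Title": ["foo (A12)", "bar (A3)", np.nan, "baz (A45)"]})

# na=False makes missing titles count as non-matches instead of NaN in the mask
mask = df["Title"].str.contains(r"\(A\d{2}\)$", na=False)
result = df[mask]
print(result)
```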

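If you need to match only values that are exactly (Dxx), use Series.str.match, which anchors the pattern at the start of the string (the $ anchors the end):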

df2 = df[df['Title'].str.match(r'\(D\d{2}\)$')]
print(df2)
   Title
0  (D89)
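Since pandas 1.1.0, Series.str.fullmatch requires the whole string to match, so the trailing $ is not needed. A minimal sketch reusing the example data from the answer:

```python
import pandas as pd

df = pd.DataFrame({'Title': ['(D89)', 'aaa (D71)', '(D5)', '(D78) aa', 'D72']})

# fullmatch anchors the pattern at both ends of each string (pandas >= 1.1.0)
df3 = df[df['Title'].str.fullmatch(r'\(D\d{2}\)')]
print(df3)
```

One small difference: in Python regular expressions, $ also matches just before a trailing newline, so str.match with $ would accept '(D89)\n', whereas fullmatch would not.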