How to check the string format of an entire column in Python using regex

I have Account Names that look like GH85036, LG95639, etc. in a column. I want to check the format of the entire column so I can edit the values that don't follow it. This is my first time using regex.

So far I have got

for i in Reports['Account Name']:

    match = re.findall(r'\[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

The error message I get:

<ipython-input-77-86f17b9d34ff> in <module>()
      1 for i in Reports['Account Name']:
----> 2     match = re.findall(r'\[A-Z]{2}[0-9][0-9][0-9][0-9][0-9]', Reports['Account Name']) is None

C:\Program Files\Anaconda3\lib\re.py in findall(pattern, string, flags)
    221 
    222     Empty matches are included in the result."""
--> 223     return _compile(pattern, flags).findall(string)
    224 
    225 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object

Solution:

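The TypeError occurs because re.findall expects a single string, but the loop passes the whole Reports['Account Name'] column (a pandas Series) on every iteration instead of the loop variable i. In addition, the leading backslash in \[A-Z] escapes the bracket, so that pattern looks for a literal [ rather than a character class. Assuming a valid account name is two capital letters followed by five digits, we can skip the loop and use str.contains with an anchored pattern on the entire column to select the rows that do not match: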

Reports[~Reports["Account Name"].str.contains(r'^[A-Z]{2}[0-9]{5}$', regex=True)]
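
If the column can contain missing values, the one-liner above may fail, because str.contains returns NaN for those rows and the result can no longer be negated as a boolean mask. Here is a minimal sketch of a more defensive variant, assuming pandas 1.1 or newer (for str.fullmatch). The sample Reports frame is made up for illustration, since the real data is not shown in the question:

```python
import re
import pandas as pd

# Hypothetical sample data; the real Reports frame is not shown in the question.
Reports = pd.DataFrame(
    {"Account Name": ["GH85036", "LG95639", "gh85036", "AB1234", "XY123456", None]}
)

# Two capital letters followed by exactly five digits.
pattern = r"[A-Z]{2}[0-9]{5}"

# str.fullmatch requires the whole value to match, so no ^ or $ anchors are needed.
# na=False counts missing values as non-matching instead of returning NaN.
is_valid = Reports["Account Name"].str.fullmatch(pattern, na=False)

# Rows whose account name needs editing.
bad_rows = Reports[~is_valid]
print(bad_rows)

# The same check written as a loop with the re module, for comparison.
# Each iteration tests one value (a string), not the whole column.
compiled = re.compile(pattern)
is_valid_loop = [
    isinstance(name, str) and compiled.fullmatch(name) is not None
    for name in Reports["Account Name"]
]
```

Keeping the boolean mask in its own variable (is_valid here) makes it easy to count the bad rows with (~is_valid).sum() or to reuse the mask when correcting the values.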