Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas str.extract() a number that ends in a letter

I have a pandas column like below:

      df['description']
0.   PRAIRIE HIGHLANDS SIXTH PLAT Lt: 156 PIN# DP73770000 0156 312 ABC
1.   PRAIRIE VILLAGE   PIN# OP55000034 0020A Rmrk: PT OF
2.   Sub: HOLLY GREEN Lt: 14 Bl: 1 PIN# DP34500001 0D14
3.   FAIRWAY   PIN# GP20000006 0029 Rmrk: W

I want to extract out PIN# such that expected results:

DP73770000 0156 312
OP55000034 0020A
DP34500001 0D14
GP20000006 0029

What I have tried:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['PIN'] = df['description'].str.extract(r'\b(?:PIN# ?) (\w+(?:(?:\s*[ ])+\s*\d+)*)\b')

Results I got:

DP73770000 0156 312
OP55000034
DP34500001
GP20000006 0029

The ones that have letters in between the numbers after space are not getting picked up. I have tried a couple other regex as well to no success. How can I resolve this? Please help.

>Solution :

I would use str.extract as follows:

df["PIN"] = df["description"].str.extract(r'PIN#((?: [A-Z0-9]*[0-9][A-Z0-9]*)*)')

Here is a link to a running regex demo showing that the logic is working.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading