String replace in Python confuses . with E?

I am trying to remove the "M. " prefix that appears at the beginning of values in a column. This should be easy. Here is my code:

df['name'] = df['name'].str.replace('M. ', "", regex=True)

Here is a sample of my data:

name
M. ABAD John
M. BOULMÉ Jean
Mme BONO-VANDORME Anne

This is what I am obtaining:

name
ABAD John
BOULJean
Mme BONO-VANDORAnne

I find this result very weird. It seems that Python is confusing "E" with ".". Why is this happening, and how should I correct the code?

Solution:

The pandas Series.str.replace() method differs from Python's built-in str.replace() in that it can treat its first argument as a regular expression, and your code explicitly asks for that with regex=True.

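In a regular expression, the dot . matches any single character (except a newline), so the pattern 'M. ' matches "M" followed by any character and then a space. That is why "ME " in "VANDORME Anne" and "MÉ " in "BOULMÉ Jean" were removed as well.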
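To see exactly which substrings the pattern hits, you can run it through Python's standard re module on the three sample names. This is a small illustrative sketch: re.sub with the same pattern behaves like str.replace with regex=True here, so it should reproduce the output from the question.

```python
import re

names = ["M. ABAD John", "M. BOULMÉ Jean", "Mme BONO-VANDORME Anne"]
pattern = "M. "  # as a regex, "." matches ANY single character, not just a dot

for name in names:
    hits = re.findall(pattern, name)     # every "M" + any char + space
    cleaned = re.sub(pattern, "", name)  # what the regex=True replacement does
    print(f"{name!r}: matches={hits} -> {cleaned!r}")
```

The unintended matches "MÉ " and "ME " show up next to the expected "M. ", which explains the truncated names.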

The fix in your case is therefore to stop treating the first argument as a regular expression.

With regex=False, str.replace performs a plain, literal substring replacement:

df['name'] = df['name'].str.replace('M. ', '', regex=False)
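A self-contained version of the fix, assuming pandas is installed; the DataFrame is rebuilt here from the sample rows in the question.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({"name": ["M. ABAD John", "M. BOULMÉ Jean", "Mme BONO-VANDORME Anne"]})

# regex=False: "M. " is matched literally, so "." only matches a real dot
df["name"] = df["name"].str.replace("M. ", "", regex=False)

print(df["name"].tolist())
```

Note that a literal replacement removes "M. " wherever it occurs in a value, not only at the start.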

Note that since pandas 2.0, regex=False is the default, so with a recent pandas you could omit the argument entirely (as long as you also drop the explicit regex=True from your original code). Beware that in earlier versions the default was the opposite, regex=True.
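Since the goal is to drop "M. " only at the beginning of each value, another option is to keep regex=True but escape the dot and anchor the pattern with ^. The sketch below also shows Series.str.removeprefix (available in pandas 1.4 and later) as a regex-free alternative; the fourth row is a hypothetical value added only to show that a mid-string "M. " is left alone.

```python
import pandas as pd

s = pd.Series([
    "M. ABAD John",
    "M. BOULMÉ Jean",
    "Mme BONO-VANDORME Anne",
    "Jean M. DUPONT",  # hypothetical row with "M. " in the middle
])

# ^ anchors the match at the start of the string; \. is a literal dot
anchored = s.str.replace(r"^M\. ", "", regex=True)

# Prefix-only removal without any regex (pandas >= 1.4)
prefix_only = s.str.removeprefix("M. ")

print(anchored.tolist())
print(prefix_only.tolist())
```

Both variants remove the prefix from the first two rows and leave "Mme BONO-VANDORME Anne" and "Jean M. DUPONT" unchanged.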
