String replace in Python confuses . with E?

October 1, 2023

I am trying to remove every "M. " that appears at the beginning of the values in a column. This is supposed to be easy. Here is my code:

df['name'] = df['name'].str.replace('M. ', "", regex=True)

Here is a sample of my data:

name
M. ABAD John
M. BOULMÉ Jean
Mme BONO-VANDORME Anne

This is what I am obtaining:

name
ABAD John
BOULJean
Mme BONO-VANDORAnne

I find this result very weird. It seems that Python is confusing "E" with ".". Why is this happening? How should I correct the code?

>Solution :

The pandas str.replace() method differs from Python's built-in str.replace() in that it can treat its first argument as a regular expression. With regex=True, as in your code, it does.

In a regular expression, the dot . matches any single character, so the pattern 'M. ' matches not only "M. " but also "MÉ " in "BOULMÉ Jean" and "ME " in "BONO-VANDORME Anne".
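The effect can be reproduced without pandas. Here is a minimal sketch using the standard re module on the sample names from the question. The variable names are only illustrative, and it assumes the pattern behaves the same way as it does in pandas with regex=True:

```python
import re

# Unescaped dot: "M", then ANY single character, then a space
pattern = "M. "

names = ["M. ABAD John", "M. BOULMÉ Jean", "Mme BONO-VANDORME Anne"]

# Every match is removed, wherever it occurs in the string
results = [re.sub(pattern, "", name) for name in names]
print(results)
# ['ABAD John', 'BOULJean', 'Mme BONO-VANDORAnne']
```

The "MÉ " and "ME " substrings are removed because É and E each count as "any single character".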

Therefore the solution in your case would be to disable treating the first argument as a regular expression.

With regex=False, str.replace performs a plain literal string substitution:

df['name'] = df['name'].str.replace('M. ', '', regex=False)

Note that in recent versions of pandas (2.0 and later) regex=False is the default, so you can omit this optional argument altogether. Beware, though, that in earlier versions the default was the opposite (regex=True).
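One caveat: a literal replacement removes "M. " anywhere in the string, not only at the beginning. If you only want to strip a leading "M. ", one possible alternative (a sketch, not the only way) is to keep regex=True, escape the dot, and anchor the pattern with ^. The DataFrame below is rebuilt from the sample data in the question:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["M. ABAD John", "M. BOULMÉ Jean", "Mme BONO-VANDORME Anne"]}
)

# "^" anchors the match to the start of each string;
# "\." matches a literal dot instead of any character
df["name"] = df["name"].str.replace(r"^M\. ", "", regex=True)
print(df["name"].tolist())
# ['ABAD John', 'BOULMÉ Jean', 'Mme BONO-VANDORME Anne']
```

Passing regex=True explicitly keeps the behaviour the same regardless of which default your pandas version uses.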