I am trying to remove every "M. " that appears at the beginning of the values in a column. This is supposed to be easy. Here is my code:
df['name'] = df['name'].str.replace('M. ', "", regex=True)
Here is a sample of my data:
name
M. ABAD John
M. BOULMÉ Jean
Mme BONO-VANDORME Anne
This is what I am obtaining:
name
ABAD John
BOULJean
Mme BONO-VANDORAnne
I find this result very weird. It seems that Python is confusing "E" with ".". Why is this happening? How should I correct the code?
The pandas str.replace() method differs from Python's built-in str.replace() in that it can treat its first argument as a regular expression. With regex=True, as in your code, it does.
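In a regular expression, the dot . matches any single character, so the strings 'ME ' and 'MÉ ' (each ending in a space) also match the pattern 'M. '. That is why the ends of BOULMÉ and BONO-VANDORME disappear together with the space that follows them.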
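The effect of the unescaped dot can be reproduced without pandas. The following is a minimal sketch using Python's standard re module on the sample names from the question; the variable names are illustrative only.

```python
import re

names = ["M. ABAD John", "M. BOULMÉ Jean", "Mme BONO-VANDORME Anne"]

# Unescaped dot: "M", then ANY single character, then a space.
# This also matches "ME " and "MÉ " in the middle of a name.
unescaped = [re.sub(r"M. ", "", n) for n in names]

# Escaped dot: only a literal "M. " matches.
escaped = [re.sub(r"M\. ", "", n) for n in names]

print(unescaped)  # first entry: 'ABAD John', last entry: 'Mme BONO-VANDORAnne'
print(escaped)    # first entry: 'ABAD John', last entry unchanged
```

Escaping the dot (or using re.escape on the whole pattern) is the regex-side alternative to switching regex matching off.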
The solution in your case is to stop treating the first argument as a regular expression. With regex=False, str.replace performs a plain literal string substitution:
df['name'] = df['name'].str.replace('M. ', '', regex=False)
Note that since pandas 2.0, regex=False is the default, so you can omit this optional argument altogether. Beware, however, that in earlier versions the default was the opposite (regex=True).
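One caveat: a literal replacement removes "M. " wherever it occurs, not only at the beginning of the value, which is what the question asks for. Below is a hedged sketch of two ways to restrict the removal to the start of each string. The fourth row ("M. DUPOM. Paul") is invented here purely to show the difference, and Series.str.removeprefix assumes pandas 1.4 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [
        "M. ABAD John",
        "M. BOULMÉ Jean",
        "Mme BONO-VANDORME Anne",
        "M. DUPOM. Paul",  # hypothetical row, not from the question
    ]
})

# Literal replacement: removes every "M. ", including the one mid-string.
literal = df["name"].str.replace("M. ", "", regex=False)

# Anchored regex: "^" pins the match to the start, "\." is a literal dot.
anchored = df["name"].str.replace(r"^M\. ", "", regex=True)

# Non-regex alternative (assumes pandas >= 1.4).
prefix = df["name"].str.removeprefix("M. ")

print(literal.tolist()[3])   # DUPOPaul
print(anchored.tolist()[3])  # DUPOM. Paul
```

Passing regex explicitly in both calls keeps the behaviour the same regardless of the pandas version's default.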