Pandas str.replace with regex doubles results?

October 29, 2023

Let’s say I have this pandas Series:

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]))'
0    1
1    2
2    3
3    4
dtype: object

I’d like to "wrap" the strings "1", "2", "3", "4" so that each is prefixed with "a" and suffixed with "b", i.e. I want to get "a1b", "a2b", "a3b", "a4b". So I tried Series.str.replace (https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html):

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]).str.replace("(.*)", r"a\1b", regex=True))'
0    a1bab
1    a2bab
2    a3bab
3    a4bab
dtype: object

So "1" did get wrapped into "a1b", but then an extra "ab" is appended. Why?

(Trying this regex on regex101.com, I noticed the same "ghost copy" of "ab" at the end when the g flag is enabled. So does pandas .str.replace somehow enable it? But according to the docs, the default for .str.replace is flags=0!)
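
The doubling can be reproduced with Python's re module alone, which is what .str.replace(..., regex=True) uses for object-dtype Series like this one. Python has no "g" flag: re.sub replaces every non-overlapping match unless count is given, and the pandas default n=-1 means "replace all" (the flags argument only takes re flags such as re.IGNORECASE). Since Python 3.7, an empty match right after a non-empty one is also replaced, so (.*) matches "1" and then the empty string at the end, which produces the extra "ab". A minimal sketch:

```python
import re

pattern = r"(.*)"
text = "1"

# Every match that re.sub would replace: "1" first, then the empty string at the end
matches = [m.group(0) for m in re.finditer(pattern, text)]
print(matches)  # ['1', '']

# Replace all matches (what pandas does with its default n=-1)
all_replaced = re.sub(pattern, r"a\1b", text)
print(all_replaced)  # a1bab

# Replace only the first match
first_only = re.sub(pattern, r"a\1b", text, count=1)
print(first_only)  # a1b
```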

How can I wrap the entire contents of each cell in exactly the characters I want, and nothing more?

>Solution :

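Change (.*) to (.+). The pattern .* can also match the empty string, so after it consumes "1" there is still an empty match at the end of the string, and because .str.replace replaces every match by default (n=-1), that empty match becomes a second "ab". .+ needs at least one character, so it matches only once: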

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]).str.replace("(.+)", r"a\1b", regex=True))'
0    a1b
1    a2b
2    a3b
3    a4b
dtype: object
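
Besides (.+), a few other ways to wrap each cell are sketched below on the question's Series with an empty string added to show where they differ. Assumption: the Series is forced to object dtype as in the question, since string-dtype Series may go through a different regex engine. (.+) leaves empty cells untouched, while n=1, an anchored pattern, and plain concatenation all turn "" into "ab".

```python
import pandas as pd

# The question's data plus an empty cell; object dtype as in the question's output
s = pd.Series(["1", "2", "3", "4", ""], dtype=object)

# The answer's fix: (.+) needs at least one character, so there is no
# trailing empty match (and an empty cell has no match at all)
plus = s.str.replace(r"(.+)", r"a\1b", regex=True)

# Keep (.*) but make only the first replacement (n=1)
first = s.str.replace(r"(.*)", r"a\1b", n=1, regex=True)

# Anchor the pattern so it can match the whole string only once
anchored = s.str.replace(r"^(.*)$", r"a\1b", regex=True)

# No regex at all: element-wise string concatenation
concat = "a" + s + "b"

print(plus.tolist())      # ['a1b', 'a2b', 'a3b', 'a4b', '']
print(first.tolist())     # ['a1b', 'a2b', 'a3b', 'a4b', 'ab']
print(anchored.tolist())  # ['a1b', 'a2b', 'a3b', 'a4b', 'ab']
print(concat.tolist())    # ['a1b', 'a2b', 'a3b', 'a4b', 'ab']
```

If the goal is only to add a fixed prefix and suffix, the concatenation avoids regex edge cases entirely.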