Pandas str.replace with regex doubles results?

Let’s say I have this pandas Series:

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]))'
0    1
1    2
2    3
3    4
dtype: object

I’d like to "wrap" the strings "1", "2", "3", "4" so that each one is prefixed with "a" and suffixed with "b"; that is, I want to get "a1b", "a2b", "a3b", "a4b". So I tried Series.str.replace (https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html):

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]).str.replace("(.*)", r"a\1b", regex=True))'
0    a1bab
1    a2bab
2    a3bab
3    a4bab
dtype: object

So I did get "1" wrapped into "a1b", but then "ab" is repeated one more time?

(Trying this regex on regex101.com, I noticed that I get the same "ghost copies" of "ab" at the end if the g flag is enabled, so maybe pandas .str.replace somehow enables it? But the default for pandas .str.replace is flags=0, as per the docs?!)

How can I get the entire contents of a column cell "wrapped" in only those characters that I want?

Solution:

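Change (.*) to (.+). The pattern .* can also match the empty string, so after it has consumed the whole value it matches once more on the empty string left at the end. str.replace replaces every match (this is not controlled by flags), so that second, empty match contributes the extra "ab". With .+ at least one character is required, so the empty match disappears: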

$ python3 -c 'import pandas as pd; print(pd.Series(["1","2","3","4"]).str.replace("(.+)", r"a\1b", regex=True))'
0    a1b
1    a2b
2    a3b
3    a4b
dtype: object
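
To see where the second "ab" comes from, here is a minimal sketch using only Python's re module. It assumes Python 3.7 or later, where an empty match adjacent to a previous match is also replaced, and it assumes that pandas hands the pattern to re.sub-style replacement for object-dtype strings, which matches the output shown in the question. The variable names (values, doubled, plus, anchored) are just for illustration.

```python
import re

values = ["1", "2", "3", "4"]

# ".*" matches twice per string: first the text itself, then the empty
# string that remains at the end of the input.
print(re.findall(r".*", "1"))  # ['1', '']

# Each of the two matches is replaced, hence the trailing "ab".
doubled = [re.sub(r"(.*)", r"a\1b", v) for v in values]

# ".+" needs at least one character, so the empty match is gone.
plus = [re.sub(r"(.+)", r"a\1b", v) for v in values]

# Anchoring both ends also leaves a single match per string.
anchored = [re.sub(r"^(.*)$", r"a\1b", v) for v in values]

print(doubled)   # ['a1bab', 'a2bab', 'a3bab', 'a4bab']
print(plus)      # ['a1b', 'a2b', 'a3b', 'a4b']
print(anchored)  # ['a1b', 'a2b', 'a3b', 'a4b']
```

The two fixes differ on empty cells: (.+) leaves an empty string untouched, while ^(.*)$ turns it into "ab". If no pattern matching is needed at all, plain concatenation such as "a" + series + "b" on a Series of strings should give the same wrapping without a regex.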