pandas add brackets around part of string containing numbers

I have a pandas dataframe, and I want to replace certain strings in one column.
The string could be something like this: "Spiderman is Nr 1" and I want to turn it to "Spiderman (Nr 1)"
The only part of the string that stays the same is "is Nr". The superhero and the number change, and not every superhero has a number. So the DataFrame could look like this:

Superheros
Spiderman is Nr 1
Batman is Nr 4
Joker
Iron Man is Nr 2
Hulk
Captain America
Wonderwoman is Nr 3

And I want to change this DataFrame so that every "is Nr \d" becomes "(Nr \d)":

Superheros
Spiderman (Nr 1)
Batman (Nr 4)
Joker
Iron Man (Nr 2)
Hulk
Captain America
Wonderwoman (Nr 3)

I found that I can replace strings in one column like this:

df["Superheros"] = df["Superheros"].str.replace('is Nr', '(Nr')

But this obviously is missing the final bracket.

I would like to use regex, but I don’t know how to access the string in the columns. I think the pattern should be something like r'is Nr \d', but I don’t know how to pass the number to the replacing string.

I tried

df["Superheros"] = df["Superheros"].str.replace(r'is Nr \d', r'(Nr \d)')
df["Superheros"] = df["Superheros"].str.re.sub(r'is Nr \d', r'(Nr \d)')

but I get errors, because this is apparently not how to use regex on a column.

I hope it is clear what I am looking for. If you need any more info, let me know. I know there is a lot of regex things here on stackoverflow, but I didn’t find the combination of things I am looking for.

>Solution :

You can use

df["Superheros"] = df["Superheros"].str.replace(r'\bis\s+(Nr\s*\d+)', r'(\1)', regex=True)

See the regex demo

Details

  • \b – a word boundary
  • is – the literal word "is"
  • \s+ – one or more whitespaces
  • (Nr\s*\d+) – Capturing group 1 (\1 in the replacement pattern refers to this group value): Nr, zero or more whitespaces (\s*), and one or more digits (\d+).
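
To see the capturing group and the \1 backreference in isolation, here is a minimal sketch using Python's built-in re module on plain strings instead of a DataFrame column. The names pattern, samples and results are only illustrative and do not come from the question.

```python
import re

# Same pattern as in the answer: the word "is", whitespace,
# then "Nr <digits>" captured as group 1.
pattern = re.compile(r'\bis\s+(Nr\s*\d+)')

# Illustrative sample strings taken from the question's column.
samples = ["Spiderman is Nr 1", "Joker", "Iron Man is Nr 2"]

# \1 in the replacement re-inserts whatever group 1 matched,
# so only "is " is dropped and the rest is wrapped in brackets.
results = [pattern.sub(r'(\1)', s) for s in samples]
print(results)
```

Strings without a match, such as "Joker", pass through unchanged, which is why rows without a number need no special handling.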

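Note the use of regex=True: since pandas 2.0, str.replace treats the pattern as a literal string by default, so the flag is needed for the regex to apply. On older versions it also avoids a FutureWarning.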
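
For completeness, a small end-to-end sketch that rebuilds the example DataFrame from the question and applies the replacement. It assumes pandas is installed and that the column is named "Superheros" as in the question. The variable expected is only there for illustration.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Superheros": [
        "Spiderman is Nr 1",
        "Batman is Nr 4",
        "Joker",
        "Iron Man is Nr 2",
        "Hulk",
        "Captain America",
        "Wonderwoman is Nr 3",
    ]
})

# Wrap "Nr <digits>" in brackets and drop the preceding "is".
df["Superheros"] = df["Superheros"].str.replace(
    r'\bis\s+(Nr\s*\d+)', r'(\1)', regex=True
)

# The desired output listed in the question.
expected = [
    "Spiderman (Nr 1)",
    "Batman (Nr 4)",
    "Joker",
    "Iron Man (Nr 2)",
    "Hulk",
    "Captain America",
    "Wonderwoman (Nr 3)",
]
print(df["Superheros"].tolist() == expected)
```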
