I have a pandas dataframe, and I want to replace certain strings in one column.
The string could be something like this: "Spiderman is Nr 1" and I want to turn it into "Spiderman (Nr 1)".
The only part of the string that stays the same is "is Nr". The superhero and the number change, and not every superhero has a number. So the dataframe could look like this:
Superheros
Spiderman is Nr 1
Batman is Nr 4
Joker
Iron Man is Nr 2
Hulk
Captain America
Wonderwoman is Nr 3
And I want to change this Dataframe such that all is Nr \d are changed to (Nr \d):
Superheros
Spiderman (Nr 1)
Batman (Nr 4)
Joker
Iron Man (Nr 2)
Hulk
Captain America
Wonderwoman (Nr 3)
I found that I can replace strings in one column like this:
df["Superheros"] = df["Superheros"].str.replace('is Nr', '(Nr')
But this obviously is missing the final bracket.
I would like to use regex, but I don’t know how to access the string in the columns. I think the pattern should be something like r'is Nr \d', but I don’t know how to pass the number to the replacing string. I tried:
df["Superheros"] = df["Superheros"].str.replace(r'is Nr \d', r'(Nr \d)')
df["Superheros"] = df["Superheros"].str.re.sub(r'is Nr \d', r'(Nr \d)')
but I get errors, because this is apparently not how to use regex on a column.
I hope it is clear what I am looking for. If you need any more info, let me know. I know there is a lot of regex things here on stackoverflow, but I didn’t find the combination of things I am looking for.
You can use
df["Superheros"] = df["Superheros"].str.replace(r'\bis\s+(Nr\s*\d+)', r'(\1)', regex=True)
See the regex demo
\b - a word boundary
is - a word
\s+ - one or more whitespaces
(Nr\s*\d+) - Capturing group 1 (\1 in the replacement pattern refers to this group value): Nr, zero or more whitespaces (\s*), and one or more digits (\d+)
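To see what the capturing group and \1 do on their own, here is a minimal sketch using Python's re module instead of pandas. The sample strings come from the question; the names pattern, samples, and results are only illustrative.

```python
import re

# Same pattern as above; group 1 captures "Nr", optional whitespace, and the digits.
pattern = re.compile(r'\bis\s+(Nr\s*\d+)')

samples = ["Spiderman is Nr 1", "Joker", "Iron Man is Nr 2"]

# \1 in the replacement is filled with whatever group 1 matched,
# so the number is carried over into the parentheses.
results = [pattern.sub(r'(\1)', s) for s in samples]
print(results)
```

Strings without "is Nr <number>", such as "Joker", do not match and are returned unchanged.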
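Note the use of regex=True: it tells pandas to treat the pattern as a regular expression. Since pandas 2.0 the default is regex=False, so without it the pattern would be matched as a literal string; in some older versions, passing it explicitly also avoids a warning.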
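For completeness, a small self-contained sketch that applies this to the sample data from the question. The dataframe is rebuilt by hand here, so its construction is an assumption about how your real data looks.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Superheros": [
    "Spiderman is Nr 1", "Batman is Nr 4", "Joker", "Iron Man is Nr 2",
    "Hulk", "Captain America", "Wonderwoman is Nr 3",
]})

# Replace "is Nr <digits>" with "(Nr <digits>)"; rows without a match stay as they are.
df["Superheros"] = df["Superheros"].str.replace(
    r'\bis\s+(Nr\s*\d+)', r'(\1)', regex=True
)
print(df["Superheros"].tolist())
```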