Suppose we want to replace multiple substrings via pd.Series.replace or pd.DataFrame.replace by passing a dictionary to the to_replace argument:
- What happens if multiple patterns (the dictionary keys) match in the string?
- Are applicable replacements performed at once or consecutively?
- If the latter, in which order are the replacements performed (e.g. the order the pattern matches occur in the string)?
- What happens if multiple patterns match substrings at the same position in the string (which can happen with regexes)?
- What happens if substrings in the replacement values match the patterns themselves?
Example:
Replace
- ‘nan’ –> ‘miss’
- ‘nan.*\b’ –> ‘nanword’
- ‘na’ –> ‘no’
- ‘miss’ –> ‘mrs’
- ‘bana’ –> ‘eric’
in the string ‘Nana likes bananas and ananas’.
>Solution :
Let’s try a short example:
import pandas as pd

s = pd.Series(['abcde', 'bcde', 'xyz'])
s.replace(to_replace={'ab': 'xy', 'bc': 'BC', 'cd': 'CD', 'xy': 'XY'}, regex=True)
0    xyCDe
1     BCde
2      XYz
dtype: object
- What happens if multiple patterns (the dictionary keys) match in the string? The keys are evaluated in dictionary order; when matches overlap, only the first one is replaced.
- Are applicable replacements performed at once or consecutively? From the perspective of the user, the replacements are performed simultaneously (i.e. there is no circular replacement). In the above example, the `xy` that replaces `ab` is not further replaced by `XY`.
- In which order are the replacements performed (e.g. the order the pattern matches occur in the string)? The order in the dictionary matters.
# let's swap the first two keys
s.replace(to_replace={'bc': 'BC', 'ab': 'xy', 'cd': 'CD', 'xy': 'XY'}, regex=True)
0    aBCde
1     BCde
2      XYz
dtype: object
- What happens if multiple patterns match substrings at the same position in the string (which can happen with regexes)? As shown above, the first match (in terms of position in the dictionary, not the string) is considered (`ab` vs `bc` in `abc`).
- What happens if substrings in the replacement values match the patterns themselves? Nothing, there is no circular replacement in the example above.
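
If you need semantics you can reason about without relying on pandas internals, one option is to do the replacement yourself with the standard `re` module. The sketch below is not what pandas does internally; `replace_sequential` and `replace_single_pass` are hypothetical helper names, and replacement values are assumed to be literal strings (no backreferences). The first helper applies each pattern to the result of the previous step, so replacements can chain. The second combines all patterns into one alternation and scans the string once, so replaced text is never re-examined and, at a given position, the earlier dictionary key wins.

```python
import re


def replace_sequential(text, mapping):
    """Apply each pattern in dictionary order to the output of the previous step."""
    for pattern, repl in mapping.items():
        # A callable replacement keeps the replacement text literal.
        text = re.sub(pattern, lambda _m, r=repl: r, text)
    return text


def replace_single_pass(text, mapping):
    """Scan the string once; replaced text is never matched again."""
    # Wrap every pattern in its own named group so the callback can tell
    # which alternative matched (the outer group is reported by lastgroup).
    by_group = {f"r{i}": repl for i, repl in enumerate(mapping.values())}
    combined = "|".join(f"(?P<r{i}>{p})" for i, p in enumerate(mapping))
    return re.sub(combined, lambda m: by_group[m.lastgroup], text)


mapping = {'ab': 'xy', 'bc': 'BC', 'cd': 'CD', 'xy': 'XY'}
print(replace_sequential('abcde', mapping))   # XYCDe (the new 'xy' is replaced again)
print(replace_single_pass('abcde', mapping))  # xyCDe (no chained replacement)
```

For `'abcde'`, the single-pass result equals the pandas output shown in the first example, while the sequential helper chains `ab` to `xy` to `XY`. With the first two keys swapped, the single-pass helper still returns `'xyCDe'`, because position in the string takes priority over dictionary order there, so it is a different rule from the one pandas appears to follow.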