Pandas replace multiple substring patterns via dictionary

Suppose we want to replace multiple substrings via pd.Series.replace or pd.DataFrame.replace by passing a dictionary to the to_replace argument.

  • What happens if multiple patterns (the dictionary keys) match in the string?
  • Are applicable replacements performed at once or consecutively?
  • If the latter, in which order are the replacements performed (e.g. the order the pattern matches occur in the string)?
  • What happens if multiple patterns match substrings at the same position in the string (which can happen with regexes)?
  • What happens if substrings in the replacement values match the patterns themselves?

Example:

Replace

  • 'nan' -> 'miss'
  • 'nan.*\b' -> 'nanword'
  • 'na' -> 'no'
  • 'miss' -> 'mrs'
  • 'bana' -> 'eric'

in the string 'Nana likes bananas and ananas'.

Solution:

Let’s try a short example:

import pandas as pd

s = pd.Series(['abcde', 'bcde', 'xyz'])

# regex=True makes the dictionary keys regular-expression patterns
s.replace(to_replace={'ab': 'xy', 'bc': 'BC', 'cd': 'CD', 'xy': 'XY'}, regex=True)

0    xyCDe
1     BCde
2      XYz
dtype: object
  • What happens if multiple patterns (the dictionary keys) match in the string? The keys are evaluated in dictionary order. When two patterns overlap, only the one that comes first in the dictionary is replaced.
  • Are applicable replacements performed at once or consecutively? From the user's perspective, the replacements in this example behave as if they were performed simultaneously: the xy that replaces ab is not further replaced by XY.
  • In which order are the replacements performed (e.g. the order the pattern matches occur in the string)? The order in the dictionary matters.
# let's swap the first two keys
s.replace(to_replace={'bc': 'BC', 'ab': 'xy', 'cd': 'CD', 'xy': 'XY'}, regex=True)

0    aBCde
1     BCde
2      XYz
dtype: object
  • What happens if multiple patterns match substrings at the same position in the string (which can happen with regexes)? As shown above, the pattern that comes first in the dictionary wins, not the one that matches first in the string (ab vs bc in abc).
  • What happens if substrings in the replacement values match the patterns themselves? Nothing in the example above: the xy produced from ab is left alone, so there is no circular replacement.
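
To make the answers above easier to reason about, here is a small sketch in plain Python (only the re module, no pandas). It compares naive chained substitution with one possible model that is consistent with both outputs shown above: check every pattern against the original string first, then apply only the patterns that matched, in dictionary order. The function names replace_sequential and replace_prechecked are made up for this illustration, and the model is an assumption about the observed behaviour, not pandas' actual code.

```python
import re

def replace_sequential(text, mapping):
    # Naive chaining: each pattern runs on the output of the previous one.
    for pattern, repl in mapping.items():
        text = re.sub(pattern, repl, text)
    return text

def replace_prechecked(text, mapping):
    # Hypothetical model (an assumption, not pandas source): record which
    # patterns match the ORIGINAL string, then apply only those patterns,
    # still one after another in dictionary order.
    applicable = [(p, r) for p, r in mapping.items() if re.search(p, text)]
    for pattern, repl in applicable:
        text = re.sub(pattern, repl, text)
    return text

mapping = {'ab': 'xy', 'bc': 'BC', 'cd': 'CD', 'xy': 'XY'}
swapped = {'bc': 'BC', 'ab': 'xy', 'cd': 'CD', 'xy': 'XY'}

print(replace_sequential('abcde', mapping))  # XYCDe (xy is rewritten again)
print(replace_prechecked('abcde', mapping))  # xyCDe (same as the first Series output)
print(replace_prechecked('abcde', swapped))  # aBCde (same as the swapped-keys output)

# The patterns from the question, run through the same model.
question = {'nan': 'miss', r'nan.*\b': 'nanword', 'na': 'no',
            'miss': 'mrs', 'bana': 'eric'}
print(replace_prechecked('Nana likes bananas and ananas', question))
# Nano likes bamissas and amissas
```

Under this model a replacement value can still be rewritten when the later pattern also matched the original string: replace_prechecked('abxy', mapping) gives 'XYXY'. Whether pandas does the same may depend on the version, so it is worth running such an edge case on your own installation before relying on "no circular replacement".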