Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Looking for efficient string replacement algorythm

I’m trying to create a string replacer that accepts multilpe replacements.

The ideia is that it would scan the string to find substrings and replace those substrings with another substring.

For example, I should be able to ask it to replace every "foo" for "bar". Doing that is trivial.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The issue starts when I’m trying to add multiple replacements for this function. Because if I ask it to replace "foo" for "bar" and "bar" for "biz", running those replacements in sequence would result in "foo" turning to "biz", and this behavior is unintended.

I tried splitting the string into words and running each replacement function in each word. However that’s not bullet proof either because still results in unintended behavior, since you can ask it to replace substrings that are not whole words. Also, I find that very inefficient.

I’m thinking in some way of running each replacer once in the whole string and sort of storing those changes and merging them. However I think I’m overengineering.

Searching on the web gives me trivial results on how to use string.replace with regular expressions, it doesn’t solve my problem.

Is this a problem already solved? Is there an algorithm that can be used here for this string manipulation efficiently?

>Solution :

If you modify your string while searching for all occurences of substrings to be replaced, you’ll end up modifying incorrect states of the string. An easy way out could be to get a list of all indexes to update first, then iterate over the indexes and make replacements. That way, indexes for "bar" would’ve been already computed, and won’t be affected even if you replace any substring with "bar" later.

Adding a rough Python implementation to give you an idea:

import re
string = "foo bar biz"
replacements = [("foo", "bar"), ("bar", "biz")]
replacement_indexes = []
for item in replacements:
    replacement_indexes += [m.start() for m in re.finditer(item[0], string)]
for i in range(len(replacement_indexes)):
    old, new, indexes = replacements[i][0], replacements[i][1], replacement_indexes[i]
    for index in indexes:
        # replace old string occuring at index with new string

Edit:
A small caveat, this works well if the strings to replaced have the same length as the replacements strings. If that’s not the case, you’ll need to maintain a offset (basically sum up the difference between the replaced string and the replacement string) and add that to each index during the replacement procedure.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading