What regular expression can trim runs of identical characters down to a maximum of two repeats?

I’m having trouble describing this well enough for a search engine to pick up on what I’m looking for. The behavior is essentially this:

string = "aaaaaaaaare yooooooooou okkkkkk"

would become "aare yoou okk", with the maximum number of repeats for any given character being two.

Matching the excess duplicates and then replacing them with re.sub seems to me the approach to take, but I can’t figure out the regex I need.

The only attempt I feel is even worth posting is this: (\w)\1{3,0}

It matched only the first instance of a character repeating more than three times, so there was only one match, and it covered the whole block of repeated characters rather than just the ones exceeding the maximum of two.

Any help is appreciated!

>Solution :

The regexp should be (\w)\1{2,} to match a character followed by at least 2 repetitions. That’s 3 or more when you include the initial character.
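
To see what that pattern matches, here is a minimal sketch that runs re.finditer over the string from the question. The variable name runs is only for illustration. Each match covers a whole run of three or more identical characters, and group 1 holds the single character being repeated.

```python
import re

string = "aaaaaaaaare yooooooooou okkkkkk"

# Each match is an entire run of 3+ identical word characters;
# group(1) is the one character that the run repeats.
runs = [(m.group(0), m.group(1)) for m in re.finditer(r'(\w)\1{2,}', string)]
for run, char in runs:
    print(f"run={run!r} char={char!r}")
```

Because the match spans the whole run, the substitution has to put two copies of the character back rather than deleting the match outright.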

The replacement is then \1\1 to replace with just two repetitions.

import re
string = "aaaaaaaaare yooooooooou okkkkkk"
# Collapse each run of 3 or more identical word characters to exactly two
new_string = re.sub(r'(\w)\1{2,}', r'\1\1', string)
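
If the limit ever needs to be something other than two, the same idea generalizes. The following is a minimal sketch rather than part of the answer above: limit_repeats and max_repeats are hypothetical names chosen for illustration, and the quantifier is built from the requested limit.

```python
import re

def limit_repeats(text, max_repeats=2):
    """Cap every run of an identical word character at max_repeats.

    limit_repeats and max_repeats are illustrative names, not from the answer.
    """
    if max_repeats < 1:
        raise ValueError("max_repeats must be at least 1")
    # One captured character followed by max_repeats or more copies of it,
    # i.e. a run longer than max_repeats characters in total.
    pattern = re.compile(r'(\w)\1{%d,}' % max_repeats)
    # A function replacement collapses the run to exactly max_repeats copies.
    return pattern.sub(lambda m: m.group(1) * max_repeats, text)

print(limit_repeats("aaaaaaaaare yooooooooou okkkkkk"))                 # aare yoou okk
print(limit_repeats("aaaaaaaaare yooooooooou okkkkkk", max_repeats=1))  # are you ok
```

Note that \w only matches letters, digits, and the underscore, so runs of punctuation such as "!!!!" are left untouched. Swapping (\w) for (.) would cap those as well, if that is what you want.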