Finding repeated substrings longer than 6 characters

I am trying to find repeated substrings (not words) in a text.

x = 'This is a sample text and this is lowercase text that is repeated.'

In this example, the string ' text ' should not be returned because the repeated part is only 6 characters long, but the string 'his is ' (7 characters) is expected in the result.

I tried using range, Counter and a regular expression:


import re
from collections import Counter

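duplist = list()
for i in range(1, 30):
    # Splits x into consecutive, non-overlapping chunks of up to i characters
    mylist = re.findall('.{1,' + str(i) + '}', x)
    # Keeps the chunks that occur more than once
    duplist.append([k for k, v in Counter(mylist).items() if v > 1])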
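The attempt above misses most repeats because re.findall with '.{1,n}' cuts the text into back-to-back chunks, so a repeated substring is only counted if both copies happen to start on a chunk boundary. A minimal sketch of the same Counter idea with an overlapping sliding window is shown below; the helper name repeated_windows is hypothetical. Any repeated substring longer than 6 characters contains a repeated 7-character window, so counting windows of exactly 7 characters is enough to detect every such repeat, although it reports 7-character windows rather than the longest repeated strings.

```python
from collections import Counter

x = 'This is a sample text and this is lowercase text that is repeated.'

def repeated_windows(text, size=7):
    # Hypothetical helper: slide a window one character at a time so that
    # every substring of length `size` is counted, including overlapping ones.
    windows = (text[i:i + size] for i in range(len(text) - size + 1))
    # Keep only the windows that occur more than once (in first-seen order).
    return [w for w, n in Counter(windows).items() if n > 1]

print(repeated_windows(x))  # ['his is ', 'e text ']
```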

Solution:

You can use a quantifier of {7,} to ensure that a match is more than 6 characters long, and a positive lookahead with a backreference to assert that the captured string appears again later in the text:

import re

x = 'This is a sample text and this is lowercase text that is repeated.'
print(re.findall(r'(.{7,})(?=.*\1)', x, re.S))

This outputs:

['his is ', 'e text ']
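To make the pattern easier to read, here is the same regular expression written in verbose mode (re.X), with one comment per part. Because .{7,} is greedy, the engine first captures as much as it can and then backtracks until the lookahead finds a later copy, so each result is the longest repeated substring starting at that position. findall then resumes after the match, so matches do not overlap, and only a copy that has another occurrence further to the right is reported.

```python
import re

x = 'This is a sample text and this is lowercase text that is repeated.'

pattern = re.compile(r'''
    (.{7,})     # group 1: greedily capture 7 or more characters
    (?=.*\1)    # lookahead: the captured text must occur again later on
''', re.S | re.X)  # re.S lets . match newlines; re.X allows these comments

print(pattern.findall(x))  # ['his is ', 'e text ']
```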

Demo: https://ideone.com/jZvQR5
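If the minimum length needs to vary, the pattern can be built from a parameter. The sketch below uses a hypothetical helper find_repeats and adds an optional re.I flag, since the sample text contains both "This is" and "this is"; with re.I the backreference also matches case-insensitively.

```python
import re

def find_repeats(text, min_len=7, ignore_case=False):
    # Hypothetical helper: builds (.{min_len,})(?=.*\1); doubled braces are
    # literal braces inside the f-string.
    pattern = rf'(.{{{min_len},}})(?=.*\1)'
    flags = re.S
    if ignore_case:
        flags |= re.I  # backreference \1 then ignores case too
    return re.findall(pattern, text, flags)

x = 'This is a sample text and this is lowercase text that is repeated.'
print(find_repeats(x))                    # ['his is ', 'e text ']
print(find_repeats(x, ignore_case=True))  # ['This is ', 'e text ']
```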
