How to filter a list of strings?

I have a list of strings that contain both non-English (Devanagari) and English words. I want to keep only the English words and filter out everything else.

Example:


phrases = [
    "S/O अशोक कुमार, ब्लॉक न.-4डी, S/O Ashok Kumar, Block no.-4D.",
    "स्ट्रीट-15, विभाग 5. सिविक सेंटर Street-15, sector -5, Civic Centre",
    "भिलाई, दुर्ग, भिलाई, छत्तीसगढ़, Bhilai, Durg. Bhilai, Chhattisgarh,",
]

My code so far:

import re
regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`.+,/\"]+")
for i in phrases:
    print(regex.sub(' ', i))

My output:

["S/O , .-4 , S/O Ashok Kumar, Block no.-4D.",
  "-15, 5. Street-15, sector -5, Civic Centre",
  ", , , , Bhilai, Durg. Bhilai, Chhattisgarh",]

My desired output:

["S/O Ashok Kumar, Block no.-4D.",
 "Street-15, sector -5, Civic Centre",
 "Bhilai, Durg. Bhilai, Chhattisgarh,"]

Solution:

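Judging by your data, the Devanagari text always comes before the English text in each string, so you could strip everything up to the last Devanagari character using the third-party regex module: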

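import regex as re  # third-party "regex" module (pip install regex); supports \p{...} script properties

lst = ["S/O अशोक कुमार, ब्लॉक न.-4डी, S/O Ashok Kumar, Block no.-4D.",
       "स्ट्रीट-15, विभाग 5. सिविक सेंटर Street-15, sector -5, Civic Centre",
       "भिलाई, दुर्ग, भिलाई, छत्तीसगढ़, Bhilai, Durg. Bhilai, Chhattisgarh,"]
for i in lst:
    # Remove everything up to the first word boundary after the last Devanagari character
    print(re.sub(r'^.*\p{Devanagari}.+?\b', '', i))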

Prints:

S/O Ashok Kumar, Block no.-4D.
Street-15, sector -5, Civic Centre
Bhilai, Durg. Bhilai, Chhattisgarh,

How the pattern works:

  • ^ – start-of-string anchor;
  • .*\p{Devanagari} – zero or more characters (greedy), up to and including the last Devanagari character;
  • .+?\b – one or more characters (lazy), up to the first word boundary, which is where the English text begins.
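If installing the third-party regex module is not an option, the same idea can be sketched with the standard-library re module. This is a minimal sketch, not part of the original answer: it assumes that every Devanagari character in the data falls in the basic Unicode block U+0900 to U+097F and that the English text always comes last in each string. The helper name strip_devanagari_prefix is made up for illustration.

```python
import re

# Assumption: all Devanagari characters in the data fall in the basic
# Unicode block U+0900-U+097F (the stdlib "re" module has no \p{Devanagari}).
DEVANAGARI_PREFIX = re.compile(r'^.*[\u0900-\u097F].+?\b')


def strip_devanagari_prefix(text):
    """Drop everything up to the first word boundary after the last Devanagari character.

    Strings without any Devanagari character are returned unchanged,
    because the pattern does not match them.
    """
    return DEVANAGARI_PREFIX.sub('', text)


phrases = [
    "S/O अशोक कुमार, ब्लॉक न.-4डी, S/O Ashok Kumar, Block no.-4D.",
    "स्ट्रीट-15, विभाग 5. सिविक सेंटर Street-15, sector -5, Civic Centre",
    "भिलाई, दुर्ग, भिलाई, छत्तीसगढ़, Bhilai, Durg. Bhilai, Chhattisgarh,",
]

for phrase in phrases:
    print(strip_devanagari_prefix(phrase))
```

Like the original pattern, this only works when the English part follows the Devanagari part. If the two scripts were interleaved, anything before the last Devanagari character would be lost.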