How to remove English letters between non-English letters?

January 13, 2022

I have a list of strings, and I want to filter out the non-English characters from each string.

list=['ಎಮ್ ರಾಮಲಿಂಗ ರೆಡ್ಡಿ, from the ಮನೆ ಸಂಖ್ಯೆ  M Ramalinga Reddy,',
      '23/2 ವಾರ್ಡ್ ಸಂಖ್ಯೆ 18, 1ನೇ ಅಡ್ಡ ರಸ್ತೆ, 23/2 Ward No 18, Cross,']

My code:

import re

regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`.+,/\"]+")
for i in list:
    li = regex.sub(' ', i)
    print(li)

My output

[, from the M Ramalinga Reddy,
23/2 18, 1 , 23/2 Ward No 18, Cross,]

My desired output

[M Ramalinga Reddy,
23/2 Ward No 18, Cross]

>Solution :

Assuming we phrase your problem as finding the final run of English-language words at the end of each string in the list, we can try using re.findall with a pattern anchored at the end of the string ($):

# -*- coding: utf-8 -*-

import re
inp = ['ಎಮ್ ರಾಮಲಿಂಗ ರೆಡ್ಡಿ, from the ಮನೆ ಸಂಖ್ಯೆ  M Ramalinga Reddy, D No',
  '23/2 ವಾರ್ಡ್ ಸಂಖ್ಯೆ 18, 1ನೇ ಅಡ್ಡ ರಸ್ತೆ, 23/2 Ward No 18, fst Cross,']
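# The hyphen is escaped as \- so it is matched literally; in a raw string, \\-` would define a character range instead.
output = [re.findall(r'[a-zA-Z0-9"/]+(?: [a-zA-Z0-9!@#$&()\-`.+,/"]+)*$', x) for x in inp]
print(output)
# re.findall returns a list per input string, so the result is a list of lists:
# [['M Ramalinga Reddy, D No'], ['23/2 Ward No 18, fst Cross,']]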
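If a flat list of strings without the trailing comma is wanted, as in the desired output from the question, one possible variant is sketched below. It is only a sketch under stated assumptions: the names `TRAILING_ASCII` and `trailing_english` are hypothetical and not part of the original answer, `re.search` is swapped in for `re.findall` so that each input yields a single string, and an input that does not end in English words is assumed to map to an empty string.

```python
import re

# Hypothetical pattern name: a run of ASCII "words" anchored at the end of the string.
# The hyphen is escaped (\-) so it is matched literally inside the character class.
TRAILING_ASCII = re.compile(r'[A-Za-z0-9"/]+(?: [A-Za-z0-9!@#$&()\-`.+,/"]+)*$')

def trailing_english(text):
    """Return the trailing run of English words in text, or '' if there is none."""
    match = TRAILING_ASCII.search(text.rstrip())
    # Drop a dangling comma so the result looks like the desired output.
    return match.group(0).rstrip(',') if match else ''

# Input strings taken from the question.
inp = ['ಎಮ್ ರಾಮಲಿಂಗ ರೆಡ್ಡಿ, from the ಮನೆ ಸಂಖ್ಯೆ  M Ramalinga Reddy,',
       '23/2 ವಾರ್ಡ್ ಸಂಖ್ಯೆ 18, 1ನೇ ಅಡ್ಡ ರಸ್ತೆ, 23/2 Ward No 18, Cross,']

result = [trailing_english(s) for s in inp]
print(result)
# ['M Ramalinga Reddy', '23/2 Ward No 18, Cross']
```

Because the pattern is anchored with `$`, English words that appear earlier in the string (such as "from the") are skipped; only the run that reaches the end of the string is returned.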