I have a list of strings, and I want to filter out the non-English characters from each string.
list = ['ಎಮ್ ರಾಮಲಿಂಗ ರೆಡ್ಡಿ, from the ಮನೆ ಸಂಖ್ಯೆ M Ramalinga Reddy,',
        '23/2 ವಾರ್ಡ್ ಸಂಖ್ಯೆ 18, 1ನೇ ಅಡ್ಡ ರಸ್ತೆ, 23/2 Ward No 18, Cross,']
My code:
import re

# Replace every run of characters outside the allowed ASCII set with one space
regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`.+,/\"]+")
for i in list:
    li = regex.sub(' ', i)
    print(li)
My output:
[, from the M Ramalinga Reddy,
23/2 18, 1 , 23/2 Ward No 18, Cross,]
My desired output:
[M Ramalinga Reddy,
23/2 Ward No 18, Cross]
>Solution :
Assuming we phrase your problem as finding the trailing run of English (ASCII) words at the end of each string in the list, we can try using re.findall here:
# -*- coding: utf-8 -*-
import re

inp = ['ಎಮ್ ರಾಮಲಿಂಗ ರೆಡ್ಡಿ, from the ಮನೆ ಸಂಖ್ಯೆ M Ramalinga Reddy, D No',
       '23/2 ವಾರ್ಡ್ ಸಂಖ್ಯೆ 18, 1ನೇ ಅಡ್ಡ ರಸ್ತೆ, 23/2 Ward No 18, fst Cross,']
# Trailing run of space-separated ASCII tokens; \- keeps the hyphen literal.
pattern = r'[a-zA-Z0-9"/]+(?: [a-zA-Z0-9!@#$&()\-`.+,/"]+)*$'
# re.findall returns one list per string, so flatten the matches.
output = [m for x in inp for m in re.findall(pattern, x)]
print(output)
# ['M Ramalinga Reddy, D No', '23/2 Ward No 18, fst Cross,']
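If the trailing comma should also be dropped, as in the desired output from the question, the same idea could be wrapped in a small helper. This is only a sketch under a few assumptions: the name trailing_english is hypothetical, the pattern is the one from the answer with the hyphen written as a literal (\-), and re.search is swapped in for re.findall so that a string with no English tail yields an empty string instead of an empty list.

```python
import re

# Same idea as the answer's pattern: a trailing run of space-separated ASCII tokens.
# Assumption: the hyphen is meant to be a literal, so it is escaped as \-.
TRAILING_ASCII = re.compile(r'[a-zA-Z0-9"/]+(?: [a-zA-Z0-9!@#$&()\-`.+,/"]+)*$')


def trailing_english(text):
    """Return the final run of ASCII words in text, or '' if there is none."""
    match = TRAILING_ASCII.search(text.rstrip())
    if match is None:
        return ''
    # Drop trailing separators such as the comma in 'Cross,'.
    return match.group(0).rstrip(' ,')


inp = [
    'ಎಮ್ ರಾಮಲಿಂಗ ರೆಡ್ಡಿ, from the ಮನೆ ಸಂಖ್ಯೆ M Ramalinga Reddy,',
    '23/2 ವಾರ್ಡ್ ಸಂಖ್ಯೆ 18, 1ನೇ ಅಡ್ಡ ರಸ್ತೆ, 23/2 Ward No 18, Cross,',
]
print([trailing_english(s) for s in inp])
# ['M Ramalinga Reddy', '23/2 Ward No 18, Cross']
```

Note that this only keeps the last English run in each string; earlier fragments such as "from the" are discarded because the pattern is anchored with $.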