Cleaning a sentence of numbers, signs and other languages

I have a txt file that contains Japanese sentences. I would like to remove everything that is not Japanese: numbers, English letters, words in any other language, signs, and symbols. Is there a quick way to do it? Thanks

Hi !こんにちは、私の給料は月額10000ドルです。 XO XO
私はあなたの料理が大好きです
私のフライトはAPX1999です。
私はサッカーの試合を見るのが大好きです。

Words to remove:
Hi !
XO XO
10000
APX1999

>Solution :

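The simplest way is to strip the ASCII characters and keep everything else. This removes English letters, digits, spaces and ASCII punctuation, but note that it keeps any non-ASCII character that is not Japanese, such as full-width digits or text in other non-Latin scripts: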

s = "Hi !こんにちは、私の給料は月額10000ドルです。 XO XO 私はあなたの料理が大好きです私のフライトはAPX1999です。私はサッカーの試合を見るのが大好きです"

# Keep only non-ASCII characters (code point above 127); this drops
# English letters, digits, spaces and ASCII punctuation.
no_ascii = ''.join(c for c in s if ord(c) > 127)

print(no_ascii)
こんにちは、私の給料は月額ドルです。私はあなたの料理が大好きです私のフライトはです。私はサッカーの試合を見るのが大好きです
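
If the file may also contain non-ASCII text that is not Japanese (full-width digits, Cyrillic, emoji and so on), filtering on ASCII alone will let it through. A stricter variant is to whitelist the Unicode blocks used for Japanese and drop everything else. The sketch below is only one way to do that: the set of ranges is an assumption about what counts as "Japanese" here, the kanji block is shared with Chinese so Chinese text would also survive, and the names `NON_JAPANESE`, `keep_japanese` and `clean_file` are made up for this example.

```python
import re

# Assumed whitelist of "Japanese" characters; extend it if more symbols are needed.
NON_JAPANESE = re.compile(
    "[^"
    "\u3040-\u309F"  # Hiragana
    "\u30A0-\u30FF"  # Katakana (includes the long-vowel mark)
    "\u4E00-\u9FFF"  # CJK Unified Ideographs (kanji, shared with Chinese)
    "\u3005"         # kanji iteration mark
    "\u3001\u3002"   # Japanese comma and full stop
    "]+"
)


def keep_japanese(text):
    """Return text with every character outside the whitelist removed."""
    return NON_JAPANESE.sub("", text)


def clean_file(src_path, dst_path):
    """Clean a UTF-8 text file line by line (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(keep_japanese(line.rstrip("\n")) + "\n")


print(keep_japanese("Hi !こんにちは、私の給料は月額10000ドルです。 XO XO"))
print(keep_japanese("私のフライトはAPX1999です。"))
```

For the txt file from the question, something like `clean_file("input.txt", "output.txt")` would write the cleaned lines to a new file; both file names are placeholders.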