Regular expression for a range of Chinese characters and selected groups of characters

I’m trying to extract all Chinese sentences from strings, including additional groups of characters such as [NAME] and [PLACE].

I have this string

<DisplayName>凡人战争</DisplayName>
<Desc>[NAME]赶到[PLACE],发现战火正燃,此地百姓饱受战争之苦。</Desc>
<Display>劝停战争</Display>  
<OKResult><![CDATA[me:AddMsg(XT("[NAME]以仙法摄走两军首领,一番劝戒,迫使他们停止了战争 ...

and I want to find

凡人战争
[NAME]赶到[PLACE],发现战火正燃,此地百姓饱受战争之苦
[NAME]以仙法摄走两军首领,一番劝戒,迫使他们停止了战争,消弭了这场祸事
此举手段温和,虽无人知晓,但却顺应天道,[NAME]获得了一些功德

I know the regex for Chinese characters is [\u4e00-\u9fff\uFF0C]+
and the ones for the character groups are (\u005BNAME\u005D) and (\u005BPLACE\u005D), but how do I combine them?

I tried this in Python:

Array_of_words = re.findall(r'[\u4e00-\u9fff\uFF0C(\u005BNAME\u005D)(\u005BPLACE\u005D)]+', text)

But it additionally matches single letters and brackets, like this:

['N', 'N', '凡人战争', 'N', '[NAME]赶到[PLACE],发现战火正燃,此地百姓饱受战争之苦', '劝停战争', '[C', 'A', 'A[', 'A', 'M', '(', '(', '[NAME]以仙法摄走两军首领,一番劝戒,迫使他们停止了战争,消弭了这场祸事', '此举手段温和,虽无人知晓,但却顺应天道,[NAME]获得了一些功德', '))', 'A', 'P', '(', '(', '))', '()', ']]']

Solution:

You can use

re.findall(r'(?:\[(?:PLACE|NAME)]|[\u4e00-\u9fff\uFF0C])+', text)

Details

  • (?: – start of a non-capturing group:
    • \[(?:PLACE|NAME)] – [, then either PLACE or NAME and then ]
    • | – or
    • [\u4e00-\u9fff\uFF0C] – a Chinese char pattern of yours
  • )+ – end of the group, match one or more occurrences.
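
The original attempt fails because everything inside [...] is a character class: the parentheses and the letters of NAME and PLACE become individual allowed characters, so stray N, A, (, ) and brackets match on their own. A minimal sketch of the alternation-based pattern follows. The sample text is a shortened version of the string from the question, and it assumes the commas in the data are fullwidth (U+FF0C), as the pattern implies.

```python
import re

# Shortened sample from the question. The commas are written as \uFF0C
# (fullwidth comma) on the assumption that this is what the real data uses.
text = (
    "<DisplayName>凡人战争</DisplayName>\n"
    "<Desc>[NAME]赶到[PLACE]\uFF0C发现战火正燃\uFF0C此地百姓饱受战争之苦。</Desc>\n"
    "<Display>劝停战争</Display>\n"
)

# Either a whole [NAME]/[PLACE] token or a single Chinese character
# (or fullwidth comma), repeated one or more times.
pattern = re.compile(r'(?:\[(?:PLACE|NAME)]|[\u4e00-\u9fff\uFF0C])+')

matches = pattern.findall(text)
for m in matches:
    print(m)
```

Because the group is non-capturing, re.findall returns the full matched strings rather than the contents of a group.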

To match any uppercase ASCII letters inside square brackets, replace \[(?:PLACE|NAME)] with \[[A-Z]+].
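
As a rough illustration of the difference, the sketch below runs both variants on a made-up string. The [ITEM] placeholder is hypothetical and does not appear in the question's data.

```python
import re

# Hypothetical input: [ITEM] is an invented placeholder used only to show
# the difference between the two patterns.
sample = "[NAME]获得了[ITEM]。"

# Only [PLACE] and [NAME] are accepted as bracketed tokens.
specific = re.compile(r'(?:\[(?:PLACE|NAME)]|[\u4e00-\u9fff\uFF0C])+')

# Any run of uppercase ASCII letters inside square brackets is accepted.
generic = re.compile(r'(?:\[[A-Z]+]|[\u4e00-\u9fff\uFF0C])+')

specific_matches = specific.findall(sample)
generic_matches = generic.findall(sample)

print(specific_matches)  # the match stops before [ITEM]
print(generic_matches)   # [ITEM] is kept as part of the match
```

The generic form is convenient when the set of placeholders is open-ended, at the cost of also accepting any other bracketed uppercase token that happens to sit next to Chinese text.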
