I need to find Japanese characters in a file that are enclosed in quotation marks. In some cases the quoted phrase also contains half-width digits, but not always. I would like to write a regular expression that matches quoted phrases containing Japanese characters, with or without half-width digits, but not phrases that contain only half-width digits.
Test contents:
"文章表示"
"1文章表示"
"文章表示1"
"文章1表示"
"1"
I would like to match the first four examples, but not the fifth one.
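To pin down the requirement before writing the regex, here is a minimal sketch in JavaScript (an assumption, suggested by the `/.../` literals) that expresses the rule as a plain predicate. `isWanted` is a hypothetical helper. The allowed set assumes the question's class uses the full-width ranges `ａ-ｚＡ-Ｚ０-９`, which is what the described behavior of the attempts implies, with half-width `0-9` added.

```javascript
// Hypothetical helper: decides whether a quoted token should be matched.
// Allowed characters: the question's class (full-width letters/digits assumed)
// plus half-width digits 0-9.
const ALLOWED = /^[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+$/;
const HALF_WIDTH_DIGITS_ONLY = /^[0-9]+$/;

function isWanted(token) {
  // Must be wrapped in a matching pair of " or ' quotes, with a non-empty body.
  const quote = token[0];
  if (token.length < 3 || (quote !== '"' && quote !== "'") || token[token.length - 1] !== quote) {
    return false;
  }
  const body = token.slice(1, -1);
  // Every character must be allowed, and the body must not be half-width digits alone.
  return ALLOWED.test(body) && !HALF_WIDTH_DIGITS_ONLY.test(body);
}

const samples = ['"文章表示"', '"1文章表示"', '"文章表示1"', '"文章1表示"', '"1"'];
for (const s of samples) {
  console.log(s, isWanted(s));
}
```

This prints `true` for the first four samples and `false` for `"1"`; the regex only has to reproduce this predicate in a single pattern.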
My current regular expressions:
/("|')[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+("|')/
// Matches the first, but not the second through fourth
/("|')[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+("|')/
// Matches all five, including "1"
/("|')[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+[0-9]*("|')/
/("|')[0-9]*[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+("|')/
// The first of these matches the first and third examples; the second matches the first and second
/("|')[0-9]*[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+[0-9]*[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]*[0-9]*("|')/
// Matches the first through fourth, but is long and very inefficient
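To see why each attempt behaves as described, the sketch below builds every pattern from the same character class and runs it against the five samples. It assumes JavaScript and assumes the class contains the full-width ranges `ａ-ｚＡ-Ｚ０-９` (with half-width `0-9` there, the first attempt would already match everything). The names `JP_BODY`, `JP`, and `attempts` are illustrative only.

```javascript
// Character-class body from the question (full-width letters/digits assumed).
const JP_BODY = "一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。";
const JP = `[${JP_BODY}]`;
const JP_OR_DIGIT = `[${JP_BODY}0-9]`;

// Each attempt from the question, rebuilt with RegExp.
const attempts = {
  first: new RegExp(`("|')${JP}+("|')`),
  withDigits: new RegExp(`("|')${JP_OR_DIGIT}+("|')`),
  digitsAfter: new RegExp(`("|')${JP}+[0-9]*("|')`),
  digitsBefore: new RegExp(`("|')[0-9]*${JP}+("|')`),
  last: new RegExp(`("|')[0-9]*${JP}+[0-9]*${JP}*[0-9]*("|')`),
};

const samples = ['"文章表示"', '"1文章表示"', '"文章表示1"', '"文章1表示"', '"1"'];

for (const [name, re] of Object.entries(attempts)) {
  // List the samples this attempt accepts.
  const hits = samples.filter((s) => re.test(s));
  console.log(name, hits);
}
```

The `last` pattern only works because it hard-codes one possible position for the digits; a phrase such as `"文1章2表示"` would slip through it, which is why a rule stated as "not digits only" generalizes better.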
I am looking for a more concise and efficient way to express the logic of the last regular expression.
Solution:
You can modify your second regex by adding a negative lookahead that rejects any quoted string consisting only of half-width digits up to the closing quote:
(["'])(?!\d+\1)[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+\1
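A quick check of this pattern against the five samples, as a sketch in JavaScript (the flavor is an assumption; the character class assumes the full-width `ａ-ｚＡ-Ｚ０-９` ranges). Note that in JavaScript `\d` matches only ASCII `0-9`, so a body of full-width digits such as `"１"` is still accepted; in Python 3 `str` patterns `\d` also matches full-width digits, so the lookahead would reject that case too.

```javascript
// (["'])      captures the opening quote as group 1
// (?!\d+\1)   rejects a body made only of half-width digits
//             (JavaScript's \d is ASCII 0-9 only)
// \1          requires the same quote character to close the string
const QUOTED_JP = /(["'])(?!\d+\1)[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+\1/;

const samples = ['"文章表示"', '"1文章表示"', '"文章表示1"', '"文章1表示"', '"1"'];

for (const s of samples) {
  console.log(s, QUOTED_JP.test(s));
}
```

The lookahead is checked once per opening quote, so the body stays a single `[...]+` run instead of the alternating digit/kana segments in the question's last attempt.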
Note that using a character class `["']` is more efficient than the alternation `("|')`. I've also modified your regex so that the closing quote must match the opening quote, by replacing `("|')` at the end with the backreference `\1`.
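To apply this to a whole file, the pattern needs the `g` flag so every quoted phrase can be collected. The sketch below (JavaScript assumed; the file contents are made up for illustration) uses `String.prototype.matchAll` and also shows the effect of `\1`: closing with `["']` instead would accept the mismatched `"文章表示'`.

```javascript
// Global version of the answer's pattern.
const QUOTED_JP_ALL = /(["'])(?!\d+\1)[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+\1/g;

// For comparison: any quote may close the string, so mismatched pairs match.
const LOOSE_CLOSE = /["'](?!\d+["'])[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+["']/g;

// Hypothetical file contents; in Node the text could come from
// require("fs").readFileSync(path, "utf8").
const text = [
  'label = "文章表示";',
  "count = '1文章表示';",
  'id = "1";',
  'mixed = "文章表示\';',
].join("\n");

// matchAll requires the g flag; m[0] is the whole quoted phrase.
const found = [...text.matchAll(QUOTED_JP_ALL)].map((m) => m[0]);
const looseFound = [...text.matchAll(LOOSE_CLOSE)].map((m) => m[0]);

console.log(found);
console.log(looseFound);
```

With the backreference, only the two properly quoted Japanese phrases are found; the loose version additionally picks up the mismatched `"文章表示'`.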