Match numbers with regex only if specified characters are also present

May 17, 2024

I need to find Japanese characters in a file that are enclosed in quotation marks. Some of these quoted phrases also contain half-width numbers, but not all of them. I would like to write a regular expression that matches phrases containing Japanese characters, with or without half-width numbers, but not phrases that contain only half-width numbers.

Test contents:

"文章表示"
"1文章表示"
"文章表示1"
"文章1表示"
"1"

I would like to match the first four examples, but not the fifth one.

My current regular expressions:

/("|')[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+("|')/
//Matches the first, but not the second through fourth

/("|')[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+("|')/
//Matches all

/("|')[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+[0-9]*("|')/
/("|')[0-9]*[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+("|')/
//Matches the first and either the second or the third

/("|')[0-9]*[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]+[0-9]*[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。]*[0-9]*("|')/ 
//Matches the first through fourth, but is very inefficient
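The comments above can be checked mechanically. The harness below is a minimal sketch that assumes Python's `re` module (the question does not name a language; the `/…/` delimiters suggest JavaScript or PHP). It rebuilds each attempt from the question's own character class and reports which of the five test lines each one finds a match in. The names `JP`, `Q`, `PATTERNS`, `SAMPLES` and `matching_samples` are illustrative only.

```python
import re

# Character-class body copied verbatim from the question's regexes.
JP = "一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。"
# The question's opening/closing group ("|'), written as a Python string.
Q = "(\"|')"

# Each attempt from the question, rebuilt from the shared pieces.
PATTERNS = {
    "attempt 1": Q + "[" + JP + "]+" + Q,
    "attempt 2": Q + "[" + JP + "0-9]+" + Q,
    "attempt 3a": Q + "[" + JP + "]+[0-9]*" + Q,
    "attempt 3b": Q + "[0-9]*[" + JP + "]+" + Q,
    "attempt 4": Q + "[0-9]*[" + JP + "]+[0-9]*[" + JP + "]*[0-9]*" + Q,
}

# The five test lines from the question, quotes included.
SAMPLES = ['"文章表示"', '"1文章表示"', '"文章表示1"', '"文章1表示"', '"1"']


def matching_samples(pattern):
    """Return the 1-based indices of the samples in which `pattern` finds a match."""
    return [i for i, s in enumerate(SAMPLES, start=1) if re.search(pattern, s)]


for name, pattern in PATTERNS.items():
    print(name, matching_samples(pattern))
```

Besides being hard to read, the last attempt only allows digits before the first run of Japanese characters, between the first and second runs, and after the second run, so a phrase such as "文1章2表3示" is still rejected.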

I am looking for an optimal way to write the logic in the last regular expression.

Solution:

You can modify your second regex by adding a negative lookahead that rejects a string consisting only of half-width digits followed by the closing quote:

(["'])(?!\d+\1)[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+\1
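A minimal sketch of the regex above in use, again assuming Python's `re` module. One adjustment is needed: in Python, `\d` in a `str` pattern also matches full-width digits such as `１`, so the sketch passes `re.ASCII` to make `\d` mean only `0-9`, as it does in JavaScript. The helper name `quoted_japanese` is hypothetical.

```python
import re

# The answer's regex. re.ASCII makes \d match only the half-width digits 0-9;
# without it, Python's \d would also match full-width digits like ０-９.
SOLUTION = re.compile(
    r"""(["'])(?!\d+\1)[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+\1""",
    re.ASCII,
)


def quoted_japanese(text):
    """Return each quoted phrase in `text` that is not made of half-width digits only."""
    # group(0) is the whole match, including both quotes.
    return [m.group(0) for m in SOLUTION.finditer(text)]


# The five test lines from the question, one per line as they would appear in a file.
text = "\n".join(['"文章表示"', '"1文章表示"', '"文章表示1"', '"文章1表示"', '"1"'])
print(quoted_japanese(text))
```

The lookahead only vetoes a match when everything between the quotes is half-width digits. Because `０-９` is in the character class, a phrase made only of full-width digits, such as "１２", is still accepted.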


Note that using a character class ["'] is more efficient than the alternation ("|'). I've also modified your regex to insist that the closing quote matches the opening quote, by replacing the ("|') at the end with the backreference \1.
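The effect of the `\1` backreference can be seen with mismatched quotes. This sketch (Python `re` assumed; the lookahead is left out to isolate the closing quote, and the names `OPEN`, `BODY`, `LOOSE` and `STRICT` are hypothetical) compares a pattern that accepts any closing quote with one that requires the same quote that opened the phrase.

```python
import re

OPEN = "([\"'])"  # capture the opening quote as group 1
BODY = "[一-龠ぁ-ゔァ-ヴーａ-ｚＡ-Ｚ０-９々〆〤、。0-9]+"

# Any quote may close the phrase, as with ("|') at the end of the question's regexes.
LOOSE = re.compile(OPEN + BODY + "[\"']")
# The closing quote must be the same character as the opening one.
STRICT = re.compile(OPEN + BODY + r"\1")

mixed = "\"文章表示'"  # opens with a double quote, closes with a single quote
print(LOOSE.search(mixed) is not None, STRICT.search(mixed) is not None)
```

Only the loose pattern accepts the mismatched pair; both still accept phrases whose quotes agree, such as '文章表示' or "文章表示".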