include unicode character within long regex

July 15, 2022

I have a regex:

/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġg̶̃čḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶]+/gm

which works great, except there is one character I can’t include (or that doesn’t seem to work as expected when included). The character is the last entry in the character class:

ś̶ // [it makes the cross-through (not easily visible in some fonts); in Unicode it is 'COMBINING LONG STROKE OVERLAY' (U+0336)]
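The stroke is not part of the letter. It is a separate combining code point that follows it, and ś itself may be either precomposed (U+015B) or s followed by U+0301 COMBINING ACUTE ACCENT. Below is a small sketch for checking what a string really contains. `codePoints` is a hypothetical helper name, and the sample strings are written with escapes so the exact code points are unambiguous:

```javascript
// List the Unicode code points of a string as "U+XXXX" labels.
// Spreading a string ([...str]) walks code points, not UTF-16 code units.
function codePoints(str) {
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Precomposed ś (U+015B) followed by the long stroke overlay.
console.log(codePoints("\u015B\u0336")); // [ 'U+015B', 'U+0336' ]

// Decomposed form: s + combining acute accent + long stroke overlay.
console.log(codePoints("s\u0301\u0336")); // [ 'U+0073', 'U+0301', 'U+0336' ]
```

Running it on the characters copied from the real regex and input shows which form they use.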

My regex is capturing the character but splitting any word that contains it:

"mokk̇ś̶ḣô".match(/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġčḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶g̶̃]+/gm)

// == ['mokk', 'ś̶ḣô']
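Why a combining mark can split a word: a bracket class is a set of single characters (UTF-16 code units without the u flag, code points with it). An entry such as ś̶ does not become one unit; each of its code points is added to the set separately. A `+` run therefore continues only while every code point of the word is a member. This is a minimal sketch that assumes a reduced class, written with \u escapes, in place of the full one:

```javascript
// A reduced stand-in for the real class: ASCII letters plus U+0336 only.
const cls = /[a-z\u0336]+/g;

// Every code point of "s" + U+0336 is in the set, so the run continues through it.
console.log("ms\u0336x".match(cls).length); // 1 (the whole string)

// U+0308 (combining diaeresis) is not in the set, so the run stops after "o"
// and a new match starts after the mark.
console.log("mo\u0308s\u0336".match(cls)); // [ 'mo', 's\u0336' ] (written with escapes here)

// The class is a set, not a list of sequences: a stray mark on its own also matches.
console.log(/^[a-z\u0336]+$/.test("\u0336a")); // true
```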

I’ve heard about Unicode Property Escapes using \p{UnicodePropertyValue} with a u flag. Would that be useful here?

>Solution :

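It doesn’t seem to be related to the ś̶ character. As you said yourself, it’s being captured.
The split is caused by a different character that is missing from the class: k̇. It is a plain k followed by U+0307 COMBINING DOT ABOVE. Because a character class matches one character at a time, the k is matched but U+0307 is not, so the first match ends at 'mokk' and a new one starts at ś̶.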
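With a class this long, a quick way to find the culprit is to test each code point of the failing word against the class on its own. This is a sketch: `findUnmatched` is a hypothetical helper, and the class body passed in is a reduced stand-in written with escapes. It uses the u flag so that each test consumes one full code point, which gives the same result as the original flagless regex for characters in the Basic Multilingual Plane.

```javascript
// Return the code points of `word` that a one-character class does not match.
// `classBody` is the text between the brackets, e.g. "a-z\\u0336".
function findUnmatched(word, classBody) {
  // Anchored single-character test; the u flag makes [...] consume one code point.
  const single = new RegExp("^[" + classBody + "]$", "u");
  const missing = [];
  for (const ch of word) {
    if (!single.test(ch)) {
      missing.push("U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return missing;
}

// "mokk" + U+0307 + "s" + U+0336: only the dot above is missing from the class.
console.log(findUnmatched("mokk\u0307s\u0336", "a-z\\u0336")); // [ 'U+0307' ]
```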

console.log("mokk̇ś̶ḣô".match(/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġčḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶g̶̃]+/gm)
) // ['mokk', 'ś̶ḣô']
console.log("mokk̇ś̶ḣô".match(/[a-zA-Zɑôáīúȑìêɑ͡iɑ͡uŋġḧn̐ƞġčḣñt́d́ŕŕńȶv̈m̈ᵯǰɏæǽÿẇẏs̃śś̶k̇g̶̃]+/gm)
) // ['mokk̇ś̶ḣô'] (k̇ added to the class)
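On the question about Unicode Property Escapes: with the u flag, a class such as [\p{L}\p{M}] accepts any letter plus any combining mark (general category M, which includes U+0307, U+0336 and U+0361), so a forgotten mark can no longer split a word. Note that this set is much broader than the hand-written list (it also accepts, for example, Greek or Cyrillic letters), so it only fits if the explicit whitelist is not intentional. A sketch, with the test word spelled out in escapes:

```javascript
// \p{L}: any letter; \p{M}: any combining mark (Mn, Mc, Me). Requires the u flag.
const wordRe = /[\p{L}\p{M}]+/gu;

// "mokk̇ś̶ḣô" with escapes: k + U+0307, ś (U+015B) + U+0336, ḣ (U+1E23), ô (U+00F4).
const word = "mokk\u0307\u015B\u0336\u1E23\u00F4";
console.log(word.match(wordRe).length); // 1 (the whole word is one match)

// Spaces and punctuation still separate words.
console.log("ab\u0307, cd".match(wordRe).length); // 2
```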