Simplify a string while excluding an array of signs

I want to remove all extra spaces and signs from a string and lowercase it (in other words, I want to simplify the string) with a function. The following function does this perfectly:

console.log(simplify('   The     very optiMal! FUNCTION, {here] ...')); // "the very optimal function here"

function simplify(string) {
    return string.toLowerCase().replace(/[^A-Za-z0-9'_]+/g, " ").trim();
}

The issue is that I want to exclude an array of signs so that they are not removed from the string:

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];

So if any of the above signs appears in the string, it should stay intact and not be removed.

How would you do this?

Solution:

You can use

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp("(" + exclusion + ")|(?:(?!" + exclusion +")[^A-Za-z0-9'_])+", "g");

function simplify(string) {
    return string.toLowerCase().replace(regex, (x,y) => y || " ").trim();
}

console.log(simplify('   The     very optiMal! FUNCTION, {here] ...')); // "the very optimal! function here ..."

Details:

  • (<exclusion>) – Group 1, which captures any of the exclusion patterns
  • | – or
  • (?:(?!<exclusion>)[^A-Za-z0-9'_])+ – one or more characters (as many as possible) other than ASCII letters, digits, underscore, or ', none of which starts one of the exclusion patterns. Because some of the patterns are multi-character, they cannot simply be added to the original negated character class.

The replacement is the Group 1 contents if Group 1 matched; otherwise the replacement is a space.
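To see how the callback tells the two alternatives apart, here is a minimal sketch that records each match and whether Group 1 captured it. The reduced `signs` list, the `trace` array, and the sample input are assumptions made for illustration; they are not part of the original answer.

```javascript
// Reduced sign list, for illustration only (assumption, not the full list above).
const signs = ['...', '!'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp("(" + exclusion + ")|(?:(?!" + exclusion + ")[^A-Za-z0-9'_])+", "g");

// Hypothetical trace of every match the regex makes.
const trace = [];
const result = 'hi!  {there] ...'.replace(regex, (match, group1) => {
    // group1 is undefined when the second alternative (the "junk" run) matched.
    trace.push({ match, kept: group1 !== undefined });
    return group1 || " ";
}).trim();

console.log(result); // "hi! there ..."
console.log(trace);
```

The kept signs come back unchanged through Group 1, while each run of other characters collapses to a single space.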

Another way to approach the issue – in case you want to always have a space separating each substring – is to use a reverse approach: match what you need and then join with a space:

const signs = ['!!', '?!', '!?', '...', '..', '.', '?', '؟!', '!؟', '!', '؟', ':'];
const exclusion = signs.map(x => x.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')).join("|");
const regex = new RegExp(exclusion + "|[A-Za-z0-9'_]+", "g");

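function simplify(string) {
    // match() returns null when nothing matches, so fall back to an empty array
    return (string.toLowerCase().match(regex) || []).join(" ");
}

console.log(simplify('this is...')); // "this is ..."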
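One thing to watch with the match-and-join approach: regex alternation tries the alternatives left to right, so a short sign listed before a longer one that starts with it (for example `.` before `...`) would split the longer sign into pieces. The `signs` array in the question already lists longer signs first. If the order is not guaranteed, sorting by length is one possible safeguard. The sketch below assumes hypothetical helper names (`escapeRegex`, `buildMatcher`) and a deliberately mis-ordered list:

```javascript
// Hypothetical helpers, not part of the original answer.
const escapeRegex = s => s.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&');

function buildMatcher(signs) {
    // Sort a copy by descending length so '...' is tried before '..' and '.'.
    const exclusion = [...signs]
        .sort((a, b) => b.length - a.length)
        .map(escapeRegex)
        .join("|");
    return new RegExp(exclusion + "|[A-Za-z0-9'_]+", "g");
}

// Deliberately mis-ordered list (assumption, for illustration).
const shortFirst = ['.', '..', '...'];
const naive = new RegExp(shortFirst.map(escapeRegex).join("|") + "|[A-Za-z0-9'_]+", "g");
const sorted = buildMatcher(shortFirst);

const naiveOut = ('this is...'.match(naive) || []).join(" ");
const sortedOut = ('this is...'.match(sorted) || []).join(" ");

console.log(naiveOut);  // "this is . . ."
console.log(sortedOut); // "this is ..."
```

The replace-based version is not affected in the same way, since each kept sign is put back in place without a separator.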