I use the regex below to validate a sentence. By my definition, a valid sentence:
- starts with capital letter(s),
- has no punctuation in the middle, and
- ends with one or more punctuation marks (e.g. ?!).
Is there a way to make it more concise? Its 2nd part is very similar to its 3rd part, except for the negation.
/[A-Z]+[^\.?!]*[\.?!]+/
# matches all of these sentences:
s1 = "A boy likes to read books."
s2 = "ABBA is one of the best musical bands."
s3 = "Are you kidding me?!"
# but not these sentences:
s4 = "City leaders allocated the first half of the ARPA funding to a series of initiatives,"
s5 = "will you abide the new law?"
>Solution :
Here are a few ways:
- You don’t need to escape the dot in character classes, so [\.] is the same as [.]
- You only need 1 capital letter at the start (the plus can be removed)
- You can use the reluctant quantifier .*? (which matches as little as possible)
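The first and last points above can be checked with a short sketch. It assumes Python's re module, which the question does not specify, and the sample text is made up for illustration; other regex flavors should behave the same way for these features.

```python
import re

# Hypothetical sample text with two sentences.
text = "Are you kidding me?! Yes."

# Inside a character class the dot is literal, so escaping it changes nothing.
escaped = re.findall(r"[\.?!]", text)
unescaped = re.findall(r"[.?!]", text)
same_matches = escaped == unescaped

# Greedy .* runs to the last terminator in the text;
# reluctant .*? stops at the first one.
greedy = re.search(r"[A-Z].*[.?!]", text).group()
reluctant = re.search(r"[A-Z].*?[.?!]", text).group()

print(same_matches)  # True
print(greedy)        # Are you kidding me?! Yes.
print(reluctant)     # Are you kidding me?
```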
This should work:
[A-Z].*?[.?!]+
Consider removing the + unless you really need to match sentences that end with ??, ?!, !!!, etc.
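As a rough check, the suggested pattern can be run against the five sentences from the question, with and without the trailing +. This is a sketch that assumes Python's re module and an unanchored search (the question does not say how the pattern is applied); first_match is a helper name invented here.

```python
import re

sentences = {
    "s1": "A boy likes to read books.",
    "s2": "ABBA is one of the best musical bands.",
    "s3": "Are you kidding me?!",
    "s4": "City leaders allocated the first half of the ARPA funding to a series of initiatives,",
    "s5": "will you abide the new law?",
}

with_plus = re.compile(r"[A-Z].*?[.?!]+")
without_plus = re.compile(r"[A-Z].*?[.?!]")

def first_match(pattern, sentence):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(sentence)
    return m.group() if m else None

results_plus = {name: first_match(with_plus, s) for name, s in sentences.items()}
results_no_plus = {name: first_match(without_plus, s) for name, s in sentences.items()}

for name in sentences:
    print(name, repr(results_plus[name]), repr(results_no_plus[name]))
```

In this sketch s1 and s2 match in full either way, s4 and s5 do not match at all, and s3 shows the difference: with the + the match includes "?!", without it the match stops after "?".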