I’m trying to extract a 9-digit reference from a text. This reference always starts with a 2 or a 5.
Example: Hello. My reference is 233445566.
Output: 233445566
I’ve been using the expression ([2,5][0-9]\w{7,7}) and it works. However, if the sentence is "Hello. My phone number is 6233445566." the output is also ‘233445566’, which I don’t want. In this scenario, the expression shouldn’t return anything.
Any idea of how I can avoid this problem?
Thanks!
>Solution :
You can use a word boundary to make sure that the matched reference is not part of a larger number. A word boundary \b matches the position between a word character (as defined by \w) and a non-word character (as defined by \W), or between a word character and the start or end of the string. Digits are word characters, so there is no boundary between the 6 and the 2 in 6233445566.
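Here is the modified regular expression, with a word boundary \b at the beginning and end: \b([25][0-9]\w{7})\b
Note that the character class is written [25] rather than [2,5]: inside brackets a comma is a literal character, so [2,5] would also match a comma. Likewise, {7,7} is equivalent to {7}.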
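This regular expression matches a 2 or a 5, then one digit, then seven word characters, with a word boundary on each side. The match is therefore exactly 9 characters long, as required, and cannot be part of a longer number. Because \w also matches letters and underscores, use \b[25][0-9]{8}\b instead if the reference must consist of digits only.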
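The question does not say which language or tool is used, so the following is only a minimal sketch in Python, assuming the reference consists of digits only. The function name extract_reference is hypothetical, and the pattern is the digits-only form \b[25][0-9]{8}\b rather than the exact expression from the question.

```python
import re

# Assumption: a reference is a 2 or a 5 followed by exactly eight more digits.
# The \b on each side rejects matches embedded in a longer run of word characters.
REFERENCE_RE = re.compile(r"\b[25][0-9]{8}\b")


def extract_reference(text):
    """Return the first 9-digit reference found in text, or None if there is none."""
    match = REFERENCE_RE.search(text)
    return match.group(0) if match else None


print(extract_reference("Hello. My reference is 233445566."))      # 233445566
print(extract_reference("Hello. My phone number is 6233445566."))  # None
```

The second call returns None because the 2 is preceded by another digit, so there is no word boundary at that position.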