I need help extracting multiple passport numbers that appear after the keyword "passport" using a regex.
Text:
my friends passport numbers are V123456, V123457 and V123458
Regex:
(?<=passport)\s*(?:\w+\s){0,10}\s*(\b[a-zA-Z]{0,2}\d{6,12}[a-zA-Z]{0,2}\b)
Expected matches output:
V123456
V123457
V123458
Actual output:
V123456
>Solution :
You can’t rely on a lookbehind here: each number after the first sits at an arbitrary distance from the keyword, so the lookbehind would need a pattern of indefinite length. That is supported, but only in recent Java versions.
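To illustrate why the original attempt stops after the first number, here is a minimal sketch that runs the regex from the question and collects every group 1 value. The class name LookbehindDemo and the method findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The regex from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "(?<=passport)\\s*(?:\\w+\\s){0,10}\\s*(\\b[a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2}\\b)");

    // Collects group 1 of every match found in the text.
    static List<String> findAll(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "my friends passport numbers are V123456, V123457 and V123458";
        // Every match must begin right after "passport", and that
        // position occurs only once in the text.
        System.out.println(findAll(s)); // prints [V123456]
    }
}
```

The fixed-length lookbehind is only satisfied at the position directly after the keyword, so the second and third numbers can never start a match.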
You may use a pattern based on the \G anchor:
(?:\G(?!\A)|\bpassport\b).*?\b([a-zA-Z]{0,2}\d{6,12}[a-zA-Z]{0,2})\b
See the regex demo. Pattern details:
(?:\G(?!\A)|\bpassport\b) - either a whole word passport (\bpassport\b) or (|) the end of the previous successful match (\G(?!\A))
.*? - any zero or more chars, as few as possible (since the pattern is compiled with Pattern.DOTALL, the . can match any characters including line break characters)
\b([a-zA-Z]{0,2}\d{6,12}[a-zA-Z]{0,2})\b - a whole word that starts with zero, one or two ASCII letters, then has six to 12 digits and ends with zero, one or two ASCII letters.
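The chaining behaviour described above can be made visible by printing where each match starts and ends. This is only an illustrative sketch: the class name PassportTrace and the method trace are hypothetical, and the pattern is the one given above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PassportTrace {
    static final Pattern P = Pattern.compile(
        "(?:\\G(?!\\A)|\\bpassport\\b).*?\\b([a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2})\\b",
        Pattern.DOTALL);

    // Returns one line per match: "start-end capturedNumber".
    static String trace(String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = P.matcher(text);
        while (m.find()) {
            sb.append(m.start()).append('-').append(m.end())
              .append(' ').append(m.group(1)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "my friends passport numbers are V123456, V123457 and V123458";
        System.out.print(trace(s));
        // 11-39 V123456   (starts at the keyword)
        // 39-48 V123457   (starts where the previous match ended, via \G)
        // 48-60 V123458
    }
}
```

Only the first match begins at the keyword. Each later match begins exactly at the end offset of the one before it, which is what \G(?!\A) allows.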
See the Java demo below:
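import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "my friends passport numbers are V123456, V123457 and V123458";
        String rx = "(?:\\G(?!^)|\\bpassport\\b).*?\\b([a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2})\\b";
        Pattern pattern = Pattern.compile(rx, Pattern.DOTALL); // DOTALL lets . cross line breaks
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // group 1 holds the passport number
        }
    }
}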
Output:
V123456
V123457
V123458
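If the keyword needs to vary, the same pattern could be wrapped in a small helper. The sketch below is an assumption-laden illustration rather than part of the demo above: the class PassportNumbers and the method extractAfterKeyword are hypothetical names, and the keyword is assumed to start and end with a word character so that \b behaves as intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PassportNumbers {
    // Hypothetical helper: returns every number found after the keyword.
    static List<String> extractAfterKeyword(String keyword, String text) {
        // Pattern.quote keeps the keyword literal inside the regex.
        String rx = "(?:\\G(?!\\A)|\\b" + Pattern.quote(keyword) + "\\b)"
                  + ".*?\\b([a-zA-Z]{0,2}\\d{6,12}[a-zA-Z]{0,2})\\b";
        Matcher m = Pattern.compile(rx, Pattern.DOTALL).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractAfterKeyword("passport",
            "my friends passport numbers are V123456, V123457 and V123458"));
        // prints [V123456, V123457, V123458]
        System.out.println(extractAfterKeyword("passport",
            "ID 9999999 then passport AB1234567"));
        // prints [AB1234567] because numbers before the keyword are skipped
    }
}
```

Note that once the chain has started, every later number in the text is captured regardless of how far it is from the keyword, since .*? with DOTALL can span any amount of text.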