I need a regular expression that extracts a passport number following the specific word паспорт.
Possible options are:
паспорт 5715 424141
паспорт 5715-424141
паспорт 5715 - 424141
I need to extract the first 4 and the last 6 digits after the word паспорт, so the result should be 5715 and 424141.
I tried ^(\d{4})\ (\d{6})$ but it does not match my input.
>Solution :
For starters, the ^ symbol anchors the match to the start of the string, so your pattern fails immediately: the strings start with "паспорт", not with a digit.
It also seems that the - between the two number groups is optional and may be surrounded by spaces, which your pattern does not allow for.
To fix all those issues, use:
паспорт (\d{4})\s*-?\s*(\d{6})
паспорт – literal match (including the space that follows it).
(\d{4}) – a capture group of four digits.
\s* – any number of whitespace characters, including 0.
-? – an optional dash.
\s* – any number of whitespace characters, including 0.
(\d{6}) – a capture group of six digits.
And since you tagged with Python:
import re
s = """паспорт 5715 424141
паспорт 5715-424141
паспорт 5715 - 424141"""
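for line in s.splitlines():
    # Every sample line matches, so search() never returns None here.
    print(re.search(r"паспорт (\d{4})\s*-?\s*(\d{6})", line).groups())
# Prints ('5715', '424141') once for each of the three lines.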
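Note that the snippet above calls .groups() directly on the result of re.search, which raises AttributeError when a line does not match. If your real input may contain such lines, a small wrapper along these lines might help. This is only a sketch: the names PASSPORT_RE and extract_passport are made up for illustration, and allowing any amount of whitespace after паспорт (\s+ instead of a single space) is my assumption about your data, not something from your question.

```python
import re

# Hypothetical names; \s+ after the keyword is an assumption (one or more whitespace chars).
PASSPORT_RE = re.compile(r"паспорт\s+(\d{4})\s*-?\s*(\d{6})")

def extract_passport(text):
    """Return the (4-digit, 6-digit) groups as strings, or None if nothing matches."""
    match = PASSPORT_RE.search(text)
    return match.groups() if match else None

print(extract_passport("паспорт 5715 - 424141"))  # ('5715', '424141')
print(extract_passport("no passport here"))       # None
```

Compiling the pattern once is just a convenience when you check many lines; the matching logic is the same as in the one-liner.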