I only know how to match a single character zero or one times in a regex, for example:
import re
content = "abc"
print(re.match(r'abc?', content))  # Match object, matched 'abc'
content = "ab"
print(re.match(r'abc?', content))  # Match object, matched 'ab'
Now I have two real cases:
content = "民国4年(1915年)2至3月"  # includes parentheses
#content = "民国4年2至3月"  # does not include parentheses
print(re.match(r'.*年(\(.{1,5}\))?', content).group())
The problem is that the actual result is 民国4年(1915年. I don't know why the closing parenthesis is missing.
>Solution :
.*年 is greedy: it matches everything up to the last 年, so it consumes 民国4年(1915年 all by itself. The next character is then ), which cannot start the group (\(.{1,5}\)). Because the trailing ? makes that group optional, the engine simply skips it, so the final result is only what .*年 matched.
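To see which part of the pattern consumed what, one option is to wrap .*年 in its own capture group and inspect groups(). This is only an illustrative sketch: the extra group and the variable name greedy are assumptions added for the demonstration, not part of the original pattern.

```python
import re

content = "民国4年(1915年)2至3月"

# Group 1 wraps the greedy prefix; group 2 is the optional parenthesised part.
greedy = re.match(r'(.*年)(\(.{1,5}\))?', content)

# Group 1 has swallowed the opening parenthesis and the year,
# and group 2 did not participate in the match at all.
print(greedy.groups())  # ('民国4年(1915年', None)
```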
Make .*年 non-greedy by using .*?年 and it will only match up to the first 年:
import re
content1 = "民国4年(1915年)2至3月"  # includes parentheses
content2 = "民国4年2至3月"  # does not include parentheses
print(re.match(r'.*?年(\(.{1,5}\))?', content1).group())
print(re.match(r'.*?年(\(.{1,5}\))?', content2).group())
Output:
民国4年(1915年)
民国4年
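Note that re.match returns None when nothing matches, so calling .group() directly raises AttributeError on input without 年. A minimal sketch of a guard, assuming you want None back in that case (the names PATTERN and leading_year are hypothetical, chosen only for this example):

```python
import re

# The non-greedy pattern from the answer, compiled once for reuse.
PATTERN = re.compile(r'.*?年(\(.{1,5}\))?')

def leading_year(text):
    """Return the matched prefix, or None when the pattern does not match."""
    m = PATTERN.match(text)
    return m.group() if m else None

print(leading_year("民国4年(1915年)2至3月"))  # 民国4年(1915年)
print(leading_year("民国4年2至3月"))          # 民国4年
print(leading_year("2至3月"))                 # None
```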