How to Extract Full Sentence from Text Including Newline(s) Using Python?

I am trying to extract text from a PDF file, but I cannot get the entire sentence that I want. The example below shows the problem.

Text I want to take:

Start Date: 01/01/2023
Name Surname: 123456789 Fernando Alonso
Salary: 1.915,15$
Address: SPAIN, MADRID, 12345, FORMULA, RENAULT BLV. ASTON MARTIN STREET\n – A5 BLOK NO:112 D:48

I extract each field separately, using a variant of the code below for each one:

address = re.findall(r'Address: (.+)', example_text)

I want the whole address, but the code returns only the part of the address before the \n:

SPAIN, MADRID, 12345, FORMULA, RENAULT BLV. ASTON MARTIN STREET

How can I solve this problem?

Solution:

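Use the re.S (re.DOTALL) flag so that . also matches newlines. The greedy (.+) then captures everything up to the end of the text, which works here because Address is the last field: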

import re

example_text = """
Start Date: 01/01/2023
Name Surname: 123456789 Fernando Alonso
Salary: 1.915,15$
Address: SPAIN, MADRID, 12345, FORMULA, RENAULT BLV. ASTON MARTIN STREET\n - A5 BLOK NO:112 D:48
"""

address = re.findall(r'Address:\s+(.+)', example_text, re.S)
print(address)
# ['SPAIN, MADRID, 12345, FORMULA, RENAULT BLV. ASTON MARTIN STREET\n - A5 BLOK NO:112 D:48\n']
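The captured value above still contains the embedded newline and a trailing one. If a single-line address is wanted, one possible cleanup is to collapse every run of whitespace into one space. This is only a sketch: the names raw_address and one_line_address are illustrative and not part of the original answer.

```python
import re

example_text = """
Start Date: 01/01/2023
Name Surname: 123456789 Fernando Alonso
Salary: 1.915,15$
Address: SPAIN, MADRID, 12345, FORMULA, RENAULT BLV. ASTON MARTIN STREET\n - A5 BLOK NO:112 D:48
"""

# re.search returns the first match (or None), which is convenient for a single field.
match = re.search(r'Address:\s+(.+)', example_text, re.S)
raw_address = match.group(1) if match else ""

# str.split() with no arguments splits on any whitespace run (spaces, newlines),
# so joining the pieces with one space flattens the address onto a single line.
one_line_address = " ".join(raw_address.split())
print(one_line_address)
```

Whether flattening is desirable depends on how the address is used later; keeping the raw value is equally valid.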

From the docs:

re.S
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline. Corresponds to the inline flag (?s).
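Because '.' matches newlines under re.S, a greedy (.+) keeps going to the end of the text. If another field followed the address, it would be captured too. The sketch below shows one way to stop at the next field, under the assumption that every field starts on its own line with a capitalized label followed by a colon, and that continuation lines of the address do not. The "Phone" field and the sample_text value are made up for illustration.

```python
import re

# Hypothetical sample: an extra "Phone" field follows the two-line address.
sample_text = (
    "Start Date: 01/01/2023\n"
    "Address: SPAIN, MADRID, 12345, FORMULA, RENAULT BLV. ASTON MARTIN STREET\n"
    " - A5 BLOK NO:112 D:48\n"
    "Phone: 555-0100\n"
)

# Greedy match with re.S: runs to the end of the text and swallows the Phone line.
greedy = re.search(r'Address:\s+(.+)', sample_text, re.S).group(1)

# Lazy match with a lookahead: stop before the next "Label:" line, or at end of text.
pattern = re.compile(r'Address:\s+(.+?)(?=\n[A-Z][\w ]*:|\Z)', re.S)
address = pattern.search(sample_text).group(1).strip()

print(repr(greedy))
print(repr(address))
```

The lookahead is only as reliable as the assumption about label format; text extracted from real PDFs may need a different boundary rule.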
