Searching for and extracting dictionary values from txt files

I am stuck on extracting specific data from a txt file.

I have a txt file which includes some information.

E.g.

Company Name GmbH, Teststraße 24 , 01000 Sampleort
Customer Nr. 11111111
Invoice Nr. 22222

Invoice Adress
Company Name 2 mbH, Test2straße 11, 01001 Sample2ort
Order number. 555555
Order Date 01.01.1999

The files do not all share this structure. Some include Invoice Nr., others include Inv. Number: 44444, and I want to catch all of them. I think I can cover these variations by creating a dictionary like:

`values_dict = {'Customer Number': ['customer nr.', 'customer number', 'cus. nr.', ...],
'Order Number': ['order number', 'Order nr', ...]}`

How can I extract the specific values from the txt files using that dict?

I am expecting output like:

Order Number : 555555
Customer Number: 11111111
Invoice Number: 22222
Order Date: 01.01.1999

Company Information: Company Name GmbH, Teststraße 24 , 01000 Sampleort
Invoice Company Information: Company Name 2 mbH, Test2straße 11, 01001 Sample2ort

>Solution :

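The re module in Python provides support for regular expressions, allowing you to search, match, or split strings based on specified patterns (see the Python re module documentation). The solution tries each label variation in turn and captures the text that follows it.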
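As a small illustration of the matching step (a sketch, not taken from the original answer): labels such as `Customer Nr.` contain a dot, which a regular expression treats as "any character" unless the label is passed through `re.escape`. The helper name `value_after_label` is hypothetical and only serves this example.

```python
import re

def value_after_label(label, line):
    """Return the text that follows `label` on `line`, or None if the label is absent."""
    # re.escape makes the '.' in the label match a literal dot only;
    # the optional colon covers spellings such as 'Inv. Number:'
    pattern = rf"{re.escape(label)}\s*:?\s*(.*)"
    match = re.search(pattern, line, re.IGNORECASE)
    return match.group(1).strip() if match else None

print(value_after_label("Customer Nr.", "Customer Nr. 11111111"))  # 11111111
print(value_after_label("Inv. Number", "inv. number: 44444"))      # 44444
print(value_after_label("Customer Nr.", "Customer Nrs 11111111"))  # None
```

Without `re.escape`, the third call would match as well, because the unescaped dot would accept the `s`.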

import re

# Dictionary of output keys with the label variations that may appear in the files.
# Within a key, list the variation with trailing punctuation first ('Order nr.' before 'Order nr').
values_dict = {
    'Customer Number': ['Customer Nr.', 'customer number', 'cus. nr.'],
    'Order Number': ['Order number.', 'order number', 'Order nr.', 'Order nr'],
    'Invoice Number': ['Invoice Nr.', 'Inv. Number', 'invoice number'],
    'Order Date': ['Order Date'],
    # The invoice address follows this heading, on the same line or on the next one
    'Invoice Company Information': ['Invoice Adress', 'Invoice Address']  # Add more variations as needed
}

# Function to extract information based on the dictionary
def extract_information(text, values_dict):
    results = {}

    # Assumption: the sender's company details are the first non-empty line,
    # because that line carries no label that could be searched for
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines:
        results['Company Information'] = lines[0]

    for key, variations in values_dict.items():
        for variation in variations:
            # re.escape keeps characters such as the '.' in 'Nr.' literal;
            # the optional colon covers labels like 'Inv. Number:'
            pattern = rf"{re.escape(variation)}\s*:?\s*(.*)"
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                results[key] = match.group(1).strip()
                break  # stop at the first variation that matches

    return results

text = """Company Name GmbH, Teststraße 24 , 01000 Sampleort
Customer Nr. 11111111
Invoice Nr. 22222

Invoice Adress
Company Name 2 mbH, Test2straße 11, 01001 Sample2ort
Order number. 555555
Order Date 01.01.1999"""

info = extract_information(text, values_dict)

for key, value in info.items():
    print(f"{key}: {value}")
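If there are many label variations, one possible variation on the approach is to invert the dictionary and compile all labels into a single pattern, so the text is scanned once instead of once per variation. The following is only a sketch under stated assumptions: the names `ALIASES`, `CANONICAL`, `LABEL_RE` and `extract_fields` are hypothetical, and it assumes each label and its value share one line, so it does not cover the address lines.

```python
import re

# Hypothetical alias table: canonical key -> spellings seen in the files
ALIASES = {
    "Customer Number": ["Customer Nr.", "Customer Number", "Cus. Nr."],
    "Invoice Number": ["Invoice Nr.", "Inv. Number", "Invoice Number"],
    "Order Number": ["Order number.", "Order Number", "Order Nr."],
    "Order Date": ["Order Date"],
}

# Reverse lookup: lower-cased alias -> canonical key
CANONICAL = {
    alias.lower(): key
    for key, aliases in ALIASES.items()
    for alias in aliases
}

# Longest aliases first, so "order number." is tried before "order number"
_alternatives = "|".join(
    re.escape(alias) for alias in sorted(CANONICAL, key=len, reverse=True)
)
LABEL_RE = re.compile(
    rf"^[ \t]*({_alternatives})[ \t]*:?[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_fields(text):
    """Map each canonical key to the first value found for it in `text`."""
    fields = {}
    for match in LABEL_RE.finditer(text):
        key = CANONICAL[match.group(1).lower()]
        fields.setdefault(key, match.group(2))  # keep the first occurrence
    return fields

sample = """Customer Nr. 11111111
Inv. Number: 44444
Order number. 555555
Order Date 01.01.1999"""

for key, value in extract_fields(sample).items():
    print(f"{key}: {value}")
```

Adding a new spelling then only means appending it to the relevant list in `ALIASES`; the pattern and the reverse lookup are rebuilt from that table.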
