Take text file and create csv file

I have a larger Python 3 program that processes OCR output and does some bubble detection, and I have it mostly worked out. I have one function that I got off Stack Overflow that works but has a weird side effect, and since I do not understand the code very well, I would like a little help coming up with something that works the way I want.

Here is the code I am using now:
Link

How it works:
I have a text file we can call address.txt that looks like this:

First Name,
Address,
City State Zip,
Second Name,
Second Address,
Second City State zip,

I would like to convert that to this:

First Name, Address, City State Zip,
Second Name, Second Address, Second City State zip,

Ideally I would have it write address.txt in the format I want from the start, rather than create the file and then have to edit it afterwards using the above function I picked up from Stack Overflow. Here is my function that reads the images, creates the file, and adds a comma at the end of each line.
If I could get it to combine every three lines into one line, I would not need the above code at all.

import os
import re

import pytesseract


def tess_address():
    files = os.listdir("address")
    sorted_files = sorted(files)
    # Open the output file once, instead of reopening it for every line
    with open("address.txt", "a", encoding="utf8") as out:
        for image in sorted_files:
            # read image
            output = os.path.join("address", image)
            # Pass the image through pytesseract
            text = pytesseract.image_to_string(output)
            # remove all commas
            no_comma_text = re.sub(",", "", text)
            for line in no_comma_text.splitlines():
                # print to file, adding a trailing comma
                print(line + ",", file=out)

Solution:

Since I don't have your address file, I couldn't test this and could only reason it through. To modify your tess_address function so that it writes the OCR output directly in the desired format, without a separate step to edit the file afterwards, you can adjust the loop that processes each line. Instead of appending a comma to each line and writing it straight to the file, accumulate the lines in groups of three and write each group as a single line of the CSV file, as in the function below.
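The grouping idea can be checked on its own, without any images or tesseract. This is a minimal sketch under two assumptions: `group_lines` is a hypothetical helper name, and blank lines in the OCR text are treated as noise to skip rather than counted as address fields.

```python
def group_lines(text, size=3):
    """Join every `size` non-blank lines of `text` into one row ending in a comma.

    A final group with fewer than `size` lines is still returned as a row.
    """
    # Keep only lines that contain something other than whitespace
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Slice the list into chunks of `size` and join each chunk with ", "
    return [", ".join(lines[i:i + size]) + "," for i in range(0, len(lines), size)]


# Sample text shaped like OCR output: a blank line and a trailing form feed
sample = "First Name\nAddress\nCity State Zip\n\nSecond Name\nSecond Address\nSecond City State zip\n\x0c"
for row in group_lines(sample):
    print(row)
# First Name, Address, City State Zip,
# Second Name, Second Address, Second City State zip,
```

Skipping blank lines matters because a single empty line would otherwise shift every following group by one field.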

import os
import re

import pytesseract


def tess_address():
    # Folder that holds the scanned address images
    image_dir = "address"
    # Write the result outside the image folder, so that a later run does not
    # list the output file and try to OCR it as if it were an image
    output_file_path = "addresses.csv"

    sorted_files = sorted(os.listdir(image_dir))

    with open(output_file_path, 'w', encoding='utf8') as output_file:
        for image in sorted_files:
            # Construct the full path for the image
            image_path = os.path.join(image_dir, image)

            # Pass the image through pytesseract
            text = pytesseract.image_to_string(image_path)

            # Remove all commas from the OCR output
            no_comma_text = re.sub(",", "", text)

            # Initialize a list to accumulate lines (a new list for each image)
            accumulated_lines = []

            for line in no_comma_text.splitlines():
                line = line.strip()
                # Skip blank lines; OCR output can contain them, and they
                # would throw off the groups of three
                if not line:
                    continue
                accumulated_lines.append(line)
                # Once we have three lines accumulated, write them as a single line in the CSV
                if len(accumulated_lines) == 3:
                    # Join the three lines with ", ", add a trailing comma, and write to the file
                    output_file.write(', '.join(accumulated_lines) + ',\n')
                    # Reset the accumulator for the next group of lines
                    accumulated_lines = []

            # Handle any remaining lines in case the total number is not a multiple of three
            if accumulated_lines:
                output_file.write(', '.join(accumulated_lines) + ',\n')
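If you already have an address.txt written by the original function (one field per line, each ending in a comma), a similar sketch can regroup it after the fact. `regroup_text`, `regroup_file`, and the output name address_grouped.txt are hypothetical; writing to a separate file is a deliberate choice so the input is not overwritten if something goes wrong.

```python
from pathlib import Path


def regroup_text(text, size=3):
    """Turn 'one field per line, each ending in a comma' into rows of `size` fields."""
    fields = []
    for line in text.splitlines():
        # Drop surrounding whitespace and the trailing comma the old function added
        field = line.strip().rstrip(",").strip()
        if field:  # ignore lines that held nothing but a comma
            fields.append(field)
    rows = [", ".join(fields[i:i + size]) + "," for i in range(0, len(fields), size)]
    # One row per line, with a final newline (empty input gives an empty string)
    return "\n".join(rows) + "\n" if rows else ""


def regroup_file(src, dst, size=3):
    """Read `src`, regroup its fields, and write the result to `dst`."""
    text = Path(src).read_text(encoding="utf8")
    Path(dst).write_text(regroup_text(text, size), encoding="utf8")
```

For example, `regroup_file("address.txt", "address_grouped.txt")` would write the two-row version of the sample file from the question.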

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading