I have some .txt files in the E:\Desktop\prog\OCR directory, and each file has a format like the following:
Fytytyotyrtyttyran
57.338
CtyOtyBtyOtyL
13.318
AytLtGtyOtyL
10.254
Ayttssemtybtyly
5.33
BtyAtySItyC
2.061
AytryPL
1.53
Lirtysyrtyp
1.466
Ctry
0
Patretsyttrcal
0
1965 Q2
Now I want to convert the above list to the following format:
Fytytyotyrtyttyran;57.338
CtyOtyBtyOtyL;13.318
AytLtGtyOtyL;10.254
Ayttssemtybtyly;5.33
BtyAtySItyC;2.061
AytryPL;1.53
Lirtysyrtyp;1.466
Ctry;0
Patretsyttrcal;0
1965 Q2
Note that the last line of each file does not need any change.
I wrote the following Python script for this:
import os

input_directory = r'E:\Desktop\prog\OCR'
output_directory = r'E:\Desktop\prog\OCR\output'

def merge_even_odd_lines(input_path, output_path):
    with open(input_path, 'r', encoding='utf-8') as infile:
        lines = infile.readlines()
    merged_lines = []
    for i in range(0, len(lines), 2):
        if i + 1 < len(lines):
            odd_line = lines[i].strip()
            even_line = lines[i + 1].strip()
            merged_lines.append(f"{odd_line};{even_line}")
        else:
            merged_lines.append(lines[i].strip())
    with open(output_path, 'w', encoding='utf-8') as outfile:
        outfile.write('\n'.join(merged_lines))

def process_files(directory_path):
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)
    for root, _, files in os.walk(directory_path):
        for file in files:
            if file.endswith('.txt'):
                input_file_path = os.path.join(root, file)
                output_file_path = os.path.join(output_directory, file)
                merge_even_odd_lines(input_file_path, output_file_path)

if __name__ == "__main__":
    process_files(input_directory)
    print("Conversion completed successfully.")
But my script converts my files to the following format:
Fytytyotyrtyttyran;57.338;CtyOtyBtyOtyL;13.318
AytLtGtyOtyL;10.254;Ayttssemtybtyly;5.33
BtyAtySItyC;2.061;AytryPL;1.53
Lirtysyrtyp;1.466;Ctry;0
Patretsyttrcal;0;1965 Q2
Where is the problem in my script?
>Solution :
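The problem is that you’re processing the output files as input files: the output directory is a subdirectory of the input directory, and os.walk() descends into subdirectories. So each file gets merged twice, once from the input directory and once more when the walk reaches the copy that was just written to output.
If you don’t need to process the directory hierarchy recursively, don’t use os.walk(); just loop over the files in input_directory. This needs import glob at the top, and each file is then a full path rather than a bare file name: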
for file in glob.glob(os.path.join(input_directory, "*.txt")):
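Expanded into a complete non-recursive version, that loop might look like the sketch below. The helper name merge_pairs and the two-argument process_files are introduced here for illustration and are not part of the original script; the merging logic is intended to behave the same as the question's.

```python
import glob
import os


def merge_pairs(lines):
    """Join each name line with the value line after it; keep a trailing unpaired line as-is."""
    stripped = [line.strip() for line in lines]
    merged = [f"{name};{value}" for name, value in zip(stripped[0::2], stripped[1::2])]
    if len(stripped) % 2:
        # Odd line count: the last line (e.g. "1965 Q2") has no partner.
        merged.append(stripped[-1])
    return merged


def process_files(input_directory, output_directory):
    os.makedirs(output_directory, exist_ok=True)
    # glob does not descend into subdirectories, so files already written
    # to the output folder are never picked up as input.
    for input_path in glob.glob(os.path.join(input_directory, "*.txt")):
        # glob returns full paths, so take the base name for the output file.
        output_path = os.path.join(output_directory, os.path.basename(input_path))
        with open(input_path, "r", encoding="utf-8") as infile:
            merged = merge_pairs(infile.readlines())
        with open(output_path, "w", encoding="utf-8") as outfile:
            outfile.write("\n".join(merged))
```

Calling process_files(input_directory, output_directory) with the two paths from the question should then leave the output folder untouched as a source, even when the script is run more than once.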
If you do need to recurse, the simplest solution is to move the output directory out of the input directory. Another choice is to check whether root is in the output directory and skip those files.
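For the recursive case, one way to skip the output directory is to remove it from the list of subdirectories that os.walk() is about to visit. The following is a minimal sketch of that idea; walk_txt_files is a hypothetical helper name, and it assumes the default top-down walk, where editing the directory list in place controls which folders are entered.

```python
import os


def walk_txt_files(input_directory, output_directory):
    """Yield .txt paths under input_directory, skipping anything inside output_directory."""
    output_abs = os.path.abspath(output_directory)
    for root, dirs, files in os.walk(input_directory):
        # In top-down mode, editing dirs in place stops os.walk from entering
        # the removed folders, so the output directory is never visited.
        dirs[:] = [
            d for d in dirs
            if os.path.abspath(os.path.join(root, d)) != output_abs
        ]
        for name in files:
            if name.endswith(".txt"):
                yield os.path.join(root, name)
```

Each yielded path can be passed to merge_even_odd_lines as the input path. Note that the original script writes every result directly into one output folder, so two input files with the same name in different subdirectories would overwrite each other.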