I have 4 folders full of txt documents. I used the code below to read all the txt files and append their lines to a list.
import glob

Doc1 = glob.glob('path*.txt')
Doc2 = glob.glob('path*.txt')
Doc3 = glob.glob('path*.txt')
Doc4 = glob.glob('path*.txt')

lines = []
for file in Doc1:  # repeated this block for Doc2, Doc3 and Doc4
    f = open(file, 'r')
    lines.append(f.readlines())
    f.close()
This code above worked just fine. However, now what I want to do is:
- for each txt document in the folder, I only want to append the lines between a start and an end, to get rid of unnecessary text. The start and end text will be the same for every document in all the folders. I tried to do this:
for file in Doc1:
    f = open(file, 'r')  # this opens the file
    for line in file:  # for each line in the file
        tag = False  # tag set to False, initially
        if line.startswith('text'):  # if it starts with this text then:
            tag = True  # tag changes to True
        elif 'end text' in line:  # if this text is in the line then:
            tag = False  # tag stays False
            lines.append(f.readlines())  # append this line (but now tag stays False)
        elif tag:  # if tag is True, then append that line
            lines.append(f.readlines())
    f.close()
This code runs without any warnings or errors, but no lines are appended to lines. TIA for any advice and assistance.
>Solution :
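The tag is reset to False on every iteration, so it can never be True when the elif tag branch is reached and nothing between the markers is ever appended. In addition, the inner loop iterates over file (the path string, one character at a time) rather than the open handle f, and f.readlines() reads the rest of the file instead of the current line. Some reordering of the operations is also needed.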
lines = []
for p in Doc1:
    tag = False  # Set the initial tag to False for the new file.
    with open(p, 'r') as f:
        for line in f:
            if tag:  # First line will never be appended since it must be a start tag
                lines.append(line)
            if line.startswith('start'):  # Start adding from next line
                tag = True
            elif 'end' in line:  # Already added the line so we can reset the tag
                tag = False
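To avoid repeating the loop for each of the four folders, the same logic could be wrapped in a function. The following is only a sketch: the names extract_between and extract_from_files are hypothetical, and the default markers 'start' and 'end' are the placeholders used in the loop above, so they would need to be replaced with the real start and end text. As in the loop above, the start line is skipped and the end line is kept. Note that the end check is a substring test, so a short marker such as 'end' would also match a line containing a word like "pending".

```python
from typing import Iterable, List


def extract_between(lines: Iterable[str], start: str = 'start', end: str = 'end') -> List[str]:
    """Return the lines after a line that starts with `start`,
    up to and including the line that contains `end`."""
    kept = []
    tag = False  # True while we are between the start and end markers
    for line in lines:
        if tag:
            kept.append(line)
        if line.startswith(start):  # start keeping from the next line
            tag = True
        elif end in line:  # the end line was already kept, so stop here
            tag = False
    return kept


def extract_from_files(paths: Iterable[str], start: str = 'start', end: str = 'end') -> List[str]:
    """Apply extract_between to every file in `paths` and collect the results."""
    result = []
    for p in paths:
        with open(p, 'r') as f:  # a file handle yields one line per iteration
            result.extend(extract_between(f, start, end))
    return result
```

With the lists from the question, a call such as lines = extract_from_files(Doc1 + Doc2 + Doc3 + Doc4) would then cover all four folders in one pass.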