My script reads two files as input: articles.txt and article-title.txt.
articles.txt contains articles that are delimited with ">>>" (without the quotes).
article-title.txt contains a list of titles, one per line, delimited with "\n" (without the quotes). The last title may or may not end with a \n.
articles.txt:
This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
This is article.txt.This is article.txt. This is article.txt. This is article.txt.
>>>
This is article.txt.This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
>>>
This is article.txt. This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.This is article.txt.
article-title.txt:
This is the filename of the first article
This is the filename of the second article
This is the filename of the third article
My script should split the articles in articles.txt into separate text files.
Name each file after the corresponding line in article-title.txt.
Replace each space character in the filename with a dash "-".
Each filename should end with .txt.
A successful run should therefore produce three files (or however many articles there are), the first of which is named This-is-the-filename-of-the-first-article.txt
At the moment my script outputs only a single file:
with open("inputfile.txt", "r") as f1, open("inputfile-title.txt", "r") as f2:
    buff = []
    i = 1
    for line1, line2 in zip(f1, f2):
        x = 0
        if line1.strip():
            buff.append(line1)
        if line1.strip() == ">>>":
            data = f2.readlines()
            output = open('%s.txt' % data[x].replace('\n', ''),'w')
            output.write(''.join(buff))
            output.close()
            x+=1
            print("This is x:", x)
            print("This is data:", data)
            buff = [] #buffer reset
>Solution :
The immediate flaw is that you read in all the article names with readlines() the first time you see a delimiter. After that the file handle is exhausted, so further attempts to read from it return nothing. See also "Why can't I call read() twice on an open file?"
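To make the exhausted-handle behavior concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for a real file (an assumption made only so the example runs without files on disk); a handle returned by open() behaves the same way for these calls.

```python
import io

# io.StringIO stands in for a file opened with open(); the titles are made up.
titles = io.StringIO("first title\nsecond title\n")

first_read = titles.readlines()   # consumes everything up to end of file
second_read = titles.readlines()  # the position is already at the end

print(first_read)
print(second_read)

titles.seek(0)                    # rewinding makes the content readable again
after_rewind = titles.readlines()
print(after_rewind)
```

The second call returns an empty list, so indexing into it fails. Reading the titles once, before the loop, or rewinding with seek(0) avoids the problem.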
For efficiency and elegance, I would also refactor to simply read and write one line at a time.
with open("articles.txt", "r") as text, open("article-title.txt", "r") as titles:
    for line in titles:
        filename = line.rstrip('\n').replace(' ', '-') + '.txt'
        with open(filename, 'w') as article:
            for line in text:
                if line.strip() == '>>>':
                    break
                article.write(line)
This will not work correctly if there are fewer file names than sections in the input file: the sections left over after the last name are never written. Conversely, if there are too many file names, each excess name produces an empty file. Perhaps a better design would be to inline the file names into the data, or to devise a mechanism for generating fallback file names if there are not enough of them in the input.
Demo: https://ideone.com/hYnOzP
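The fallback-name idea mentioned above could be sketched roughly as follows. This is one possible approach, not part of the original answer: the helper names split_sections and name_sections and the "untitled article N" pattern are hypothetical, and the sketch reads the whole input into memory instead of streaming line by line.

```python
from itertools import zip_longest


def split_sections(text, delimiter=">>>"):
    """Yield each section of `text`, splitting on lines that equal the delimiter."""
    section = []
    for line in text.splitlines(keepends=True):
        if line.strip() == delimiter:
            yield "".join(section)
            section = []
        else:
            section.append(line)
    if section:  # the last section has no trailing delimiter
        yield "".join(section)


def name_sections(text, titles, delimiter=">>>"):
    """Map a file name to every section, inventing a name when titles run out."""
    sections = split_sections(text, delimiter)
    cleaned = (t.strip() for t in titles if t.strip())  # skip blank title lines
    named = {}
    pairs = zip_longest(sections, cleaned)
    for number, (body, title) in enumerate(pairs, start=1):
        if body is None:
            break  # more titles than sections: ignore the extras
        if title is None:
            # Hypothetical fallback pattern, chosen only for illustration.
            title = "untitled article %d" % number
        named[title.replace(" ", "-") + ".txt"] = body
    return named


sample = "one\n>>>\ntwo\n>>>\nthree\n"
result = name_sections(sample, ["First article\n"])
for filename in result:
    print(filename)
```

To write the files, loop over result.items() and open each filename in 'w' mode. Note that duplicate titles would overwrite one another in this sketch, since the file name is used as the dictionary key.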