
Use text lines in one file as filenames for others

My script reads two files as input: articles.txt and article-title.txt.

articles.txt contains articles separated by ">>>" (without quotes).
article-title.txt contains a list of titles, one per line; the last one may or may not end with a "\n".

articles.txt:

This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
This is article.txt.This is article.txt. This is article.txt. This is article.txt.
>>>

This is article.txt.This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.
>>>

This is article.txt. This is article.txt. This is article.txt. This is article.txt.
This is article.txt. This is article.txt. This is article.txt.This is article.txt.

article-title.txt:

This is the filename of the first article
This is the filename of the second article
This is the filename of the third article

My script should split the articles in articles.txt into separate text files.
Name each file after the corresponding line in article-title.txt.
Replace each space in the filename with a dash "-".
Filenames should end with .txt.

A successful run should therefore produce three files (or however many the input requires), one of which is named: This-is-the-filename-of-the-first-article.txt
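The naming rule above can be sketched in isolation. This is a minimal sketch, assuming the titles file has been read into one string; `read_titles` and `title_to_filename` are hypothetical helper names, not part of the original script. Skipping blank lines is also an assumption about how stray empty lines should be handled.

```python
def read_titles(raw):
    """Split the titles text into one title per line.

    str.splitlines() drops the line endings and does not produce an empty
    trailing item, so a missing final newline makes no difference.
    Blank lines are skipped (an assumption, not part of the original spec).
    """
    return [t for t in raw.splitlines() if t.strip()]


def title_to_filename(title):
    """Turn a title into a filename: spaces become dashes, '.txt' is appended."""
    return title.strip().replace(" ", "-") + ".txt"


titles = read_titles(
    "This is the filename of the first article\n"
    "This is the filename of the second article"
)
print([title_to_filename(t) for t in titles])
```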

At the moment my script outputs only a single file:

with open("inputfile.txt", "r") as f1, open("inputfile-title.txt", "r") as f2:
    buff = []
    i = 1
    for line1, line2 in zip(f1, f2):
        x = 0
        if line1.strip():
           buff.append(line1)
        if line1.strip() == ">>>":
           data = f2.readlines()
           output = open('%s.txt' % data[x].replace('\n', ''),'w')
           output.write(''.join(buff))
           output.close()
           x+=1
           print("This is x:", x)
           print("This is data:", data)
           buff = [] #buffer reset

Solution:

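The immediate flaw is that `f2.readlines()` reads in all the remaining article titles the first time you see a delimiter, and further attempts to read from the same file handle will then return nothing. Because `zip(f1, f2)` also consumes one line of `f2` per iteration, the titles it has already pulled are missing from `data`, and once `f2` is exhausted the `zip()` loop stops on its next pass. In addition, `x` is reset to 0 at the top of every iteration, so `data[x]` would always pick the first remaining title anyway. See also Why can't I call read() twice on an open file?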
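A small illustration of the exhausted-handle problem, using `io.StringIO` as a stand-in for the titles file (an assumption made only so the example runs without files on disk; real text files behave the same way here):

```python
import io

# io.StringIO stands in for the titles file so this runs without files on disk
f2 = io.StringIO("title one\ntitle two\ntitle three\n")

print(repr(next(f2)))    # 'title one\n', like the line zip(f1, f2) pulls each pass
first = f2.readlines()   # reads every line that is left
second = f2.readlines()  # the position is already at end of file
print(first)   # ['title two\n', 'title three\n']
print(second)  # []
```

Calling `f2.seek(0)` would rewind the handle, but reading one title at a time, as below, avoids the problem entirely.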

For efficiency and elegance, I would also refactor to simply read and write one line at a time.

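with open("articles.txt", "r") as text, open("article-title.txt", "r") as titles:
    for title in titles:
        # Strip the newline (the last title may lack one) and turn spaces into dashes
        filename = title.rstrip('\n').replace(' ', '-') + '.txt'
        with open(filename, 'w') as article:
            # Resume reading articles.txt where the previous section stopped
            for line in text:
                if line.strip() == '>>>':
                    break  # end of this section; the delimiter itself is not written
                article.write(line)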
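This refactor relies on a file object being its own iterator: `break` leaves `text` positioned just after the delimiter, so the next inner loop picks up from there instead of starting over. A minimal demonstration, with `io.StringIO` standing in for articles.txt and `sections_by_resuming` as a hypothetical helper written only for this illustration:

```python
import io


def sections_by_resuming(text, count):
    """Collect `count` sections by restarting a loop over the same iterator."""
    result = []
    for _ in range(count):
        chunk = []
        for line in text:          # resumes where the previous loop stopped
            if line.strip() == ">>>":
                break              # the delimiter line is consumed and discarded
            chunk.append(line)
        result.append("".join(chunk))
    return result


demo = io.StringIO("one\n>>>\ntwo\n>>>\nthree\n")
first_pass = sections_by_resuming(demo, 3)
second_pass = sections_by_resuming(demo, 1)
print(first_pass)   # ['one\n', 'two\n', 'three\n']
print(second_pass)  # [''] because the stream is already exhausted
```

The second call shows what happens to any extra title in the loop above: the inner loop gets no lines, so the corresponding output file is created empty.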

This will not work correctly if the number of file names differs from the number of sections in the input file. If there are too few file names, the remaining sections are silently dropped. Conversely, if there are too many, each excess name produces an empty file, because `text` is already exhausted by then. Perhaps a better design would be to inline the file names into the data, or to devise a mechanism for generating fallback file names when the input does not provide enough of them.
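One possible fallback mechanism is sketched below, using `itertools.zip_longest` to pair sections with titles. The `article-N.txt` naming scheme and the helper names `split_sections`, `pair_with_titles` and `write_articles` are hypothetical choices for illustration, not part of the original answer.

```python
from itertools import zip_longest


def split_sections(lines, delimiter=">>>"):
    """Yield each section as one string; delimiter lines are dropped."""
    section = []
    for line in lines:
        if line.strip() == delimiter:
            yield "".join(section)
            section = []
        else:
            section.append(line)
    if section:  # the last section normally has no trailing delimiter
        yield "".join(section)


def pair_with_titles(sections, titles):
    """Yield (filename, text) pairs, inventing a name when the titles run out."""
    for n, (section, title) in enumerate(zip_longest(sections, titles), start=1):
        if section is None:
            return  # more titles than sections: skip extras, create no empty files
        title = (title or "").strip()
        # Hypothetical fallback scheme: article-1.txt, article-2.txt, ...
        name = (title.replace(" ", "-") + ".txt") if title else f"article-{n}.txt"
        yield name, section


def write_articles(articles_path, titles_path):
    """Split articles_path into files named from titles_path, with fallbacks."""
    with open(articles_path) as text, open(titles_path) as titles:
        for filename, body in pair_with_titles(split_sections(text), titles):
            with open(filename, "w") as out:
                out.write(body)
```

Usage would be `write_articles("articles.txt", "article-title.txt")`. Unlike the line-by-line loop above, this keeps one whole section in memory at a time, which is a trade-off for being able to decide a section's name and skip surplus titles cleanly.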

Demo: https://ideone.com/hYnOzP
