How to remove and write new line while scraping

I made a small Python scraper that should find emails from a loaded .txt file, in which the URLs to scrape are listed line by line.

I'm trying to write the results to another .txt file, but somehow I'm not able to write a new line for each scraped link loaded from the .txt file, and stray characters like ['\n keep being written to the text file even though they are not present in the URLs I'm scraping.

My scraper:


def scrapeEmails():
    global reqs, _lock, success, fails, rps, rpm

    with open(os.path.join("proxies.txt"), "r") as f:
        proxies = f.read().splitlines()
    with open(os.path.join("links_toscrape.txt"), "r") as f:
        channelLinks = f.read().splitlines()
    rndChannelLinks = random.choice(channelLinks)

    URL = rndChannelLinks + "/about"
    proxy = random.choice(proxies)
    proxies = {"https": "http://"+proxy}

    soup = BeautifulSoup(requests.get(URL, proxies=proxies).text, "html.parser") 
    _description = soup.find("meta", property="og:description")
    _content = _description["content"] if _description else "No meta title given"

    #for s in _content:
    if "@" in _content.lower():
        __email = re.findall("([\s]{0,10}[\w.]{1,63}@[\w.]{1,63}[\s]{0,10})", _content)
        cleanEmail = [x.replace("\n", "") for x in __email]
        print("Email: ",  cleanEmail)

        with open("scraped_emails.txt") as f:
            f.write(str(cleanEmail))
            f.close()
    else:
        print("Email of YouTube channel " + URL + " not found.")  

Solution:

First of all, we cannot run this code and see what's wrong because you did not share a reproducible example, so I will try to figure it out by guessing.

  1. In the current code, the file is opened and closed on each iteration of the loop. Instead, open the file before the loop and close it after the loop completes; otherwise it slows down your code.

  2. Use the 'a' mode when opening the file to append the newly scraped emails rather than overwriting the existing contents with 'w'. Note that the default mode is 'r' (read-only), which is why your current `open("scraped_emails.txt")` followed by `f.write(...)` fails.

  3. You are calling f.close(), which is not needed: the `with open(...)` statement automatically closes the file once you exit the block.
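To illustrate point 2, here is a minimal sketch of the difference between the 'w' and 'a' modes (the filename `demo.txt` is just a placeholder for this example):

```python
# 'w' truncates the file on open, so each write session starts fresh.
with open("demo.txt", "w") as f:
    f.write("first\n")
with open("demo.txt", "w") as f:   # overwrites the previous contents
    f.write("second\n")

# 'a' opens for appending, preserving whatever is already in the file.
with open("demo.txt", "a") as f:
    f.write("third\n")

# demo.txt now contains "second\n" followed by "third\n"
```

This is why repeatedly opening your results file with 'w' would leave only the last scraped batch on disk.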

As a result, when you write your file, use this code:

with open("scraped_emails.txt", 'a') as f:
    for email in cleanEmail:
        f.write(email + '\n')
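As for the stray ['\n characters: they come from `f.write(str(cleanEmail))`, which writes the *repr* of the whole list, including brackets, quotes, and any embedded newline escapes. A small sketch (with a made-up address) showing the difference, and how stripping each match and writing one address per line avoids both artifacts:

```python
emails = ["foo@example.com\n"]      # hypothetical scraped result

# str() of a list dumps its repr, brackets and escaped "\n" included.
as_repr = str(emails)               # "['foo@example.com\\n']"

# Strip whitespace per match and join with real newlines instead.
cleaned = [e.strip() for e in emails]
lines = "\n".join(cleaned) + "\n"   # "foo@example.com\n"
```

Writing `lines` (or looping as in the snippet above) produces one clean address per line, with no brackets or escape sequences.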