Get every email of random pages

I want to collect every email address from 1000 web pages using Selenium with Python.

My idea:

go to page x

a = driver.page_source

get the text of a that contains @

However, I can't extract that part from a.

>Solution :

You can get a list of the links this way:

links = [elem.get_attribute('href') for elem in elems]

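where elems is the value returned by a driver.find_elements_by_...() call (in Selenium 4, where these helpers were removed, the equivalent is driver.find_elements(By.CSS_SELECTOR, 'a')), for example:

elems = driver.find_elements_by_css_selector('a')  # Select <a> tags, since those are the elements that carry an href attribute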

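You can check whether a link is an email link this way:

def isMail(link: str) -> bool:
    # get_attribute('href') can return None, and a real email link starts with the mailto: scheme
    return bool(link) and link.startswith('mailto:')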

So (note that str.removeprefix requires Python 3.9 or later):

mails = [link.removeprefix('mailto:') for link in links if isMail(link)]
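
The list comprehension above keeps whatever follows mailto:, which can include a ?subject=... part or several comma-separated recipients. As a rough sketch of a more defensive version, the hypothetical helper below (mailto_addresses is not part of Selenium or of the original answer) takes the list of href strings, assumed to come from get_attribute('href'), and returns the unique addresses:

```python
from urllib.parse import unquote, urlsplit


def mailto_addresses(hrefs):
    """Return the unique addresses found in mailto: links, in first-seen order."""
    seen = []
    for href in hrefs:
        # href is None or empty for <a> tags without a usable href attribute
        if not href or not href.lower().startswith('mailto:'):
            continue
        # urlsplit separates 'user@example.com' from any '?subject=...' query
        path = unquote(urlsplit(href).path)
        # A single mailto: link may list several comma-separated recipients
        for address in path.split(','):
            address = address.strip()
            if address and address not in seen:
                seen.append(address)
    return seen
```

With the variables from the answer, this would be called as mails = mailto_addresses(links). Checking the prefix rather than using 'mailto:' in link avoids treating ordinary URLs that merely contain that text as email links.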

I would also suggest reading this and this.
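
The idea from the question, scanning driver.page_source for text that contains @, can also be sketched with a regular expression. This catches addresses that appear as plain text rather than in mailto: links. The pattern below is a deliberately simplified assumption, not a full RFC 5322 matcher, and emails_in_text is a hypothetical helper name:

```python
import re

# Simplified pattern: local part, '@', then a dotted domain ending in a
# top-level label of at least two letters. It will miss some exotic but
# valid addresses and may accept some invalid ones.
EMAIL_RE = re.compile(
    r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}'
)


def emails_in_text(text):
    """Return the sorted, de-duplicated email-like strings found in text."""
    return sorted(set(EMAIL_RE.findall(text)))
```

In the asker's loop this would be used as emails_in_text(driver.page_source) after each driver.get(...) call. Pages that obfuscate addresses (for example "name at example dot com") would not be matched by this approach.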
