Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to read text off a website using python (Simple explanation)

I’m looking to make a program that can get the text off a website when given the website’s URL. I would like to be able to get all text between the

tags. Everywhere I have looked online seems to overcomplicate this and it involves some coding in C which I am not well versed in. To summarize what I would like the code to look like (best case scenario). If theres anything I can clarify or is unclear in the question please let me know in comments

import WebReader as WR

StringOfWebText = WR.getParagrahText("WebsiteURL")

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You probably want to look into something like BeautifulSoup paired with requests. You can then extract text from a page with a simple solution like this:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://google.com")
soup = BeautifulSoup(r.text, "html.parser")
print(s.text)

There’s also tag-searching and other useful features built into BS4, if you need to be able to handle that.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading