Can't get text with Beautiful Soup from between <p></p>

import requests 
from bs4 import BeautifulSoup

URL = "https://habr.com/ru/hubs/gamedev/articles/" # URL of the page to scrape

page = requests.get(URL).content
soup = BeautifulSoup(page, "html.parser")
post = soup.find("article", class_="tm-articles-list__item") # Latest post, the one I need to parse

discription = post.find_all('p')
for post_text in discription:       # Trying to extract the text of each paragraph
    text = post_text.get_text()

print(text)

I get this error:

File "d:\CODING\Projects\net N FV.py", line 14, in <module>
print(text)
^^^^
NameError: name 'text' is not defined

or, with other attempts, text that I don't need.
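The NameError itself comes from the loop: `text` is only assigned inside the loop body, so when `post.find_all('p')` returns an empty list (no <p> elements for that article in the downloaded HTML), the body never runs and `text` is never defined. Even when the loop does run, `print(text)` sits outside it, so only the last paragraph would be printed. The sketch below reproduces this with plain strings standing in for BeautifulSoup tags (a simplification); the function names are hypothetical.

```python
def last_text(paragraphs):
    """Mirrors the question's loop: `text` is only assigned inside the loop body."""
    for paragraph in paragraphs:
        text = paragraph.strip()
    return text  # unbound if `paragraphs` was empty


def all_texts(paragraphs):
    """Safer: collect every paragraph; an empty input simply yields an empty list."""
    return [paragraph.strip() for paragraph in paragraphs]


print(last_text([" first ", " second "]))  # second
try:
    last_text([])  # what happens when find_all('p') finds nothing
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print(type(exc).__name__)  # UnboundLocalError
print(all_texts([]))  # []
```

At module level, as in the question, the same situation raises a plain `NameError`; inside a function it is reported as `UnboundLocalError`.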

On the website, the HTML of the post I'm parsing looks like this:

<div class="article-formatted-body article-formatted-body article-formatted-body_version-2">
  <p>
    "Сегодня первой игре из серии DOOM исполняется ровно 30 лет! Мы не могли обойти стороной это событие и в честь этого решили посмотреть, как же выглядит код этой легендарной игры спустя годы."
  </p>
  <p></p>
  ::after
</div>
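As a quick check, the sketch below runs the question's lookup on the snippet above, wrapped in a hypothetical `<article class="tm-articles-list__item">` element so that `soup.find(...)` has something to match. On HTML that really contains these <p> elements, `find_all('p')` does find the text; if the same lookup comes back empty on the live page, the HTML that `requests` downloads does not contain them. The function name `paragraph_texts` is hypothetical.

```python
from bs4 import BeautifulSoup

# The question's snippet (abridged), wrapped in a hypothetical <article> element.
SNIPPET = """
<article class="tm-articles-list__item">
  <div class="article-formatted-body article-formatted-body_version-2">
    <p>Сегодня первой игре из серии DOOM исполняется ровно 30 лет!</p>
    <p></p>
  </div>
</article>
"""


def paragraph_texts(html):
    """Return the non-empty <p> texts of the first matching article, or []."""
    soup = BeautifulSoup(html, "html.parser")
    post = soup.find("article", class_="tm-articles-list__item")
    if post is None:  # no such article in this HTML
        return []
    texts = []
    for p in post.find_all("p"):
        text = p.get_text(strip=True)
        if text:  # skip empty paragraphs such as <p></p>
            texts.append(text)
    return texts


print(paragraph_texts(SNIPPET))
```

Collecting the texts into a list avoids both problems from the question: nothing is left undefined when no paragraphs are found, and no paragraph is overwritten by the next one.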

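Solution:

The text you see on the page is stored as JSON inside a <script> element (assigned to window.__INITIAL_STATE__), not in the <p> elements your code is searching. So extract that JSON and parse it, for example: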

import re
import json

import requests
from bs4 import BeautifulSoup

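URL = "https://habr.com/ru/hubs/gamedev/articles/"  # URL of the hub's article list

page = requests.get(URL).text
# The article data is embedded in the page as: window.__INITIAL_STATE__={...};
data = re.search(r"window\.__INITIAL_STATE__=(.*}});", page).group(1)

data = json.loads(data)

# Go through the articles newest first, by publication time
for a in sorted(
    data["articlesList"]["articlesList"].values(),
    key=lambda k: k["timePublished"],
    reverse=True,
):
    print(a["titleHtml"])
    # The lead paragraph is stored as an HTML fragment, so strip its tags
    print(BeautifulSoup(a["leadData"]["textHtml"], "html.parser").text)

    # we want just the first (newest) article
    break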
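To try the extraction step without hitting habr.com, here is a minimal offline sketch. The sample page and its article values are hypothetical; only the key names (`articlesList`, `timePublished`, `titleHtml`, `leadData`/`textHtml`) and the regex come from the answer above. It uses `max` instead of `sorted(...)` plus `break`, and a crude regex instead of BeautifulSoup to strip tags, so it needs only the standard library.

```python
import json
import re

# Hypothetical stand-in for the page source; the real page is much larger.
SAMPLE_STATE = {
    "articlesList": {
        "articlesList": {
            "101": {
                "titleHtml": "Older article",
                "timePublished": "2023-12-09T10:00:00+00:00",
                "leadData": {"textHtml": "<p>Older lead.</p>"},
            },
            "102": {
                "titleHtml": "Newest article",
                "timePublished": "2023-12-10T10:00:00+00:00",
                "leadData": {"textHtml": "<p>Newest <b>lead</b>.</p>"},
            },
        }
    }
}
SAMPLE_PAGE = (
    '<html><body><div id="app"></div><script>window.__INITIAL_STATE__='
    + json.dumps(SAMPLE_STATE)
    + ";</script></body></html>"
)

# Same pattern as the answer: capture everything up to the last "}};" on the line.
STATE_RE = re.compile(r"window\.__INITIAL_STATE__=(.*}});")


def latest_article(page_html):
    match = STATE_RE.search(page_html)
    if match is None:
        raise ValueError("window.__INITIAL_STATE__ not found in page")
    state = json.loads(match.group(1))
    articles = state["articlesList"]["articlesList"].values()
    # ISO 8601 strings with the same format and offset sort chronologically as text
    newest = max(articles, key=lambda a: a["timePublished"])
    # Crude tag stripping for the sketch; BeautifulSoup(...).text is more robust
    lead_text = re.sub(r"<[^>]+>", "", newest["leadData"]["textHtml"])
    return newest["titleHtml"], lead_text


title, lead = latest_article(SAMPLE_PAGE)
print(title)  # Newest article
print(lead)   # Newest lead.
```

Because `.*` is greedy, the match runs to the last `}};` on that line; if the real script had more code after the JSON on the same line that also ends in `}};`, `json.loads` would fail, and decoding from the opening `{` with `json.JSONDecoder().raw_decode` would be a sturdier option.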

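Prints (the newest article's title and lead paragraph):

30 лет DOOM: новый код — новые баги
Сегодня первой игре из серии DOOM исполняется ровно 30 лет! Мы не могли обойти стороной это событие и в честь этого решили посмотреть, как же выглядит код этой легендарной игры спустя годы.

(In English: "30 years of DOOM: new code, new bugs" / "Today the first game in the DOOM series turns exactly 30 years old! We couldn't let this event pass, so to mark it we decided to look at what the code of this legendary game looks like years later.")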