Webscraper Python Beautifulsoup

I know what I'm trying to do is about as simple as it gets, but it is driving me crazy. I'd like to pull data from an HTML page (https://partner.microsoft.com/en-us/membership/application-development-competency) using BeautifulSoup. I guess I need to use the .find() function to do that, but I'm out of ideas. I'd appreciate any help.
Here’s the HTML I’m working with:
[enter image description here][1]
[1]: https://i.stack.imgur.com/sHAMF.png

import requests
from bs4 import BeautifulSoup 

url = 'https://partner.microsoft.com/en-us/membership/application-development-competency'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')
text = soup.find("div",{"class":"col-md4[2]"})

output = ''
blacklist = [
    'style',
    'head',
    'meta',
    'col-md4[0]',
    'col-md4[1]',
]

for t in text:
    if t.parent.name not in blacklist:
        output += '{} '.format(t)

# Write the collected text to the file once, after the loop has finished
sheet = '<html><body>' + output + '</body></html>'
with open("record.html", "w") as file_object:
    file_object.write(sheet)

Solution:

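You can do something like this, although for this site I think XPath would be a better fit. Note that the class name is "col-md-4", and the index is applied to the list returned by find_all(), not written into the class string: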

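import requests
from bs4 import BeautifulSoup


url = 'https://partner.microsoft.com/en-us/membership/application-development-competency'
response = requests.get(url)
# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(response.text, features="lxml")
# find_all() returns a list of every <div class="col-md-4"> on the page
cols = soup.find_all("div", class_="col-md-4")
# Index 6 depends on the page's current layout and may need adjusting
print(cols[6].getText())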
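
To try the same idea without hitting the network, here is a minimal sketch that runs against an inline HTML string. The sample markup, the helper names column_text and wrap_as_html, and the column contents are all made up for illustration; only the find_all() call with class_ mirrors the answer above. It uses the built-in "html.parser" so lxml is not needed, and it returns None instead of raising when the index is out of range.

```python
import html

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: three Bootstrap-style columns.
SAMPLE_HTML = """
<html><head><style>.x{}</style></head><body>
<div class="row">
  <div class="col-md-4">Column A</div>
  <div class="col-md-4">Column B</div>
  <div class="col-md-4"><p>Column C, part one</p><p>Column C, part two</p></div>
</div>
</body></html>
"""


def column_text(page_html, index, css_class="col-md-4"):
    """Return the text of the index-th <div> with css_class, or None if absent."""
    soup = BeautifulSoup(page_html, "html.parser")
    cols = soup.find_all("div", class_=css_class)
    if not 0 <= index < len(cols):
        return None
    # Join nested text fragments with a space and trim surrounding whitespace.
    return cols[index].get_text(" ", strip=True)


def wrap_as_html(text):
    """Wrap plain text in a minimal HTML document, escaping special characters."""
    return "<html><body>" + html.escape(text) + "</body></html>"


if __name__ == "__main__":
    text = column_text(SAMPLE_HTML, 2)
    if text is None:
        print("No such column")
    else:
        # Write the result once, using a context manager to close the file.
        with open("record.html", "w") as file_object:
            file_object.write(wrap_as_html(text))
```

Because the class filter already selects only the wanted elements, a blacklist of tag names such as 'style' or 'head' is not needed here.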