How do I scrape the name/text of the hyperlinks using Python?

I want to extract the names of the links from https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html, but I can't figure out the next step. Below is my code so far:

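import requests
from bs4 import BeautifulSoup

URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# find() returns only the first element whose class attribute is exactly "post altr" (or None)
results = soup.find(class_="post altr")

# Looping over a single element walks through its direct children (tags and text nodes)
for result in results:
    print(result)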

I still don’t know how to go to the next step. Any help is very much appreciated. Thank you.


Solution:

This code gets the text of every link on the page:

import requests
from bs4 import BeautifulSoup

URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all('a')  # every <a> (hyperlink) element on the page

for result in results:
    print(result.text.strip())  # visible link text, surrounding whitespace removed

Output:

CCDA
port channels
RPVST
Dynamic Trunking Protocol
VTP transparent mode
Layer 3 load balancing
user ports
enable PortFast
the core layer
link redundancy
access layer switches
Gateway Load Balancing Protocol
core switches
distribution switches
redundant paths
campus core
Large Building LANs
LAN Design Types and Models
Shutting Down a BGP Neighbor
Core Layer Functionality - Network Design
Distribution Layer Functionality
Characterizing Types of Traffic Flow for New Network Applications
DHCP Starvation and Spoofing Attacks
How to Start an Ecommerce Business
Reply
About
Contact
Advertise
Privacy Policy
Resources

This works because HTML hyperlinks are written with the <a> tag, so find_all('a') collects every link on the page. I assumed you wanted the text of each link, but if you want the link targets (the URLs) instead, here's how to get them:

import requests
from bs4 import BeautifulSoup

URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute
for a in soup.find_all('a', href=True):
    print(a['href'])

Output:

/
/reviews/traffic-xtractor.html



/ccda/
/routing-switching/using-routed-ports-and-portchannels-with-mls.html
/root-bridge/rapid-pervlan-spanning-tree-protocol.html
/network-security-2/dynamic-trunking-protocol-dtp.html
/root-bridge/vtp-modes.html
/root-bridge/configuring-etherchannel-load-balancing.html
/routing-switching-2/switch-security-best-practices-for-unused-and-user-ports.html
/global-configuration/enabling-bpdu-guard.html
/network-design/core-layer-functionality.html
/network-design/designing-link-redundancy.html
/network-design/access-layer-functionality.html
/root-bridge/gateway-load-balancing-protocol.html
/switching/collapsed-core.html
/switching/distribution-layer-switches.html
/switching/backbonefast-redundant-backbone-paths.html
/network-design/campus-core-design-considerations.html
/ccda/largebuilding-lans.html
/ccda/lan-design-types-and-models.html
/cisco-internetworks-2/shutting-down-a-bgp-neighbor.html
/network-design/core-layer-functionality.html
/network-design/distribution-layer-functionality.html
/network-design-2/characterizing-types-of-traffic-flow-for-new-network-applications.html
/snrs-3/dhcp-starvation-and-spoofing-attacks.html
/ecommerce.html
/about/
/contact/
/advertise-with-us/
/privacy-policy/
/resources/

This prints only the href attribute of each <a> tag; passing href=True skips <a> tags that have no href at all.
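The hrefs printed above are relative paths such as /ccda/. If you want each link's text paired with a full URL, and optionally only the links inside the article body (the "post altr" element from the question), a minimal sketch along these lines should work. It assumes bs4 is installed, resolves relative hrefs with the standard library's urljoin, and runs on a small hypothetical SAMPLE_HTML string instead of the live page; the helper name extract_links is also just for illustration. To use it on the real page, pass requests.get(URL).content as the html argument.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url, container_class=None):
    """Return (text, absolute_url) pairs for every <a> tag that has an href.

    Illustrative helper (not part of BeautifulSoup). If container_class is
    given, only links inside the first element with exactly that class string
    are returned; if no such element exists, the whole document is searched.
    """
    soup = BeautifulSoup(html, "html.parser")
    scope = soup
    if container_class is not None:
        found = soup.find(class_=container_class)
        if found is not None:  # fall back to the whole page if the class is missing
            scope = found
    links = []
    for a in scope.find_all("a", href=True):
        text = a.get_text(strip=True)  # visible text without surrounding whitespace
        # Resolve relative hrefs such as "/ccda/" against the page URL.
        links.append((text, urljoin(base_url, a["href"])))
    return links


# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div class="post altr">
  <a href="/ccda/">CCDA</a>
  <a href="/root-bridge/vtp-modes.html"> VTP transparent mode </a>
</div>
<footer><a href="/about/">About</a></footer>
"""

BASE = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"

for text, url in extract_links(SAMPLE_HTML, BASE, container_class="post altr"):
    print(text, "->", url)
```

Restricting the search to the container drops navigation and footer links such as About, which is usually what "the links in the article" means; leaving container_class out reproduces the whole-page behaviour of the snippets above.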
