Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Selenium – href specific output

I have been working on a project to pull specific href links from a site. Everything is working but I also want to be able to pull only specific data from href link. I haven’t been able to figure that part out.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service


service = Service(executable_path="c:/temp/chromedriver.exe")
driver = webdriver.Chrome(service=service)

#Source site
driver.get("site.com")

#The xpath to the links
s = driver.find_element(By.XPATH, "/html/body/div[3]/div[4]/table[1]/tbody/tr[2]/td[4]/a").get_attribute('href')
smonth = driver.find_element(By.XPATH, "/html/body/div[3]/div[4]/table[1]/tbody/tr[2]/td[3]")

b = driver.find_element(By.XPATH, "/html/body/div[3]/div[4]/table[1]/tbody/tr[6]/td[4]/a").get_attribute('href')
bmonth = driver.find_element(By.XPATH, "/html/body/div[3]/div[4]/table[1]/tbody/tr[6]/td[3]")

bb = driver.find_element(By.XPATH, "/html/body/div[3]/div[4]/table[1]/tbody/tr[28]/td[3]/a").get_attribute('href')
bbmonth = driver.find_element(By.XPATH, "/html/body/div[3]/div[4]/table[1]/tbody/tr[28]/td[2]")

#The output of the links
print("S:", smonth.text, s, "\nB:", bmonth.text, b, "\nBB:", bbmonth.text, bb)


driver.close()

Here is the output

S: July 2022 https://bdn-ak-ssl.site.com/software/s96_8_80.exe
B: July 2022 https://bdn-ak-ssl.site.com/software/b43_6_56.exe
BB: July 2022 https://bdn-ak-ssl.site.com/software/bb202_2_100.exe

I’m trying to get the output to look like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

S: July 2022 https://bdn-ak-ssl.site.com/software/s96_8_80.exe Version: s96_8_80
B: July 2022 https://bdn-ak-ssl.site.com/software/b43_6_56.exe Version: b43_6_56
BB: July 2022 https://bdn-ak-ssl.site.com/software/bb202_2_100.exe Version: bb202_2_100

>Solution :

Use regex

import re

output = "BB: July 2022 https://bdn-ak-ssl.site.com/software/bb202_2_100.exe"
result = re.search(r'software/(.*?).exe', output)
result = result.group(1)

output += f" Version: {result}"
print(output)

Output

BB: July 2022 https://bdn-ak-ssl.site.com/software/bb202_2_100.exe Version: bb202_2_100
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading