Paginating pages using something other than numbers in Python

I am trying to paginate a scraper on my university’s website.
Here is the URL for one of the pages:

where david-abel is a first name followed by a last name. (It would be first-middle-last if a middle name were given, which is a problem because my code currently only finds first and last names.) I have a plan to deal with middle names, but my question is:

How do I add the names from my first-name and last-name lists to my base URL to get a corresponding URL in the layout above?

import requests
from bs4 import BeautifulSoup

url = ''
data = requests.get(url)

my_data = []
split_names = []
firstnames = []
lastnames = []
middlenames = []

html = BeautifulSoup(data.text, 'html.parser')

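professors = html.select('h4.profile-card__name')

# collect the text of each name heading
for professor in professors:
    my_data.append(professor.text.strip())

# split each full name into its parts
for name in my_data:
    split_names.append(name.split())

# transpose into first and last names (assumes every name has exactly two parts)
f, l = zip(*split_names)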

# build a searchable url from the names
for name in split_names:
    baseurl = ""
    newurl = baseurl + 


Solution:

Your approach of splitting each name and joining the first, middle, and last parts with "-" can work, but only if you are sure that every profile link follows that pattern.
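
If you do go the name-joining route, a minimal sketch could look like the following. The base URL, the helper name profile_url, and the second sample name are hypothetical placeholders, and lower-casing is assumed only because the example slug david-abel is lower-case. Because it joins however many parts a name has, a middle name needs no special handling.

```python
# Hypothetical base URL; replace it with the real profile directory.
BASE_URL = "https://example.edu/people/"


def profile_url(full_name, base_url=BASE_URL):
    """Lower-case the name, join its parts with '-', and append to base_url."""
    parts = full_name.lower().split()
    return base_url + "-".join(parts)


# "Mary Ann Smith" is a made-up name used to show the middle-name case.
names = ["David Abel", "Mary Ann Smith"]
urls = [profile_url(name) for name in names]
print(urls)
```

Names containing punctuation or suffixes (an apostrophe, "Jr.", and so on) may not map to the site's slugs this way, which is one reason reading the link from the page is safer.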

A more reliable option is to extract the URL (the href attribute) directly from each a tag:

import requests
from bs4 import BeautifulSoup

# define an empty list to save data on
df_list = []

# go through pages from 1 to 7
for page in range(1, 8):

    # define the current page url
    url = '' + str(page)

    # make request to the current page
    response = requests.get(url)

    # parse the html into a soup object
    soup = BeautifulSoup(response.text, 'html.parser')

    # select all the professors 
    professors = soup.select('a.profile-card__link')

    # go through every professor 
    for professor in professors:

        # get the name by the class "profile-card__name"
        name = professor.select_one('.profile-card__name').text
        # get the title by the class "profile-card__title"
        title = professor.select_one('.profile-card__title').text
        # get the "href" of the "a" tag (the profile link)
        link = professor.get('href')

        # add the data into the list
        df_list.append((name, title, link,))
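
The href values collected above may be relative paths rather than full URLs; that depends on the site and is an assumption here. If they are, a sketch like this one turns them into absolute links with the standard library's urljoin. The site root and the sample rows are hypothetical stand-ins for the real df_list contents.

```python
from urllib.parse import urljoin

# Hypothetical site root; use the address of the site that was scraped.
SITE_ROOT = "https://example.edu/"

# Made-up sample rows in the same (name, title, link) shape as df_list.
rows = [
    ("David Abel", "Example Title", "/people/david-abel"),
    ("Jane Doe", "Example Title", "https://example.edu/people/jane-doe"),
]

# urljoin resolves relative links against the root and leaves
# links that are already absolute unchanged.
absolute_rows = [
    (name, title, urljoin(SITE_ROOT, link)) for name, title, link in rows
]

for row in absolute_rows:
    print(row)
```

Each resulting link can then be passed to requests.get to scrape the individual profile page.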

