Scraping a dropdown menu to return name of items using BS4

I have the following HTML. It is from a menu called Knives with sub-types like Bayonet, Classic Knife, etc.

<div class="group inline-block relative w-full lg:w-auto">
<button class="navbar-subitem-trigger text-left py-2 focus:outline-none hover:text-white w-full lg:w-auto block lg:inline-block lg:mr-2 xl:mr-4 text-blue-100" data-target="navbar-subitems-Knives" type="button">
Knives
</button>
<ul id="navbar-subitems-Knives" class="custom-scrollbar hidden bg-gray-700 rounded shadow-md text-blue-100 my-2 lg:my-0 overflow-hidden lg:overflow-y-auto lg:absolute lg:group-hover:block lg:max-h-[80vh]">
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/bayonet">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9iYXlvbmV0LjE3YmIyNWM3NTg2N2QwMzlmYTc1MjRlOGM1ZmE2MzEzNGI2MjQ1MzQucG5n/50/auto/85/notrim/8965bc48871767721ae4a2bd3762f460.webp" alt="Bayonet">
</div>
Bayonet
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/classic-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9jc3MuMmZhNjZkMDEwMTMxYzA3NTMwYjA4ZTkwOTZlNGVmNGM4Y2NiODA4Ny5wbmc-/50/auto/85/notrim/8c546f8cb1f52b844f38fb4681c01dcb.webp" alt="Classic Knife">
</div>
Classic Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/falchion-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9mYWxjaGlvbi43MzM0OTBkY2Q0YjZiMTJmMTk1MTJiM2I5YTFhMDlkOTM1ZTZhYWVhLnBuZw--/50/auto/85/notrim/92b97f3404ee5c97178b6fa7bf45ab42.webp" alt="Falchion Knife">
</div>
Falchion Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/flip-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9mbGlwLjBlODhmNTZhODhlMTE1MjNhOGNhZGI2ZDcwMDNlZjMwOGM1MmZhYjkucG5n/50/auto/85/notrim/f831a3e8245eee182a4b3315c36f9df6.webp" alt="Flip Knife">
</div>
Flip Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/gut-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV9ndXQuMWIwYTYyZjEwZDliYjcwZmRiMDY5NWU3MDI3NDI0ODZlNGNjZWJkZC5wbmc-/50/auto/85/notrim/78ce07820ddd2a888d6f20a2f39b46d0.webp" alt="Gut Knife">
</div>
Gut Knife
</a>
</li>
<li>
<a class="flex items-center outline outline-offset-0 outline-1 outline-gray-700 hover:bg-gray-600 hover:text-white py-2 px-4 whitespace-nowrap bg-gray-700" href="https://csgoskins.gg/weapons/huntsman-knife">
<div class="w-10 h-7 mr-1">
<img loading="lazy" class="lazy-instant max-w-full max-h-full" src="https://cdn.csgoskins.gg/public/uih/weapons/aHR0cHM6Ly9jZG4uY3Nnb3NraW5zLmdnL3B1YmxpYy9pbWFnZXMvYnVja2V0cy9lY29uL3dlYXBvbnMvYmFzZV93ZWFwb25zL3dlYXBvbl9rbmlmZV90YWN0aWNhbC42YzI0NTM3ZjVlMzA2NGNmNDA4MTNlOTNmOWZjYmFkYzk5MjA1Y2ExLnBuZw--/50/auto/85/notrim/3520c4123a62eb9550ccc7f3745eb53f.webp" alt="Huntsman Knife">
</div>
Huntsman Knife
</a>
</li>
</ul>
</div>

With the following code I try to get the names of the different sub-types:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the webpage to scrape
url = 'https://csgoskins.gg/'  # Replace with the URL of the page you want to scrape

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
    }

# Send a GET request to the URL
r = requests.get(url, headers = headers)
soup = BeautifulSoup(r.content, 'lxml')
print(r)
knives_section = soup.find("ul",{"id":"navbar-subitems-Knives"}).findAll("w-10 h-7 mr-1")
print(knives_section)

but it returns nothing.
I tried to use elements from the following answer: Scraping from dropdown option value Python BeautifulSoup

What am I doing wrong?

Solution:

The problem is the findAll call. The first positional argument of findAll (the older alias of find_all) is a tag name, so "w-10 h-7 mr-1" is looked up as a tag called w-10 h-7 mr-1 rather than as a set of classes. No such tag exists, so the result is an empty list. To filter by class you would pass the class_ keyword argument or use a CSS selector. Moreover, these classes belong to the div that wraps the img tag, which holds no text; the knife names are the text content of the a tags within the li elements.
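To make the difference between a tag-name filter and a class filter concrete, here is a minimal sketch that runs offline on a trimmed copy of the markup from the question. The trimmed html string and the variable names are illustrative assumptions, and html.parser is used only so that the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the menu markup from the question (two items only).
html = """
<ul id="navbar-subitems-Knives">
  <li><a href="https://csgoskins.gg/weapons/bayonet">
    <div class="w-10 h-7 mr-1"><img alt="Bayonet"></div>
    Bayonet
  </a></li>
  <li><a href="https://csgoskins.gg/weapons/classic-knife">
    <div class="w-10 h-7 mr-1"><img alt="Classic Knife"></div>
    Classic Knife
  </a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
menu = soup.find("ul", id="navbar-subitems-Knives")

# A positional string is a tag name; there is no tag named "w-10 h-7 mr-1".
as_tag_name = menu.find_all("w-10 h-7 mr-1")

# The class_ keyword filters on the class attribute instead.
as_class = menu.find_all("div", class_="w-10")

print(len(as_tag_name), len(as_class))  # prints: 0 2
```

Even the class filter only returns the image wrappers, which contain no text, so it would still not yield the knife names.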

Here’s how you can modify your code to correctly scrape the names of the different sub-types of knives:

import requests
from bs4 import BeautifulSoup

# URL of the webpage to scrape
url = 'https://csgoskins.gg/'  # Replace with the URL of the page you want to scrape

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
}

# Send a GET request to the URL
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')

# Find the knives section
knives_section = soup.find("ul", {"id": "navbar-subitems-Knives"})

# Find all knife names
knife_names = knives_section.find_all("li")
for knife in knife_names:
    # Extract and print the knife name
    name = knife.get_text(strip=True)
    print(name)

The code above finds the ul element with the ID navbar-subitems-Knives, then all li elements within it, and finally extracts the text of each li, which is the name of the knife. get_text(strip=True) strips leading and trailing whitespace from each piece of text and skips pieces that are empty afterwards, so the newlines around the name do not end up in the output.
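One caveat: soup.find returns None when the element is absent, and the next call then raises an AttributeError. Whether the live page always contains the menu in the raw HTML is not verified here, so the following is only a sketch of a more defensive variant. It uses a CSS selector, which yields an empty list when nothing matches. The function name knife_names and the sample string are hypothetical, chosen for illustration.

```python
from typing import List

from bs4 import BeautifulSoup


def knife_names(html: str) -> List[str]:
    """Return the link texts inside the Knives submenu, or [] if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns an empty list when the id is not present in the document.
    links = soup.select("#navbar-subitems-Knives li a")
    return [a.get_text(strip=True) for a in links]


# Hypothetical, trimmed stand-in for the downloaded page.
sample = """
<ul id="navbar-subitems-Knives">
  <li><a href="https://csgoskins.gg/weapons/bayonet">
    <div class="w-10 h-7 mr-1"><img alt="Bayonet"></div>
    Bayonet
  </a></li>
  <li><a href="https://csgoskins.gg/weapons/classic-knife">
    <div class="w-10 h-7 mr-1"><img alt="Classic Knife"></div>
    Classic Knife
  </a></li>
</ul>
"""

print(knife_names(sample))                 # ['Bayonet', 'Classic Knife']
print(knife_names("<p>no menu here</p>"))  # []
```

With the real page, the same function could be called as knife_names(r.text) after the requests.get call shown above.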
