Difficulty Scraping Numerical Values from two divs with the same class

April 23, 2022

I am new to web scraping and I am having trouble grabbing the "rank" and "Item No." values from this URL. My ultimate goal is to save this info in a CSV file and plot the data. The issue is that the two values sit in two different divs that share the same class name, "item_stat".

 <div class="item_stats">
    <div class="item_stat">
          rank
         <span>
          1
         </span>
    </div>
    <div class="item_stat">
          item no.
         <span>
          #3251
         </span>
     </div>
 </div>

I am using the following code to grab the "rank" value.

import re
import requests
from bs4 import BeautifulSoup as bs

page = requests.get(URL)
soup = bs(page.content, 'html.parser')
soup2 = bs(soup.prettify(), "html.parser")
lists = soup2.find('div', class_="featured_item")
stats = lists.find('div', class_="item_stats")
stats_val = lists.find('div', class_="item_stat")
rank = stats_val.text.replace('<', '')
rank_val = re.findall(r'\d+', rank)

Output:

   ['1']

I think I want this value as a float, and I also do not know how to find the "Item No." value. Using get_text() and .text.replace() gives me errors I haven’t had in other scraping projects. I appreciate any advice, thanks.

Solution:

One approach is to select all the <span> elements inside the item_stats div and extract their text:

rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()

Example

from bs4 import BeautifulSoup

html = '''
<div class="item_stats">
    <div class="item_stat">
          rank
         <span>
          1
         </span>
    </div>
    <div class="item_stat">
          item no.
         <span>
          #3251
         </span>
     </div>
 </div>
'''

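# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')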

rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()

print(rank)    # 1.0
print(number)  # #3251
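
The positional indexes [0] and [1] above depend on the order of the divs. As a possible alternative, the sketch below keys each value by the label text of its own item_stat div and then writes a CSV row, since that was the stated goal. The helper name parse_stats, the column name item_no, and the choice of int over float are assumptions made for illustration, not part of the original answer; the CSV is written to an in-memory buffer here, and a real file opened with newline='' could be used instead.

```python
import csv
import io

from bs4 import BeautifulSoup, NavigableString

html = '''
<div class="item_stats">
    <div class="item_stat">
          rank
         <span>
          1
         </span>
    </div>
    <div class="item_stat">
          item no.
         <span>
          #3251
         </span>
     </div>
 </div>
'''


def parse_stats(markup):
    """Return a {label: value} dict with one entry per div.item_stat (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    stats = {}
    for div in soup.select('.item_stats .item_stat'):
        span = div.find('span')
        if span is None:
            continue  # skip a stat block that has no value span
        # The label is the div's own direct text, excluding the text inside the span.
        label = ''.join(
            child for child in div.children if isinstance(child, NavigableString)
        ).strip()
        stats[label] = span.get_text(strip=True)
    return stats


stats = parse_stats(html)

# Convert to numbers; int is an assumption here, float(...) works the same way.
rank = int(stats['rank'])
item_no = int(stats['item no.'].lstrip('#'))

# Write one header row and one data row to an in-memory CSV buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['rank', 'item_no'])
writer.writerow([rank, item_no])
csv_text = buffer.getvalue()

print(stats)
print(csv_text)
```

Looking values up by label means the code keeps working if the page reorders the stats, and a missing label raises a KeyError instead of silently returning the wrong number.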