Difficulty Scraping Numerical Values from two divs with the same class

I am new to web scraping and want to grab the "rank" and "Item No." values from this URL. My ultimate goal is to save this information to a CSV file and plot the data. The problem is that the two values sit in two different divs with the same class name, "item_stat".

 <div class="item_stats">
    <div class="item_stat">
          rank
         <span>
          1
         </span>
    </div>
    <div class="item_stat">
          item no.
         <span>
          #3251
         </span>
     </div>
 </div>

I am using the following code to grab the "rank" value.

import re

import requests
from bs4 import BeautifulSoup as bs

page = requests.get(URL)
soup = bs(page.content, 'html.parser')
soup2 = bs(soup.prettify(), "html.parser")
lists = soup2.find('div', class_="featured_item")
stats = lists.find('div', class_="item_stats")
stats_val = lists.find('div', class_="item_stat")
rank = stats_val.text.replace('<', '')
rank_val = re.findall(r'\d+', rank)

Output:

   ['1']

I think I want this value as a float, and I also do not know how to find the "Item No." value. Using get_text() and .text.replace() gives me errors I haven't had in other scraping projects. I appreciate any advice, thanks.

Solution:

One approach is to select all <span> elements inside the item_stats div and extract their text:

rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()

Example

from bs4 import BeautifulSoup

html = '''
<div class="item_stats">
    <div class="item_stat">
          rank
         <span>
          1
         </span>
    </div>
    <div class="item_stat">
          item no.
         <span>
          #3251
         </span>
     </div>
 </div>
'''

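# Name the parser explicitly to avoid the "no parser was explicitly specified" warning.
soup = BeautifulSoup(html, 'html.parser')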

rank = float(soup.select('.item_stats span')[0].text)
number = soup.select('.item_stats span')[1].text.strip()

print(rank)
print(number)

Output

1.0
#3251
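
The positional indexing above depends on the spans always appearing in the same order. As a variation, and only a sketch under assumptions, the values can be keyed by the label text of each item_stat div and then written out as CSV, which is the goal stated in the question. The column name item_no, the choice to strip the leading "#", and the in-memory buffer standing in for a real file are all illustrative assumptions, not something taken from the original page.

```python
import csv
import io

from bs4 import BeautifulSoup

html = '''
<div class="item_stats">
    <div class="item_stat">
          rank
         <span>
          1
         </span>
    </div>
    <div class="item_stat">
          item no.
         <span>
          #3251
         </span>
    </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Map each label ("rank", "item no.") to the text of its <span>.
stats = {}
for div in soup.select('.item_stats .item_stat'):
    span = div.find('span')
    if span is None:
        continue  # skip any div that has no value span
    # The first non-blank string inside the div is its label.
    label = next(div.stripped_strings)
    stats[label] = span.get_text(strip=True)

rank = float(stats['rank'])
# Assumption: the item number is wanted as an integer without the leading "#".
item_no = int(stats['item no.'].lstrip('#'))

# Write one CSV row; an in-memory buffer stands in for a real file here.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['rank', 'item_no'])
writer.writeheader()
writer.writerow({'rank': rank, 'item_no': item_no})
csv_text = buffer.getvalue()

print(csv_text)
```

To write to disk instead, the buffer could be replaced with open('items.csv', 'w', newline=''), where the file name is again hypothetical. Looking values up by label also makes a missing stat fail with a KeyError rather than silently returning the wrong field.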