I’m trying to learn to work with Python and BeautifulSoup. As a project for myself I am scraping a recipe website and displaying certain items in a template to learn to work with it.
The website shows each recipe's prep time, calories, and number of servings as a row of li elements inside a div.
There are 35 such div elements in a grid on the page. I want to select only the prep time from each div and store it in a list. All of the li elements have the same class and no other attributes. How do I select only the li I need?
Below is the HTML for one of these div elements. The other 34 have the same structure, each with a different recipe.
<div class="column xxlarge-4 large-6 small-12 ">
<a role="link" aria-label="Recept: 'Tiramisu' met advocaat" data-testhook="recipe-card" title="Recept: 'Tiramisu' met advocaat" href="/allerhande/recept/R-R1196417/tiramisu-met-advocaat" class="display-card_root__o17AY card_root__VNG0M card_roundCorners__dYaFu display-card_anchor__cTFon" data-analytics="LINK_CLICK" data-analytics-meta="%7B%22component%22%3A%22recipe-search%22%2C%22href%22%3A%22%2Fallerhande%2Frecept%2FR-R1196417%2Ftiramisu-met-advocaat%22%2C%22title%22%3A%22R-R1196417%22%7D">
<div class="display-card-section_section__42C0n display-card-body_body__r2mt4 card-body_root__E16CU">
<div class="ratio-box_root__YH5Fe ratio-box_ratio-21-10__thBP0">
<div class="ratio-box_content__k-Jz7">
<img class="card-image-set_imageSet__Su7xI lazyautosizes ls-is-cached lazyloaded" alt="'Tiramisu' met advocaat" data-srcset=", https://static.ah.nl/static/recepten/img_RAM_PRD163172_220x162_JPG.jpg 220w 162h, >
</div>
</div>
</div>
<footer class="display-card-section_section__42C0n display-card-section_padded__lHvvK display-card-footer_footer__cxMve card-footer_root__0dl7R">
<ul class="recipe-card-properties_root__rFiwt recipe-card-properties_allerhande__0gSBC" data-testhook="recipe-card-properties">
<li class="recipe-card-properties_property__87cH1">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" class="allerhande-icon recipe-card-properties_icon__wBmG9 svg svg--svg_time" viewBox="0 0 24 24" width="24" height="16">
<use xlink:href="#svg_time">
</use>
</svg>
20 min
</li>
<li class="recipe-card-properties_property__87cH1">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" class="allerhande-icon recipe-card-properties_icon__wBmG9 svg svg--svg_calories" viewBox="0 0 24 24" width="24" height="16">
<use xlink:href="#svg_calories">
</use>
</svg>
545 kcal
</li>
<li class="recipe-card-properties_property__87cH1">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" class="allerhande-icon recipe-card-properties_icon__wBmG9 svg svg--svg_person" viewBox="0 0 24 24" width="24" height="16">
<use xlink:href="#svg_person">
</use>
</svg>
8</li>
</ul>
<p class="typography_root__Om3Wh typography_variant-paragraph__T5ZAU typography_hasMargin__4EaQi card-text_title__REC-7">
<span class="line-clamp_root__7DevG line-clamp_active__5Qc2L card-text_titleText__7T9sY card-text_boldTitle__SVYw2" data-testhook="recipe-card-title" style="-webkit-line-clamp: 2; line-height: 1.2em; max-height: 2.4em;">
'Tiramisu' met advocaat
</span>
</p>
</footer>
</a>
</div>
And here is the code I am using to extract the information I need:
import re
import requests
from bs4 import BeautifulSoup

# Create soup
webpage_response = requests.get("https://www.ah.nl/allerhande/recepten-zoeken?sortBy=TRENDING")
webpage = webpage_response.content
soup = BeautifulSoup(webpage, "html.parser")
# Match each element on the prefix of its class name
recipe_links = soup.find_all('a', attrs={'class' : re.compile('^display-card_root__.*')})
recipe_pictures = soup.find_all('img', attrs={'class' : re.compile('^card-image-set_imageSet__.*')})
recipe_prep_time = soup.find_all('li', attrs={'class' : re.compile('^recipe-card-properties_property__.*')})
However, this selects all the li items, including calories etc., which makes it hard to pick the correct time from the list. How can I select only the first li in each card?
>Solution :
Simple and straightforward solution:
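# Take the first li of each properties list; strip=True drops surrounding whitespace
recipe_prep_time = [ul.find('li').get_text(strip=True)
                    for ul in soup.find_all('ul',
                        attrs={'class': re.compile('^recipe-card-properties_root')})]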
yields
['15 min',
'15 min',
'20 min',
'20 min',
'35 min',
'20 min',
'20 min',
'10 min',
...]
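Taking the first li works as long as the prep time is always listed first. If a card ever lacks a prep time, the first li would be the calories instead. A possible alternative, sketched here on a trimmed-down copy of the markup from the question, is to look for the svg icon whose class list contains svg--svg_time and take the text of its parent li. The sample HTML, the second card without a time, and the function name prep_times are made up for illustration and are not taken from the live page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample based on the card markup in the question.
# The second list deliberately has no time icon.
SAMPLE_HTML = """
<ul class="recipe-card-properties_root__rFiwt">
  <li class="recipe-card-properties_property__87cH1">
    <svg class="allerhande-icon svg svg--svg_time"><use xlink:href="#svg_time"></use></svg>
    20 min
  </li>
  <li class="recipe-card-properties_property__87cH1">
    <svg class="allerhande-icon svg svg--svg_calories"><use xlink:href="#svg_calories"></use></svg>
    545 kcal
  </li>
  <li class="recipe-card-properties_property__87cH1">
    <svg class="allerhande-icon svg svg--svg_person"><use xlink:href="#svg_person"></use></svg>
    8</li>
</ul>
<ul class="recipe-card-properties_root__rFiwt">
  <li class="recipe-card-properties_property__87cH1">
    <svg class="allerhande-icon svg svg--svg_calories"><use xlink:href="#svg_calories"></use></svg>
    300 kcal
  </li>
</ul>
"""


def prep_times(soup):
    """Return one prep time per properties list, or None if the list has no time icon."""
    times = []
    for ul in soup.find_all("ul", class_=re.compile(r"^recipe-card-properties_root")):
        # Identify the time entry by its icon class rather than by its position.
        icon = ul.find("svg", class_="svg--svg_time")
        times.append(icon.find_parent("li").get_text(strip=True) if icon else None)
    return times


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(prep_times(soup))  # prints ['20 min', None]
```

Returning None for a card without a time keeps the result aligned with the lists of links and pictures, so the same index refers to the same recipe in all three.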