
BeautifulSoup deleting first half of HTML?

I’m practicing with BeautifulSoup and HTTP requests in general for the first time. The goal of the programme is to load a webpage and its HTML, then search through the page (in this case a recipe) to extract a substring containing its ingredients. I’ve managed to get it working with the following code:

import requests

url = "https://www.bbcgoodfood.com/recipes/healthy-tikka-masala"

result = requests.get(url)
myHTML = result.text
index1 = myHTML.find("recipeIngredient")
index2 = myHTML.find("recipeInstructions")
ingredients = myHTML[index1:index2]
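One caveat with this slicing approach: str.find() returns -1 when a marker is missing, and slicing with -1 silently produces a misleading substring instead of raising an error. Below is a minimal guarded sketch; slice_between is a hypothetical helper name, and the sample string is made up rather than taken from the real page.

```python
def slice_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return text from start_marker up to (not including) end_marker.

    Hypothetical helper: returns "" if either marker is missing, instead of
    slicing with the -1 that str.find() returns on a miss.
    """
    start = text.find(start_marker)
    if start == -1:
        return ""
    # Only look for the end marker after the start marker
    end = text.find(end_marker, start)
    if end == -1:
        return ""
    return text[start:end]


# Made-up sample, not the real page source
sample = '{"recipeIngredient":["1 onion","2 tbsp oil"],"recipeInstructions":[]}'
print(slice_between(sample, "recipeIngredient", "recipeInstructions"))
```

Searching for the end marker from the start position, rather than from the beginning of the string, avoids an empty or reversed slice if the end marker also happens to appear earlier in the page.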

But when I try to use BeautifulSoup instead:

import requests
from bs4 import BeautifulSoup

url = "https://www.bbcgoodfood.com/recipes/healthy-tikka-masala"

result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
ingredients = doc.find(text="recipeIngredient")
print(ingredients)

I understand that the code above (even if I could get it working) would only output the string "recipeIngredient", but that’s all I’m focused on for now whilst I get to grips with BS. Instead, the code above just outputs None. I printed doc to the terminal and it only showed what appears to be the second half of the HTML (or at least, not all of it), whereas the text file does contain all of the HTML. I assume that’s where the problem lies, but I’m not sure how to fix it.
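A likely reason for the None: find(text=...) (the same argument is spelled string= since Beautiful Soup 4.4.0) only matches a text node whose entire content equals the given string, and "recipeIngredient" most likely sits inside a longer string in the page source, such as embedded JSON. The sketch below uses a made-up <script> fragment to show the difference; passing a compiled regular expression instead matches a substring.

```python
import re

from bs4 import BeautifulSoup

# Made-up fragment: the marker is only *part* of a longer text node
html = '<script type="application/ld+json">{"recipeIngredient": ["1 large onion"]}</script>'
doc = BeautifulSoup(html, "html.parser")

# string= must equal the whole text node, so this finds nothing
exact = doc.find(string="recipeIngredient")
print(exact)

# A compiled regex is tested with .search(), so a substring hit is enough
partial = doc.find(string=re.compile("recipeIngredient"))
print(partial)
```

The regex match returns the whole text node containing the marker, not just the marker itself, so further parsing (for example with json.loads) would still be needed to get at the ingredient list.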

Thank you.
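On the "second half of the HTML" point: one plausible explanation, not confirmed in the thread, is that the terminal's scrollback buffer drops the beginning of a very long print(). Checking the parsed tree programmatically is more reliable than reading printed output. A sketch on a made-up stand-in for result.text:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for result.text; the real page is far longer
html = (
    "<html><head><title>Healthy tikka masala</title>"
    '<script type="application/ld+json">{"recipeIngredient": ["1 large onion"]}</script>'
    "</head><body><p>Method goes here</p></body></html>"
)

doc = BeautifulSoup(html, "html.parser")

# Look for content from the start and the end of the document
print(doc.title.get_text())             # from the <head>
print("recipeIngredient" in str(doc))   # script contents are kept
print(doc.body.p.get_text())            # from the end of the <body>
```

For the real page, writing doc.prettify() to a file and opening it in an editor is another way to inspect the whole tree without relying on the terminal.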

Solution:

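Searching with text= only matches a text node that is exactly "recipeIngredient", which is why it returns None here. Instead, search for the element that holds the ingredient list by its CSS class, using the class_ keyword argument:

class_="recipe__ingredients"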
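class_ has a trailing underscore because class is a reserved word in Python. The sketch below, on made-up markup shaped like an ingredient list (the extra col-12 class is invented), shows that class_ matches any one of an element's classes, and that an attrs dict or a CSS selector via select_one() finds the same element. select_one() relies on the soupsieve package, which is installed alongside beautifulsoup4.

```python
from bs4 import BeautifulSoup

# Made-up markup shaped like the recipe page's ingredient list
html = """
<section class="recipe__ingredients col-12">
  <ul><li>1 large onion, chopped</li><li>200g spinach</li></ul>
</section>
"""
doc = BeautifulSoup(html, "html.parser")

# class_ matches any one of the element's classes, so "col-12" doesn't get in the way
by_kwarg = doc.find(class_="recipe__ingredients")

# Equivalent spellings: an attrs dict, or a CSS selector
by_attrs = doc.find(attrs={"class": "recipe__ingredients"})
by_css = doc.select_one(".recipe__ingredients")

# All three return the same <section> Tag object
print(by_kwarg is by_attrs is by_css)
print([li.get_text() for li in by_kwarg.find_all("li")])
```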

For example:

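import requests
from bs4 import BeautifulSoup

url = "https://www.bbcgoodfood.com/recipes/healthy-tikka-masala"

# Download the page and narrow the tree to the ingredients container
container = (
    BeautifulSoup(requests.get(url).text, "html.parser")
    .find(class_="recipe__ingredients")
)

# find() returns None if no element has that class (e.g. the markup changed)
if container is None:
    raise SystemExit("No element with class 'recipe__ingredients' found")

# Each ingredient is an <li>; put one per line
ingredients = "\n".join(
    ingredient.get_text() for ingredient in container.find_all("li")
)

print(ingredients)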

Output:

1 large onion , chopped
4 large garlic cloves
thumb-sized piece of ginger
2 tbsp rapeseed oil
4 small skinless chicken breasts, cut into chunks
2 tbsp tikka spice powder
1 tsp cayenne pepper
400g can chopped tomatoes
40g ground almonds
200g spinach
3 tbsp fat-free natural yogurt
½ small bunch of coriander , chopped
brown basmati rice , to serve
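The recipeIngredient and recipeInstructions markers in the question suggest the page also embeds schema.org Recipe data as JSON-LD; that is an assumption here, not verified against the live page. If so, reading that data can be less brittle than depending on CSS class names. A minimal sketch; ingredients_from_json_ld is a hypothetical helper, and the sample HTML is made up.

```python
import json

from bs4 import BeautifulSoup


def ingredients_from_json_ld(html: str) -> list[str]:
    """Hypothetical helper: return the first schema.org recipeIngredient list found in JSON-LD."""
    doc = BeautifulSoup(html, "html.parser")
    for script in doc.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # skip blocks that aren't valid JSON
        # A JSON-LD block may hold one object or a list of them
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "recipeIngredient" in item:
                return list(item["recipeIngredient"])
    return []


# Made-up sample, not the real page source
sample = """<html><head><script type="application/ld+json">
{"@type": "Recipe", "recipeIngredient": ["1 large onion, chopped", "200g spinach"]}
</script></head><body></body></html>"""
print(ingredients_from_json_ld(sample))
```

Some sites nest the recipe inside an @graph list; this sketch does not handle that case.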