BeautifulSoup deleting first half of HTML?


I’m practicing with BeautifulSoup and HTTP requests in general for the first time. The goal of the programme is to load a webpage and its HTML, then search through the webpage (in this case a recipe) to get a substring of its ingredients. I’ve managed to get it working with the following code:

import requests

url = ""

result = requests.get(url)
myHTML = result.text
# Slice the raw HTML between the two markers
index1 = myHTML.find("recipeIngredient")
index2 = myHTML.find("recipeInstructions")
ingredients = myHTML[index1:index2]
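
A minimal offline sketch of what the slicing above does, using a made-up string in place of the real page (the `sample` string and the `slice_between` helper are hypothetical). One pitfall worth guarding against: `str.find` returns `-1` when a marker is missing, and slicing with `-1` silently returns the wrong text instead of failing.

```python
def slice_between(text, start_marker, end_marker):
    """Return text from start_marker up to (not including) end_marker, or None."""
    start = text.find(start_marker)
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start)
    # str.find returns -1 when a marker is absent; check before slicing.
    if start == -1 or end == -1:
        return None
    return text[start:end]


# Hypothetical stand-in for result.text
sample = '{"recipeIngredient": ["1 large onion", "200g spinach"], "recipeInstructions": []}'
print(slice_between(sample, "recipeIngredient", "recipeInstructions"))
```

The result is still raw text that includes the marker itself, which is why a parser is usually the better tool once the markers are known.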

But when I try to use BeautifulSoup here:

import requests
from bs4 import BeautifulSoup

url = ""

result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
ingredients = doc.find(text="recipeIngredient")

I understand that the code above (even if I could get it working) would produce a different output of just ["recipeIngredient"], but that’s all I’m focused on for now whilst I get to grips with BS. Instead, the code above just outputs None. I printed doc to the terminal and it only output what appears to be the second half of the HTML (or at least, not all of it), whereas the text file does contain all of the HTML, so I assume that’s where the problem lies, but I’m not sure how to fix it.

Thank you.
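
A small offline sketch of why `doc.find(text="recipeIngredient")` returns None, using a made-up HTML snippet in place of the real page (the `html` string is hypothetical). `string=` is the newer name for the `text=` argument, and a plain string is compared against each text node as a whole, so it only matches a node whose entire text is exactly `recipeIngredient`. Passing a compiled regular expression matches nodes that merely contain it:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for result.text
html = '<div><p>Ingredients</p><p>"recipeIngredient": ["1 large onion"]</p></div>'
doc = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so a substring finds nothing.
exact = doc.find(string="recipeIngredient")

# A compiled regex is matched with search(), so any text node that
# contains "recipeIngredient" anywhere is returned.
partial = doc.find(string=re.compile("recipeIngredient"))

print(exact)    # None
print(partial)  # the full text of the second <p>
```

If the printed `doc` still seems to start halfway through, comparing `len(result.text)` with `len(str(doc))` is one way to check whether the HTML is really missing or whether the terminal's scrollback has simply dropped the beginning of a very long print.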

Solution:

You need to use:


For example:

import requests
from bs4 import BeautifulSoup

url = ""

doc = BeautifulSoup(requests.get(url).text, "html.parser")

# Collect the text of every <li> element, one per line
ingredients = "\n".join(
    ingredient.get_text() for ingredient in doc.find_all("li")
)
print(ingredients)

Output:

1 large onion , chopped
4 large garlic cloves
thumb-sized piece of ginger
2 tbsp rapeseed oil
4 small skinless chicken breasts, cut into chunks
2 tbsp tikka spice powder
1 tsp cayenne pepper
400g can chopped tomatoes
40g ground almonds
200g spinach
3 tbsp fat-free natural yogurt
½ small bunch of coriander , chopped
brown basmati rice , to serve
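
Note that `find_all("li")` returns every list item on the page, not only ingredients, since menus and footers are often lists too. The question's marker `recipeIngredient` is a schema.org Recipe property, and many recipe pages embed it in a `<script type="application/ld+json">` block. Assuming this page does, a sketch that reads the ingredient list directly from that block (the `extract_ingredients` helper and the `sample` HTML are hypothetical):

```python
import json

from bs4 import BeautifulSoup


def extract_ingredients(html):
    """Return the recipeIngredient list from JSON-LD blocks, or [] if none is found.

    Assumes the page embeds schema.org Recipe data as JSON-LD; the data may be
    a single object, a list of objects, or an object with an "@graph" array.
    """
    doc = BeautifulSoup(html, "html.parser")
    for script in doc.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
        # Normalise to a list of candidate objects.
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            continue
        for item in candidates:
            if isinstance(item, dict) and "recipeIngredient" in item:
                return item["recipeIngredient"]
    return []


# Hypothetical page: one JSON-LD block plus an unrelated navigation list
sample = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "recipeIngredient": ["1 large onion , chopped", "200g spinach"],
 "recipeInstructions": []}
</script>
</head><body><ul><li>Home</li></ul></body></html>
"""
print("\n".join(extract_ingredients(sample)))
```

With the real page this would be `extract_ingredients(requests.get(url).text)`; if it returns an empty list, the page probably does not use JSON-LD, and a CSS selector scoped to the ingredients container would be needed instead.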
