Scraping nested XML using Beautiful Soup

xml = """<f transform="translate(7,7)" class="SoccerPlayer SoccerPlayer-11 Team-Away  Outcome-Complete" data-id="8">
    <rect x="-15" y="-15" width="30" height="30" transform="rotate(0)" class="SoccerShape"></rect>
    <text x="0" y="7" text-anchor="middle" transform="translate(0,0)rotate(0)">11</text>
    <text class="Soccer-Hidden">
        <div>
            <h3>
                <span class="Soccer-Key">
            Suc passes
          </span>
                <span class="Soccer-Value">
            82
          </span>
            </h3>
            <p>
          Ronaldo
        </p>
        </div>
    </text>
</f>"""

I’m currently trying to scrape the above XML using Beautiful Soup, specifically:

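from bs4 import BeautifulSoup as bs
soup = bs(xml, "xml")  # parse the string above with lxml's XML parser
for pr in soup.find_all("f"):
    try:
        player = pr['class']
        time = pr['data-id']
    except KeyError:
        continue  # skip <f> elements that lack either attribute
    print(player, time)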

This is working as intended.
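
A side note on the attribute lookup above, offered as a sketch: when a document is parsed with the "xml" parser, Beautiful Soup does not treat class as a multi-valued attribute, so pr['class'] comes back as one string rather than a list. The snippet below assumes beautifulsoup4 and lxml are installed and uses a shortened copy of the sample <f> element. It reads attributes with Tag.get(), which returns None instead of raising KeyError, and splits the class string into individual classes.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's <f> element (inner content omitted)
xml = '<f class="SoccerPlayer SoccerPlayer-11 Team-Away  Outcome-Complete" data-id="8"></f>'

soup = BeautifulSoup(xml, "xml")  # "xml" selects lxml's XML parser
players = []
for pr in soup.find_all("f"):
    # .get() returns None for a missing attribute instead of raising KeyError
    raw_class = pr.get("class")
    data_id = pr.get("data-id")
    # With the XML parser, class is a plain string; split it on whitespace
    classes = raw_class.split() if raw_class else []
    players.append((classes, data_id))
    print(classes, data_id)
```

Splitting on whitespace also absorbs the double space between Team-Away and Outcome-Complete in the sample.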

I’m having difficulty scraping the nested information inside the <text class="Soccer-Hidden"> tag.
I’m trying to get the text of <span class="Soccer-Key">, <span class="Soccer-Value">, and also the value between the <p> tags (the Ronaldo text).

What can I add to my code to get these values? Thanks

Solution:

Try the findChildren method, passing the class filter as an attribute dictionary:

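for pr in soup.find_all("f"):
    # search all descendants of <f>, filtering <span> tags by their class attribute
    soc_key = pr.findChildren("span", {"class": "Soccer-Key"})[0].text
    soc_value = pr.findChildren("span", {"class": "Soccer-Value"})[0].text
    # the player name is the text of the first <p> element
    name = pr.findChildren("p")[0].text
    print(soc_key, soc_value, name)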

This gets you Suc passes, 82 and Ronaldo, each surrounded by the newlines and indentation from the source, which you can remove with strip().
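
As an alternative sketch (not part of the original answer): find() returns the first matching tag directly, or None when nothing matches, so it avoids the [0] indexing and the IndexError that indexing raises on a player without a hidden block. get_text(strip=True) trims the surrounding whitespace in the same step. This assumes beautifulsoup4 and lxml are installed; the helper name text_of is hypothetical.

```python
from bs4 import BeautifulSoup

xml = """<f transform="translate(7,7)" class="SoccerPlayer SoccerPlayer-11 Team-Away  Outcome-Complete" data-id="8">
    <rect x="-15" y="-15" width="30" height="30" transform="rotate(0)" class="SoccerShape"></rect>
    <text x="0" y="7" text-anchor="middle" transform="translate(0,0)rotate(0)">11</text>
    <text class="Soccer-Hidden">
        <div>
            <h3>
                <span class="Soccer-Key">
            Suc passes
          </span>
                <span class="Soccer-Value">
            82
          </span>
            </h3>
            <p>
          Ronaldo
        </p>
        </div>
    </text>
</f>"""


def text_of(tag):
    """Hypothetical helper: stripped text of a tag, or None if the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(xml, "xml")
rows = []
for pr in soup.find_all("f"):
    # find() searches all descendants and returns the first match or None
    soc_key = text_of(pr.find("span", attrs={"class": "Soccer-Key"}))
    soc_value = text_of(pr.find("span", attrs={"class": "Soccer-Value"}))
    name = text_of(pr.find("p"))
    rows.append((pr.get("data-id"), soc_key, soc_value, name))
    print(soc_key, soc_value, name)
```

With the sample above this prints Suc passes 82 Ronaldo on one line. Returning None for a missing tag lets the loop continue over the remaining players instead of stopping at the first incomplete one.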
