How to find a line of text inside multiple div classes in Python

Hello everyone. I’m trying to pull certain text from a website. Not all of the text is needed, and I’m confused about how to do this when the text is nested inside multiple divs.
Here is the HTML I’m looking at. I get confused when there are multiple rows inside. I need to pull the "Number" title and its text (837270), and the "Location" title and its text (Ohio).

                   <br>
                <br>
              </p>
            </div>
          </div>
          <div class="row">
            <div class="col-md-4">
                <p>
                  <span class="text-muted">Number</span>
                  <br>
                  "837270"
                </p>
            </div>
            <div class="col-md-4">
              <p>
                <span class="text-muted">Location</span>
                <br>
                "Ohio"
              </p>
            </div>
              <div class="col-md-4">
                <p>
                  <span class="text-muted">Office</span>
                <be>
                   "Joanna" 
                </p>
              </div>
          </div>
          <div class="row">
            <div class="col-md-4">
              <p>
                <span class="text-muted">Date</span>
              <be>
                "07/01/2022"
              </p>
            </div>
            <div class="col-md-4">
                <p>
                  <span class="text-muted">Type</span>
                <br>
                  "Business"
                </p>
            </div>
            <div class="col-md-4">
                <p>
                  <span class="text-muted">Status</span>
                  <br>
                  "Open"
                </p>
            </div>
          </div>
        </div>
      </div>

    </div>

I’ve tried this, but it prints None.

soup = BeautifulSoup(driver.page_source,'html.parser')  
df = soup.find('div', id = "Location")
print(df.string)

I want to pull these values and save them. Any help would be appreciated, thank you.

Solution:

Sometimes HTML won’t have IDs or other easy-to-follow patterns. You can still get pretty clever here, though: you don’t have to rely on pages using table structures.
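As a quick check of why the attempt in the question comes back empty, here is a minimal sketch. It assumes a one-cell sample trimmed from the posted markup and uses the built-in 'html.parser'. None of the divs carries an id attribute, so searching by id finds nothing, while the label itself sits in the text of a span.

```python
from bs4 import BeautifulSoup

# One cell trimmed from the markup in the question (sample data, not the real page)
html = '<div class="col-md-4"><p><span class="text-muted">Location</span><br>"Ohio"</p></div>'
soup = BeautifulSoup(html, "html.parser")

# No element has id="Location", so find() returns None
by_id = soup.find("div", id="Location")
print(by_id)

# "Location" is the text content of the span, not an attribute of any div
label = soup.find("span", class_="text-muted").get_text()
print(label)
```

Calling .string on that None result raises an AttributeError, so the lookup has to target the span instead.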

In this case, for example, it appears each section is titled by a <span class="text-muted"> tag and its value is the last sibling of that span tag.

To scrape each of these titles and their values, we can do something like this:

import bs4
from bs4 import BeautifulSoup

# '...' stands for the page HTML, e.g. driver.page_source
soup = BeautifulSoup(..., 'lxml')

for title_tag in soup.find_all('span', class_='text-muted'):
    # The value is the last sibling that follows the title span
    *_, value_tag = title_tag.next_siblings

    title = title_tag.text.strip()

    if isinstance(value_tag, bs4.element.Tag):
        value = value_tag.text.strip()
    else:  # it's a NavigableString
        value = value_tag.strip()

    print(title, value)

Output:

Number "837270"
Location "Ohio"
Office "Joanna"
Date "07/01/2022"
Type "Business"
Status "Open"

There are of course other patterns you could identify here to reliably get the values. This is just one example.
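One such alternative pattern, as a rough sketch: each div with class col-md-4 holds a single p whose text fragments are the title followed by the value. The sample below is trimmed from the markup in the question. The names html and fields are only illustrative, and stripping the double quotes assumes they may be part of the page text (they may just be how the browser inspector displays text nodes).

```python
from bs4 import BeautifulSoup

# Trimmed-down sample of the markup from the question (sample data)
html = """
<div class="row">
  <div class="col-md-4">
    <p>
      <span class="text-muted">Number</span>
      <br>
      "837270"
    </p>
  </div>
  <div class="col-md-4">
    <p>
      <span class="text-muted">Location</span>
      <br>
      "Ohio"
    </p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

fields = {}
for p in soup.select("div.col-md-4 > p"):
    # stripped_strings yields every non-empty text fragment, whitespace-trimmed
    parts = list(p.stripped_strings)
    if len(parts) >= 2:
        title, value = parts[0], parts[-1]
        # Drop surrounding double quotes if they are present in the text
        fields[title] = value.strip('"')

print(fields)
```

This version keys on the column layout rather than on sibling order, so it may hold up better if extra tags appear between the title and the value.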

If you wanted to get just the Location, you could locate it by its text.

# 'string' replaces the older 'text' argument (Beautiful Soup 4.4.0+)
location_tag = soup.find('span', class_='text-muted', string='Location')

Getting its value then works the same way as above.

*_, location_value_element = location_tag.next_siblings
print(location_value_element.strip()) # "Ohio"
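Since the question also asks about saving the values, here is one possible sketch that collects only "Number" and "Location" into a dict and writes them as a single CSV row. The helper names extract_fields and save_fields, the sample string, and the record.csv file name are all hypothetical. The quote stripping assumes the double quotes may be part of the page text.

```python
import csv

from bs4 import BeautifulSoup
from bs4.element import Tag


def extract_fields(html, wanted=("Number", "Location")):
    """Return {title: value} for the wanted titles (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for title_tag in soup.find_all("span", class_="text-muted"):
        title = title_tag.get_text(strip=True)
        if title not in wanted:
            continue
        siblings = list(title_tag.next_siblings)
        if not siblings:
            continue  # no value follows this title
        last = siblings[-1]
        # The last sibling may be a tag or a plain text node
        text = last.get_text() if isinstance(last, Tag) else str(last)
        fields[title] = text.strip().strip('"')
    return fields


def save_fields(fields, path):
    """Write one header row and one data row (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fields))
        writer.writeheader()
        writer.writerow(fields)


# Sample cells trimmed from the markup in the question
sample = (
    '<p><span class="text-muted">Number</span><br>"837270"</p>'
    '<p><span class="text-muted">Location</span><br>"Ohio"</p>'
    '<p><span class="text-muted">Office</span><br>"Joanna"</p>'
)

fields = extract_fields(sample)
save_fields(fields, "record.csv")  # file name is an assumption
print(fields)
```

With Selenium, the same helper could be called as extract_fields(driver.page_source).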