How do I scrape the inner text of every <em> tag within a <ul> and load it into a pandas DataFrame?

I am currently trying to scrape the information I want from a website.

The information that I want is contained within a ul>li>em. I have scraped tables before, but I have never scraped lists.

How should I scrape the information I want?

In addition, I want to know if there is a way to collect the inner text of all the <em> tags and put it in a DataFrame.

The <ul> basically looks like this.

<ul class="reportData">
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>

                   ......

        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
</ul>

Solution:

Select your <ul> and, in this case, use stripped_strings to get every text string inside it. Note that it returns a generator, so wrap it in list() if you need an actual list:

data = soup.select_one('ul.reportData').stripped_strings

or, to be more specific, use a list comprehension over all <em> elements:

data = [e.text for e in soup.select('ul.reportData em')]
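The two selections only differ when the <li> elements contain text outside the <em>. A minimal sketch of that difference, assuming hypothetical markup in which each <li> also carries a "Report:" label (the label and the second date are made up for illustration and are not from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each <li> has extra text outside the <em>
html = '''
<ul class="reportData">
    <li>Report: <em>2015-12-28</em></li>
    <li>Report: <em>2015-12-29</em></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# stripped_strings yields every non-empty string under the <ul>,
# including the labels that sit outside the <em> tags
all_strings = list(soup.select_one('ul.reportData').stripped_strings)

# Selecting the <em> elements directly keeps only their text
em_only = [e.get_text(strip=True) for e in soup.select('ul.reportData em')]

print(all_strings)
print(em_only)
```

If the list holds nothing but <em> values, as in the question, both approaches should give the same result.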

Example

import pandas as pd
from bs4 import BeautifulSoup

html='''
<ul class="reportData">
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
</ul>
'''

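# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')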

data = soup.select_one('ul.reportData').stripped_strings

pd.DataFrame(data, columns=['date'])

Output

         date
0  2015-12-28
1  2015-12-28
2  2015-12-28
3  2015-12-28
4  2015-12-28
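The scraped values are plain strings. If you want real dates in the DataFrame, one option is to convert the column with pd.to_datetime. This is a sketch that assumes every <em> holds a date in YYYY-MM-DD form, as in the sample markup; the variable names are illustrative only:

```python
import pandas as pd
from bs4 import BeautifulSoup

html = '''
<ul class="reportData">
    <li><em>2015-12-28</em></li>
    <li><em>2015-12-28</em></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# Collect the text of each <em> inside the target <ul>
dates = [e.get_text(strip=True) for e in soup.select('ul.reportData em')]

df = pd.DataFrame({'date': dates})

# Assumption: all values follow the YYYY-MM-DD format shown in the question
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

print(df.dtypes)
```

With a datetime column you can sort, filter by range, or resample without parsing the strings again.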