How should I scrape all <em> tag innertexts within a <ul> and make them into a pandas dataframe?

March 18, 2022

I am currently trying to scrape the information I want from a website.

The information I want is inside <em> tags nested in a list (ul > li > em). I have scraped tables before, but I have never scraped lists.

How should I scrape the information I want?

In addition, I want to know if there is a way to collect the inner text of every <em> and put it all in a dataframe.

The <ul> basically looks like this.

<ul class="reportData">
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>

                   ......

        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
</ul>

Solution:

Select your <ul> and use stripped_strings to get all of its text. It returns a generator, so wrap it in list() if you need an actual list:

data = soup.select_one('ul.reportData').stripped_strings

or, to be more specific, use a list comprehension over all <em> tags:

data = [e.text for e in soup.select('ul.reportData em')]

Example

import pandas as pd
from bs4 import BeautifulSoup

html='''
<ul class="reportData">
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
        <li><em>2015-12-28</em></li>
</ul>
'''

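# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

# stripped_strings is a generator, so materialize it into a list
data = list(soup.select_one('ul.reportData').stripped_strings)

df = pd.DataFrame(data, columns=['date'])
print(df)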
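
The two approaches only differ when an <li> holds text outside its <em>. Here is a minimal sketch of that case. The markup is hypothetical: the "Updated:" label and the varying dates are made up for illustration and are not from the site in the question.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: one <li> carries extra text outside the <em>
html = '''
<ul class="reportData">
    <li><em>2015-12-28</em></li>
    <li>Updated: <em>2015-12-29</em></li>
    <li><em>2015-12-30</em></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# stripped_strings yields every non-empty text node under the <ul>,
# so the stray "Updated:" label is included
all_text = list(soup.select_one('ul.reportData').stripped_strings)

# The CSS selector keeps only the text inside <em> tags
em_text = [e.get_text(strip=True) for e in soup.select('ul.reportData em')]

df = pd.DataFrame(em_text, columns=['date'])
print(all_text)
print(df)
```

If the list on your page really contains nothing but <em> dates, both versions give the same rows. The selector version is probably the safer default when you are not sure.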