Python: Changing bs4.element.ResultSet elements in list of lists to text

Hi everyone, I have extracted some HTML elements from a website using BeautifulSoup and find_all. As a result, I have a list of bs4.element.ResultSet objects, i.e. a list of lists, like this:

[[<li class="WlSsj w9uVi">neu</li>],
 [<li class="WlSsj w9uVi">neu</li>],
 [<li class="WlSsj w9uVi">neu</li>, <li class="WlSsj">Terrasse</li>],
 [<li class="WlSsj w9uVi">neu</li>,
  <li class="WlSsj">Terrasse</li>,
  <li class="WlSsj">Parkplatz</li>]]

I would now like to retrieve the text within the bs4 elements and keep the same nested list structure. I have been experimenting with two nested loops:

fet = []
for feat in features_bs:
    for fets in feat:
        fet.append(fets.text)
    features.append(fet)

The outer loop visits every list (feat) in the original list (features_bs). The inner loop visits every element (fets) in each inner list (feat) and converts it to text. I then append the text to an empty list (fet), but I would like to keep the same structure as before, with lists inside lists. At the moment I only get one flat list of text, like this:

['neu',
 'neu',
 'neu',
 'Terrasse',
 'neu',
 'Terrasse',
 'Parkplatz']

However I would like the output to be:

[['neu'],
 ['neu'],
 ['neu', 'Terrasse'],
 ['neu'],
 ['Terrasse'],
 ['Parkplatz']]

Thanks for the help in advance.

>Solution :

You are close to your goal, but one temporary list is missing. Create a fresh inner list for each sublist and append that inner list to the result:

fet = []
for feat in features_bs:
    el = []  # new inner list for each sublist
    for fets in feat:
        el.append(fets.text)
    fet.append(el)  # append the finished inner list to keep the nesting
fet

Output:

[['neu'], ['neu'], ['neu', 'Terrasse'], ['neu'], ['Terrasse'], ['Parkplatz']]
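
The same nesting can also be written as a nested list comprehension. The snippet below is a minimal sketch that uses hypothetical stand-in objects (built with SimpleNamespace) in place of real bs4 tags, so that it runs without parsing a page; only the .text attribute matters here, and the sample data is made up for illustration.

```python
from types import SimpleNamespace


def tag(text):
    """Hypothetical stand-in for a bs4 Tag: any object with a .text attribute."""
    return SimpleNamespace(text=text)


# Illustrative input with the same shape as a list of ResultSets.
features_bs = [
    [tag("neu")],
    [tag("neu"), tag("Terrasse")],
    [tag("neu"), tag("Terrasse"), tag("Parkplatz")],
]

# The outer comprehension walks the sublists and the inner one extracts
# the text, so the nesting of the input is preserved.
features = [[item.text for item in sublist] for sublist in features_bs]
print(features)
```

With a real list of ResultSets, the last assignment should work unchanged, since each Tag exposes .text.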

You could also streamline the process and build the expected structure directly while parsing:

from bs4 import BeautifulSoup

html = '''
<ul>
<li class="WlSsj w9uVi">neu</li>
</ul>
<ul>
<li class="WlSsj w9uVi">neu</li>
</ul>
<ul>
<li class="WlSsj w9uVi">neu</li>
<li class="WlSsj">Terrasse</li>
</ul>
<ul>
<li class="WlSsj w9uVi">neu</li>
</ul>
<ul>
<li class="WlSsj">Terrasse</li>
</ul>
<ul>
<li class="WlSsj">Parkplatz</li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
data = []
for ul in soup.find_all('ul'):
    el = []
    for e in ul.find_all('li'):
        el.append(e.text)  # store the text, not the Tag itself
    data.append(el)
data

Output:

[['neu'], ['neu'], ['neu', 'Terrasse'], ['neu'], ['Terrasse'], ['Parkplatz']]
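
If the list items may contain surrounding whitespace, a variant of this approach could use get_text(strip=True) inside a nested list comprehension. This is only a sketch, assuming each group of features sits in its own <ul>; the shortened HTML below is made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative HTML (an assumption, not the original page): one <ul> per group.
html = """
<ul><li class="WlSsj w9uVi"> neu </li></ul>
<ul><li class="WlSsj w9uVi">neu</li><li class="WlSsj">Terrasse</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# One inner list per <ul>; get_text(strip=True) trims whitespace around the text.
data = [
    [li.get_text(strip=True) for li in ul.find_all("li")]
    for ul in soup.find_all("ul")
]
print(data)
```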