Python: better iteration workflow using BeautifulSoup and pandas?

Hi folks,

I’m extracting some sentences from a document and trying to build a DataFrame with BeautifulSoup and pandas, as shown below. The code repeats the same calls many times, so I think it could be written more cleanly. Could you help me improve these lines? Thank you!

import pandas as pd
from bs4 import BeautifulSoup

bs = BeautifulSoup(html, 'html.parser')

t1 = bs.find_all('h1')[1].text.replace('_room1',"")
t2 = bs.find_all('h1')[2].text.replace('_room1',"") 
t3 = bs.find_all('h1')[3].text.replace('_room1',"")
t4 = bs.find_all('h1')[4].text.replace('_room1',"")

p1 = bs.find_all('p')[3].text
p2 = bs.find_all('p')[4].text + bs.find_all('p')[5].text + bs.find_all('p')[6].text + bs.find_all('p')[7].text
p3 = bs.find_all('p')[8].text
p4 = bs.find_all('p')[9].text


data = {t1: p1,
      t2: p2,
      t3: p3,
      t4: p4}

df = pd.DataFrame(data, index=[0])

df


Solution:

How about collecting the text from your h1 and p elements in one go, then picking out the pieces you need:

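# Titles are the 2nd to 5th <h1> elements, with the '_room1' suffix stripped
h1s = [h1.text.replace('_room1', '') for h1 in bs.select('h1')[1:5]]
ps = [p.text for p in bs.select('p')]

# One row, one column per title; paragraphs 4 to 7 are joined into a single cell
df = pd.DataFrame({
    h1: p
    for h1, p in zip(h1s, [ps[3], ''.join(ps[4:8]), ps[8], ps[9]])
}, index=[0])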
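
To try the idea end to end, here is a minimal, self-contained sketch. The sample HTML, the `PARAGRAPH_SLICES` table, and the `build_frame` helper are all made up for illustration (the question does not show the real document), so adjust the indices to match your page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample document: 5 headings and 10 paragraphs, mirroring the
# positions used in the question (the real HTML is not shown there).
html = (
    "<h1>Report</h1>"
    + "".join(f"<h1>Title{i}_room1</h1>" for i in range(1, 5))
    + "".join(f"<p>para{i}</p>" for i in range(10))
)

# Assumed layout: which slice of <p> elements belongs to each title, in order.
PARAGRAPH_SLICES = [slice(3, 4), slice(4, 8), slice(8, 9), slice(9, 10)]


def build_frame(html: str) -> pd.DataFrame:
    bs = BeautifulSoup(html, "html.parser")
    # Collect each tag list once instead of calling find_all repeatedly.
    titles = [h1.get_text().replace("_room1", "") for h1 in bs.find_all("h1")[1:5]]
    paragraphs = [p.get_text() for p in bs.find_all("p")]
    # Join the paragraphs covered by each slice into a single cell.
    cells = ["".join(paragraphs[s]) for s in PARAGRAPH_SLICES]
    # Wrapping the dict in a list gives one row with the titles as columns.
    return pd.DataFrame([dict(zip(titles, cells))])


df = build_frame(html)
print(df)
```

Keeping the paragraph ranges in one list means a change in page layout only requires editing that list, not the DataFrame code.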