Python: read an HTML table from Confluence and print each row as a list

I'd like to parse a Confluence page, read a table from it, and create a list for each row.

My table looks like this:

[image of the table]

My code

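from bs4 import BeautifulSoup

# confluence is an existing Confluence API client; p_id is the page ID
x = confluence.get_page_by_id(p_id, expand="body.storage")

soup = BeautifulSoup(x["body"]["storage"]["value"], 'html.parser')

# Build one list of cell texts per table row
for tables in soup.select("table tr"):
    data = [item.get_text() for item in tables.select("td")]
    print(data)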

The problem is the column whose cell contains line breaks: its values are run together in the output of the code:

['Karnataka','Bangalore','BangaloreMysoreTumkur']

I want the output to look like this:

['Karnataka','Bangalore','Bangalore Mysore Tumkur']

Can you please provide the code to fix this?

Thanks for the help!

>Solution :

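By default, get_text() joins the text fragments inside an element with an empty string, so values that are separated only by tags (a line break or paragraph inside the cell) run together. Pass a custom separator to get_text():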

data = [item.get_text(separator=" ") for item in tables.select("td")]
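
To try the fix without a Confluence connection, here is a minimal, self-contained sketch. The `SAMPLE_HTML` string and the `rows_as_lists` helper are hypothetical stand-ins: the markup assumes the multi-line cell is stored as separate `<p>` elements, which may differ from what your page actually returns. It also passes `strip=True` so stray whitespace around each fragment is dropped, and skips rows that have no `<td>` cells (such as a header row).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for x["body"]["storage"]["value"];
# the real Confluence storage markup may differ.
SAMPLE_HTML = """
<table><tbody>
<tr><th>State</th><th>Capital</th><th>Cities</th></tr>
<tr><td>Karnataka</td><td>Bangalore</td>
    <td><p>Bangalore</p><p>Mysore</p><p>Tumkur</p></td></tr>
</tbody></table>
"""


def rows_as_lists(html, separator=" "):
    """Return one list of cell texts per table row that contains <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        # separator joins the text fragments inside a cell;
        # strip=True trims each fragment and drops empty ones.
        cells = [td.get_text(separator=separator, strip=True)
                 for td in tr.select("td")]
        if cells:  # header rows only have <th>, so they yield no cells
            rows.append(cells)
    return rows


for row in rows_as_lists(SAMPLE_HTML):
    print(row)
```

With the sample markup above, the third cell comes back as a single space-separated string. If you would rather keep the values apart, one option is to call `td.get_text(separator="\n", strip=True).split("\n")` to get a nested list for that cell.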