Python: read an HTML table from Confluence and print each row as a list

I’d like to parse a Confluence page, read a table, and create a list for each row.

In my table, each row holds a state, its capital, and a cell listing several cities, each on its own line.

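My code:

from bs4 import BeautifulSoup

# confluence is an atlassian-python-api Confluence client, p_id is the page ID
x = confluence.get_page_by_id(p_id, expand="body.storage")

# The page body in storage format is XHTML, so parse it as HTML
soup = BeautifulSoup(x["body"]["storage"]["value"], 'html.parser')

# Each match of "table tr" is one table row
for tables in soup.select("table tr"):
    data = [item.get_text() for item in tables.select("td")]
    print(data)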
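To reproduce the problem without a Confluence connection, here is a minimal sketch that runs the same loop over a hypothetical storage-format snippet. The HTML, including the assumption that each city sits in its own <p> element, is illustrative and not taken from the actual page; read_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical storage-format snippet (an assumption): the last cell
# holds one <p> per city, with no whitespace between them.
STORAGE_HTML = """
<table><tbody>
<tr><th>State</th><th>Capital</th><th>Cities</th></tr>
<tr><td>Karnataka</td><td>Bangalore</td><td><p>Bangalore</p><p>Mysore</p><p>Tumkur</p></td></tr>
</tbody></table>
"""


def read_rows(html):
    """Return one list of cell texts per <tr>, exactly as the loop above does."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        # get_text() with no arguments joins the text fragments with ""
        rows.append([td.get_text() for td in tr.select("td")])
    return rows


for row in read_rows(STORAGE_HTML):
    print(row)
```

The header row comes back as an empty list because it contains only <th> cells and the loop selects <td>; the data row shows the run-together cities.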

The problem is that the cities in the last column are on separate lines, and they get run together in the output:

['Karnataka','Bangalore','BangaloreMysoreTumkur']

I want the output to look like this:

['Karnataka','Bangalore','Bangalore Mysore Tumkur']

How can I fix this?

Thanks for the help!

Solution:

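By default, get_text() joins all the text fragments inside a cell with an empty string. In the Confluence storage format, separate lines in a cell are typically stored as separate elements (such as <p> blocks or text split by <br/>) rather than as newline characters, so the cities end up run together. To put a space between them, pass a custom separator: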

data = [item.get_text(separator=" ") for item in tables.select("td")]
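Here is a self-contained sketch of the fix on a hypothetical snippet (assumed markup: one indented <p> per city, so the cell also contains whitespace-only text). It adds strip=True, which goes beyond the answer above: without it, the newline and indentation fragments would be joined in as well. The if data check, which skips <th>-only header rows, is also an addition; read_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical storage-format snippet (an assumption): each city is an
# indented <p>, so the cell also contains whitespace-only text nodes.
STORAGE_HTML = """
<table><tbody>
<tr><th>State</th><th>Capital</th><th>Cities</th></tr>
<tr><td>Karnataka</td><td>Bangalore</td><td>
  <p>Bangalore</p>
  <p>Mysore</p>
  <p>Tumkur</p>
</td></tr>
</tbody></table>
"""


def read_rows(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        # separator=" " puts a space between text fragments;
        # strip=True trims each fragment and drops whitespace-only ones.
        data = [td.get_text(separator=" ", strip=True) for td in tr.select("td")]
        if data:  # skip header rows that contain only <th> cells
            rows.append(data)
    return rows


for row in read_rows(STORAGE_HTML):
    print(row)
# prints: ['Karnataka', 'Bangalore', 'Bangalore Mysore Tumkur']
```

" ".join(td.stripped_strings) gives the same result. Note that the separator goes between every text fragment, so a word split by inline markup (for example part of it wrapped in <strong>) would also get a space inserted.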
