Convert html table to json with BeautifulSoup

I am trying to convert an HTML table to JSON using BeautifulSoup in Python. I can extract the data, but it comes out in the wrong format.

from bs4 import BeautifulSoup
import json

reading_table = """\
<table>
<tbody>
<tr>
<td><span class="customlabel">Energy Source</span></td>
<td><span class="custominput">EB</span></td>
<td><span class="customlabel">Grid Reading </span></td>
<td><span class="custominput">2666.2</span></td>
<td><span class="customlabel">DG Reading </span></td>
<td><span class="custominput">15.5</span></td>
</tr>
<tr>
<td><span class="customlabel">Power Factor</span></td>
<td><span class="custominput">0.844</span></td>
<td><span class="customlabel">Total Kw</span></td>
<td><span class="custominput">0.273</span></td>
<td><span class="customlabel">Total KVA</span></td>
<td><span class="custominput">0.34</span></td>
</tr>
<tr>
<td><span class="customlabel">Average Voltage</span></td>
<td><span class="custominput">241.7</span></td>
<td><span class="customlabel">Total Current</span></td>
<td><span class="custominput">1.54</span></td>
<td><span class="customlabel">Frequency Hz</span></td>
<td><span class="custominput">50</span></td>
</tr>
</tbody>
</table>
"""

reading_table_data = [
    [cell.text for cell in row("td")]
    for row in BeautifulSoup(reading_table, features="html.parser")("tr")
]

print(reading_table_data)

The above code prints a list of lists, one inner list per table row:

[['Energy Source', 'EB', 'Grid Reading ', '2666.2', 'DG Reading ', '15.5'], ['Power Factor', '0.844', 'Total Kw', '0.273', 'Total KVA', '0.34'], ['Average Voltage', '241.7', 'Total Current', '1.54', 'Frequency Hz', '50']]

I would like to get it in the format below:

[
  'Energy Source': 'EB',
  'Grid Reading ': '2666.2',
  'DG Reading ': '15.5',
  'Power Factor': '0.844',
  'Total Kw': '0.273',
  'Total KVA': '0.34',
  'Average Voltage': '241.7',
  'Total Current': '1.54',
  'Frequency Hz': '50'
]

Some help is appreciated

>Solution :

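The output you want is not valid JSON: square brackets denote an array, and an array cannot hold key-value pairs. You can still print it by building a dict, serializing it with json.dumps, and replacing the braces with brackets in the resulting string.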

Here is the working code:

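    soup = BeautifulSoup(reading_table, features="html.parser")
    data = {}
    attr_key = None
    for td in soup.find_all("td"):
        classes = td.span.get("class", [])
        if "customlabel" in classes:
            # A label cell starts a new key; its value arrives in the next cell.
            attr_key = td.span.text
            data[attr_key] = ""
        elif "custominput" in classes and attr_key is not None:
            # A value cell fills in the most recent label.
            data[attr_key] = td.span.text
    # json.dumps emits a JSON object; swapping the braces gives the requested
    # bracketed form, which is no longer valid JSON.
    print(json.dumps(data).replace("{", "[").replace("}", "]"))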
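
If valid JSON is acceptable, a plain object is probably the simpler target. The sketch below is one possible approach, not the only one: it assumes every `customlabel` span is followed by exactly one `custominput` span in document order, so the two lists can be paired with `zip`. The helper name `table_to_dict` and the shortened sample table are made up for illustration, and `strip=True` trims the trailing spaces in labels such as `Grid Reading `, which differs slightly from the output requested in the question.

```python
import json

from bs4 import BeautifulSoup

# Shortened sample table (assumption: same structure as in the question).
sample_html = """
<table><tbody>
<tr>
<td><span class="customlabel">Energy Source</span></td>
<td><span class="custominput">EB</span></td>
<td><span class="customlabel">Grid Reading </span></td>
<td><span class="custominput">2666.2</span></td>
</tr>
</tbody></table>
"""


def table_to_dict(html):
    """Pair each label span with the value span at the same position."""
    soup = BeautifulSoup(html, "html.parser")
    labels = [s.get_text(strip=True) for s in soup.find_all("span", class_="customlabel")]
    values = [s.get_text(strip=True) for s in soup.find_all("span", class_="custominput")]
    # zip stops at the shorter list, so an unmatched label or value is dropped.
    return dict(zip(labels, values))


readings = table_to_dict(sample_html)
# Prints a JSON object with one key per label.
print(json.dumps(readings, indent=2))
```

Because the result is an ordinary dict, it can be passed to `json.dumps` or `json.dump` directly, and any JSON parser will read the output back without string replacement.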