HTML table to JSON dict with BeautifulSoup in Python

I have the following HTML data:

<table>
  <tbody>
    <tr>
      <th class="left" colspan="7">
        <p>Some text</p>
      </th>
    </tr>
    <tr>
      <td class="left print-wide" colspan="2">  </td>
      <td class="print-wide" colspan="13">some-text</td>
    </tr>
    <tr>
      <td class="left"><br /></td>
      <td><strong>ABC   </strong></td>
      <td><strong>≤25%</strong></td>
      <td><strong>≤75%</strong></td>
      <td><strong>≤100%</strong></td>
    </tr>
    <tr>
      <td class="left">1 month</td>
      <td>3,93%</td>
      <td>4,05%</td>
      <td>4,09%</td>
      <td>4,18%</td>
    </tr>
    <tr>
      <td class="left">3 months</td>
      <td>4,12%</td>
      <td>4,24%</td>
      <td>4,28%</td>
      <td>4,37%</td>
    </tr>
    <tr>
      <td class="left">6 months</td>
      <td>4,23%</td>
      <td>4,35%</td>
      <td>4,39%</td>
      <td>4,48%</td>
    </tr>
  </tbody>
</table>

I want to convert that to:

{
    "1 month": {
        "ABC": "3,93%",
        "≤25%": "4,05%",
        "≤75%": "4,09%",
        "≤100%": "4,18%"
    },
    "3 months": {
        "ABC": "4,12%",
        "≤25%": "4,24%",
        "≤75%": "4,28%",
        "≤100%": "4,37%"
    },
    "6 months": {
        "ABC": "4,23%",
        "≤25%": "4,35%",
        "≤75%": "4,39%",
        "≤100%": "4,48%"
    }
}

I wrote the following, which creates a list of the row labels (the months):

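from bs4 import BeautifulSoup

# body holds the HTML shown above
soup = BeautifulSoup(body, "html.parser")
table = soup.find("table")
# class_="left" also matches the two non-data cells at the top, so drop them
headers = [header.text for header in table.find_all('td', class_="left")]
del headers[:2]
print(headers)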

Prints out:

['1 month', '3 months', '6 months']

Now I need to iterate over that list and build the dictionary shown above, but I am stuck. I have tried several approaches without success. Can anyone point me in the right direction?

Solution:

Try:

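from bs4 import BeautifulSoup

soup = BeautifulSoup(body, "html.parser")

# Column names come from the <strong> cells of the header row
headers = [s.get_text(strip=True) for s in soup.select("strong")]

out = {}
# Select only the rows whose text contains "month"
for tr in soup.select("tr:-soup-contains(month)"):
    # First cell is the row label; the remaining cells pair up with the headers
    out[tr.td.text] = {k: v.text for k, v in zip(headers, tr.select("td")[1:])}

print(out)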

Prints (formatted here for readability):

{
    "1 month": {
        "ABC": "3,93%",
        "≤25%": "4,05%",
        "≤75%": "4,09%",
        "≤100%": "4,18%",
    },
    "3 months": {
        "ABC": "4,12%",
        "≤25%": "4,24%",
        "≤75%": "4,28%",
        "≤100%": "4,37%",
    },
    "6 months": {
        "ABC": "4,23%",
        "≤25%": "4,35%",
        "≤75%": "4,39%",
        "≤100%": "4,48%",
    },
}
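
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It is a variation on the answer above, not the answerer's exact code: instead of the `:-soup-contains` selector (whose availability depends on the installed soupsieve version), it picks data rows by shape, keeping rows that have a non-empty first cell plus one value per header. The names `HTML`, `table_to_dict`, and `result` are made up for this example, and the rows in `HTML` are the question's table condensed onto fewer lines.

```python
import json

from bs4 import BeautifulSoup

# Condensed copy of the table from the question (HTML is a hypothetical name).
HTML = """
<table>
  <tbody>
    <tr><th class="left" colspan="7"><p>Some text</p></th></tr>
    <tr>
      <td class="left print-wide" colspan="2">  </td>
      <td class="print-wide" colspan="13">some-text</td>
    </tr>
    <tr>
      <td class="left"><br /></td>
      <td><strong>ABC   </strong></td>
      <td><strong>≤25%</strong></td>
      <td><strong>≤75%</strong></td>
      <td><strong>≤100%</strong></td>
    </tr>
    <tr><td class="left">1 month</td><td>3,93%</td><td>4,05%</td><td>4,09%</td><td>4,18%</td></tr>
    <tr><td class="left">3 months</td><td>4,12%</td><td>4,24%</td><td>4,28%</td><td>4,37%</td></tr>
    <tr><td class="left">6 months</td><td>4,23%</td><td>4,35%</td><td>4,39%</td><td>4,48%</td></tr>
  </tbody>
</table>
"""


def table_to_dict(html):
    """Map each row label to a {column name: value} dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # Column names are the <strong> cells of the header row, as in the answer.
    headers = [s.get_text(strip=True) for s in soup.select("strong")]

    out = {}
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Assumption: a data row has a non-empty label followed by exactly
        # one value per header. This skips the title, spacer and header rows.
        if len(cells) == len(headers) + 1 and cells[0]:
            out[cells[0]] = dict(zip(headers, cells[1:]))
    return out


result = table_to_dict(HTML)

# ensure_ascii=False keeps "≤" readable instead of escaping it as \u2264.
print(json.dumps(result, ensure_ascii=False, indent=4))
```

`json.dumps` is what turns the Python dict into an actual JSON string; `print(out)` on its own shows the dict's Python representation, with single quotes and on one line.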