How to fetch a specific value from an HTML file

I am trying to scrape data from an HTML file that contains a React props div like this:

<html>

<div data-react-class="UserDetails"
    data-react-props="{
        &quot;targetUser&quot;:{
                &quot;targetUserLogin&quot;:&quot;user&quot;,
                &quot;targetUserDuration&quot;:&quot;11 months, 27 days&quot;,&quot;""
            }
        }

What I am looking for is the duration, such as 11 months, 27 days, so that I can add the parts up to get a total number of days.

I don't know how to extract this reliably, because the text varies from person to person: someone else may show exactly 2 years, with no days in the text at all. I need the years, months, and days so I can do the calculation. I wrote this to find the part of the file that I need, but I don't know how to approach the rest:

with open("data.html", 'r') as fpIn:
    for line in fpIn:
        line = line.rstrip()   # Strip trailing spaces and newline
        if "targetUserDuration" in line:
            print("Found")

>Solution :

Use regular expressions to find it.

import re

html = '...&quot;targetUserDuration&quot;:&quot;11 months, 27 days&quot;,&quot;""...'

# [^&]*? keeps each match inside the duration value: it cannot run past the
# closing &quot; into other fields. \d+ accepts any whole number, so values
# such as 10, 15 or 20 are captured in full.
prefix = r'UserDuration&quot;:&quot;[^&]*?\b'
years_re = re.compile(prefix + r'(\d+) years?')
months_re = re.compile(prefix + r'(\d+) months?')
days_re = re.compile(prefix + r'(\d+) days?')

year_found = years_re.search(html)
months_found = months_re.search(html)
days_found = days_re.search(html)

# A unit that does not appear in the text (e.g. no days in "2 years") stays 0.
years, months, days = 0, 0, 0
if year_found:
    years = int(year_found.group(1))
if months_found:
    months = int(months_found.group(1))
if days_found:
    days = int(days_found.group(1))

print('years: ', years)
print('months: ', months)
print('days: ', days)

Result:

years:  0
months:  11
days:  27
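
Since the goal in the question is a single number of days, here is a minimal sketch of one way to get there. It is not part of the answer above. It decodes the &quot; entities with html.unescape, pulls out the targetUserDuration value, and sums whichever units are present. The helper names (extract_duration, duration_to_days) are made up for illustration, and the conversion factors of 365 days per year and 30 days per month are assumptions. An exact calendar count would need a known start date.

```python
import html
import re

# Assumed conversion factors; these are approximations, not calendar-exact.
DAYS_PER_UNIT = {"year": 365, "month": 30, "day": 1}


def extract_duration(raw_html):
    """Return the targetUserDuration value from raw HTML, or None if absent."""
    decoded = html.unescape(raw_html)  # turns &quot; into "
    match = re.search(r'"targetUserDuration"\s*:\s*"([^"]*)"', decoded)
    return match.group(1) if match else None


def duration_to_days(text):
    """Sum every '<number> <unit>' pair in text such as '11 months, 27 days'."""
    pairs = re.findall(r"(\d+)\s+(year|month|day)s?", text)
    return sum(int(count) * DAYS_PER_UNIT[unit] for count, unit in pairs)


sample = '...&quot;targetUserDuration&quot;:&quot;11 months, 27 days&quot;,&quot;""...'
duration = extract_duration(sample)
print(duration)                    # 11 months, 27 days
print(duration_to_days(duration))  # 357
```

To run this against the file from the question, read it in one go with open("data.html").read() and pass the result to extract_duration. Units missing from the text, as in "2 years", simply contribute nothing to the sum.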