I'm trying to scrape data from an HTML file that contains a React props div like this:
<html>
<div data-react-class="UserDetails"
data-react-props="{
"targetUser":{
"targetUserLogin":"user",
"targetUserDuration":"11 months, 27 days","""
}
}
What I'm looking for is the duration, e.g. 11 months, 27 days, so that I can add the parts up to get an exact number of days.
I have no idea how to extract this reliably, because the value varies: a different person could be at exactly 2 years, with no months or days in the text. I need the years, months, and days so I can do the calculation. I wrote this to find the part of the file that I need, but I don't know how to approach the rest:
with open("data.html", 'r') as fpIn:
    for line in fpIn:
        line = line.rstrip()  # Strip trailing spaces and newline
        if "targetUserDuration" in line:
            print("Found")
>Solution :
Use regular expressions to find it.
import re

html = '..."targetUserDuration":"11 months, 27 days","""...'

# \d+ accepts any number (10, 15, 20, ...); [^"]*? keeps the search
# inside the quoted value instead of running on into the rest of the file.
years_re = re.compile(r'UserDuration":"[^"]*?(\d+) year')
months_re = re.compile(r'UserDuration":"[^"]*?(\d+) month')
days_re = re.compile(r'UserDuration":"[^"]*?(\d+) day')

year_found = years_re.search(html)
months_found = months_re.search(html)
days_found = days_re.search(html)

# A unit that does not appear in the text stays at 0.
years, months, days = 0, 0, 0
if year_found:
    years = int(year_found.group(1))
if months_found:
    months = int(months_found.group(1))
if days_found:
    days = int(days_found.group(1))

print('years:', years)
print('months:', months)
print('days:', days)
Result:
years: 0
months: 11
days: 27
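To get from those parts to the single number of days the question asks for, something like the following might work. It is only a sketch: the text gives no start date, so the result cannot be exact, and the conversion factors (365 days per year, 30 days per month) are assumptions, as are the names `DAYS_PER_UNIT`, `UNIT_RE`, `VALUE_RE` and `duration_to_days`.

```python
import re

# Assumed conversion factors: the duration text carries no start date,
# so years and months can only be approximated.
DAYS_PER_UNIT = {"year": 365, "month": 30, "day": 1}

# Matches pairs such as "11 months" or "1 year"; the trailing "s" is optional.
UNIT_RE = re.compile(r"(\d+)\s+(year|month|day)s?")

# Captures the quoted value that follows the targetUserDuration key.
VALUE_RE = re.compile(r'"targetUserDuration":"([^"]*)"')


def duration_to_days(text):
    """Return the approximate number of days described by `text`."""
    return sum(int(n) * DAYS_PER_UNIT[unit] for n, unit in UNIT_RE.findall(text))


html = '..."targetUserDuration":"11 months, 27 days","""...'
match = VALUE_RE.search(html)
if match:
    print(duration_to_days(match.group(1)))  # 357

print(duration_to_days("2 years"))  # 730
```

Units that are missing from the text simply contribute nothing, so "2 years" and "11 months, 27 days" go through the same code path.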