I’m stuck on some basic Python. I have a very long HTML string that looks something like this:
<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>, <relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00
I want to iterate through it and, at every occurrence of the substring "datetime", store the date that follows.
My current implementation uses two lists. The first stores the index returned by .find() for each occurrence of "datetime":
datetime_indexes = list(get_all_updates(string, "datetime"))
print(datetime_indexes)
#output 36, 168 etc
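The question doesn't show get_all_updates. A minimal sketch of what such a helper might look like, assuming it simply yields the start index of every occurrence of the substring by calling str.find repeatedly with an advancing start position (this body is hypothetical, not the asker's code):

```python
from typing import Iterator


def get_all_updates(text: str, sub: str) -> Iterator[int]:
    """Hypothetical helper: yield the start index of every occurrence of sub in text."""
    start = text.find(sub)
    while start != -1:
        yield start
        # Resume just past the current match so the same occurrence isn't found again.
        start = text.find(sub, start + len(sub))


html = '<a datetime="x"></a><b datetime="y"></b>'
print(list(get_all_updates(html, "datetime")))  # [3, 23]
```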
Next, I loop through the string and, whenever the index I’m currently on matches a value stored in my index list, append the datetime value to a new list:
count = 0
all_datetimes = []
for i in string:
    if string.index(i) is datetime_indexes[count]:
        all_datetimes.append(string[string.index(i) + 10:(string.index(i) + 10 + 21)])
        count = count + 1
Currently, it outputs only the first "datetime" value:
#output
#2023-03-07T02:38:29Z
The desired result is all of the datetime values:
#desired output
2023-03-07T02:38:29
2023-03-06T10:25:38
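The loop only ever matches once because for i in string yields characters, and string.index(i) returns the position of the first occurrence of that character, not the position currently being visited. Also, is tests object identity, so == is the right comparison for integers. A minimal string-only sketch that jumps from one match to the next instead, assuming every attribute is written as datetime="..." with double quotes as in the sample (extract_datetimes is a hypothetical name):

```python
from typing import List


def extract_datetimes(text: str) -> List[str]:
    """Return the value of every datetime="..." attribute using plain string methods."""
    marker = 'datetime="'
    values = []
    start = text.find(marker)
    while start != -1:
        value_start = start + len(marker)
        # End at the next double quote instead of a fixed width, so both the
        # "...Z" and "...-07:00" forms are captured whole.
        value_end = text.find('"', value_start)
        if value_end == -1:
            break  # unterminated attribute, e.g. a truncated string
        values.append(text[value_start:value_end])
        start = text.find(marker, value_end)
    return values


html = ('<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z">Mar 6, 2023</relative-time>, '
        '<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>')
for value in extract_datetimes(html):
    print(value[:19])  # keep YYYY-MM-DDTHH:MM:SS and drop the zone suffix
```

This prints the two values from the desired output. It is still fragile (it would also match a data-datetime attribute, for example), which is why an HTML parser is the more robust choice.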
Solution:
This is exactly the kind of parsing Beautiful Soup is designed for. Install it with:
python -m pip install beautifulsoup4
Then you can do:
from bs4 import BeautifulSoup
html_text = """
<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>,
<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>
"""
soup = BeautifulSoup(html_text, "html.parser")
date_list = [tag["datetime"] for tag in soup.find_all(attrs={"datetime": True})]
print(date_list)
That will give you:
['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']
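The desired output in the question omits the zone suffix. One way to get there from date_list, assuming the standard library datetime module is acceptable, is to parse each value and then either drop the zone or convert everything to UTC:

```python
from datetime import datetime, timezone

date_list = ['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']

# fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
# so normalize it to an explicit UTC offset first.
parsed = [datetime.fromisoformat(d.replace("Z", "+00:00")) for d in date_list]

# Drop the zone to match the desired output (wall-clock time as written in the HTML).
naive = [dt.replace(tzinfo=None).isoformat() for dt in parsed]
print(naive)  # ['2023-03-07T02:38:29', '2023-03-06T10:25:38']

# Or put every value on a common footing by converting to UTC.
utc = [dt.astimezone(timezone.utc).isoformat() for dt in parsed]
print(utc)  # ['2023-03-07T02:38:29+00:00', '2023-03-06T17:25:38+00:00']
```

Dropping the zone is only safe for display; for comparing or sorting timestamps from different offsets, the UTC form is the one to keep.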
The key part here is find_all(attrs={"datetime": True}): rather than filtering by tag name, as find_all("relative-time") would,
it returns every tag that has a datetime attribute,
whatever the tag is called.
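If adding a dependency isn't an option, the standard library's html.parser module can apply the same "any tag with a datetime attribute" rule. This is a swapped-in alternative rather than part of the answer above, and DatetimeCollector is a hypothetical name:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class DatetimeCollector(HTMLParser):
    """Collect the datetime attribute of every start tag that has one, whatever the tag name."""

    def __init__(self) -> None:
        super().__init__()
        self.values: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # attrs arrives as (name, value) pairs; the parser lowercases attribute names.
        for name, value in attrs:
            if name == "datetime" and value is not None:
                self.values.append(value)


html_text = """
<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>,
<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>
"""
collector = DatetimeCollector()
collector.feed(html_text)
collector.close()
print(collector.values)  # ['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']
```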