Iterate through a string and append static values to a list for every occurrence of substring (Python)

I’m stuck on some basic Python. I have a very long HTML string that looks something like this:

<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>, <relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00

I want to iterate through the string and, at every occurrence of the substring "datetime", store the date value that follows it.

My current implementation uses two lists. The first stores the index of every "datetime" occurrence, found with the .find() method:

datetime_indexes = list(get_all_updates(string, "datetime"))
print(datetime_indexes)
# output: 36, 168, etc.

Next, I have a loop to go through the string and, if the index that I’m currently on in that string matches a value stored in my index list, append the datetime value to a new list.

count = 0
all_datetimes = []
for i in string:
    if string.index(i) is datetime_indexes[count]:
        all_datetimes.append(string[string.index(i) + 10:(string.index(i) + 10 + 21)])
        count = count + 1

Currently, it outputs only the first "datetime" value that I’m looking for:

#output
#2023-03-07T02:38:29Z

The desired result here would be to show all datetime values, so:

#desired output
2023-03-07T02:38:29
2023-03-06T10:25:38

Solution:

This is what Beautiful Soup was made to do:

python -m pip install beautifulsoup4

Then you can do:

from bs4 import BeautifulSoup

html_text = """
<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>,
<relative-time data-view-component="true" datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>
"""

soup = BeautifulSoup(html_text, "html.parser")
date_list = [tag["datetime"] for tag in soup.find_all(attrs={"datetime": True})]
print(date_list)

That will give you:

['2023-03-07T02:38:29Z', '2023-03-06T10:25:38-07:00']
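
The desired output in the question drops the timezone suffix. A minimal sketch of one way to get there, assuming every value starts with the 19-character YYYY-MM-DDTHH:MM:SS form seen in the sample (the name trimmed is only illustrative):

```python
# Values as returned by the Beautiful Soup example above.
date_list = ["2023-03-07T02:38:29Z", "2023-03-06T10:25:38-07:00"]

# Keep only the date and time, dropping "Z" or a UTC offset such as "-07:00".
# Assumption: each value starts with the 19-character YYYY-MM-DDTHH:MM:SS form.
trimmed = [value[:19] for value in date_list]

for value in trimmed:
    print(value)
```

Note that slicing discards the offset, so the trimmed values no longer say which time zone they refer to.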

Since you were already using BeautifulSoup, I think the key change here is replacing find_all("relative-time") with find_all(attrs={"datetime": True}), which returns every tag that has a datetime attribute (findAll is an older alias for the same method).
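
As for why the loop in the question stops after the first value: string.index(i) returns the position of the first occurrence of the character i, not the position the loop has reached, so it is unlikely to ever equal a later entry in the index list. Comparing integers with is is also unreliable; == is the right test for equality. If installing Beautiful Soup is not an option, a rough standard-library sketch is shown below. It assumes the attribute is always written as datetime="..." with double quotes, and a regular expression is not a full HTML parser:

```python
import re

html_text = (
    '<relative-time class="no-wrap" datetime="2023-03-07T02:38:29Z" '
    'title="Mar 6, 2023, 7:38 PM MST">Mar 6, 2023</relative-time>, '
    '<relative-time data-view-component="true" '
    'datetime="2023-03-06T10:25:38-07:00">asdf</relative-time>'
)

# Capture whatever sits between the double quotes after datetime=.
# Assumption: the attribute always uses double quotes, as in the sample.
pattern = re.compile(r'datetime="([^"]*)"')

# finditer walks every match in order, so no manual index bookkeeping is needed.
all_datetimes = [match.group(1) for match in pattern.finditer(html_text)]
print(all_datetimes)
```

This avoids the fixed-width slice from the question, so values with a longer offset such as -07:00 are captured in full.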
