How to Fix an Error Processing Dates in Python?

I'm new to Python and I'm trying some challenges on a page that I found. I'm trying to extract a date from a string so that I can put it in a column. This is the input they give me:

"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,

"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,

"131594", "AWARD", "ADVANCED_TRIP", 1, 2, 0, 0, "740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

First I look for the element "ADVANCED_TRIP". For each identifier I find in its string, I must create a new line with "TRIP_ID" as the element name and keep the date that is attached to that identifier. This is the result I currently get:

"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,

"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,

"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

The correct output should be the following:

"131594", "", "BIDGROUP", 1, 0, 0, 5, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,

"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,

"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 25Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 06Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 22Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 30Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,

The only thing I don't understand is how to extract the date attached to each trip identifier and place it in its column (the eleventh) on the corresponding "TRIP_ID" line. For example, my output has: "7400", 0:00, 0:00, 01JAN2009, 01JAN2009
where it should be: "7400", 0:00, 0:00, 25Jun2014, 01JAN2009
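
To make the goal concrete, here is a minimal sketch of the split needed for a single token. It assumes the identifier is always exactly four characters wide, which holds in the sample above but may not hold in general; `token`, `trip_id`, and `date` are illustrative names only.

```python
# Assumption: every trip ID is exactly 4 characters wide, as in the sample data.
token = "740025Jun2014"

# The first four characters are the ID, the rest is the attached date.
trip_id, date = token[:4], token[4:]

print(trip_id, date)  # 7400 25Jun2014
```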

This is the code I wrote:

import sys

lines = []

for line in sys.stdin:
    lines.append(line.strip())

output_lines = []

for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        for i, trip_id in enumerate(trip_ids):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_lines.append(output_line)
    else:
        output_lines.append(elements)

for output_line in output_lines:
    print(", ".join(output_line))

Does anyone have an idea how I can continue?

>Solution :

You are on the right track. To extract the date attached to each trip ID, you can use a regular expression that matches the date format (two digits, three letters, four digits) within the string. You can try this:

import sys
import re

lines = []

for line in sys.stdin:
    lines.append(line.strip())

output_lines = []

for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        # Search only the trip-ID field (elements[7]); the other date columns hold placeholders
        dates = re.findall(r'\d{2}[A-Za-z]{3}\d{4}', elements[7])  # e.g., 25Jun2014
        for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_line[10] = date  # Replace the placeholder date with the extracted date
            output_lines.append(output_line)
    else:
        output_lines.append(elements)

for output_line in output_lines:
    print(", ".join(output_line))
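
The code above pairs IDs with dates by position. As a possible alternative, each token can be split into its ID and date with a single pattern that has two capture groups, and the date can then be checked with `datetime.strptime`. This is only a sketch: `TOKEN_RE` and `split_token` are hypothetical names, the pattern assumes the ID consists of digits only, and `%b` assumes English month abbreviations (the default C locale).

```python
import re
from datetime import datetime

# Hypothetical pattern: a run of digits (the ID, matched lazily),
# followed by a DDMonYYYY date that must reach the end of the token.
TOKEN_RE = re.compile(r'^(\d+?)(\d{2}[A-Za-z]{3}\d{4})$')


def split_token(token):
    """Split a token such as '740025Jun2014' into (trip_id, date)."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognised token: {token!r}")
    trip_id, date = match.groups()
    # strptime raises ValueError if the text is not a real calendar date.
    datetime.strptime(date, "%d%b%Y")
    return trip_id, date


field = '"740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014"'
pairs = [split_token(t) for t in field.strip('"').split(",")]
print(pairs)
```

Inside the loop, `trip_id` would replace `trip_id[:4]` and `date` would go into `output_line[10]`, so the result no longer depends on the ID being exactly four characters long.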

Hope this helps! And good luck with Python!
