I'm new to Python and I'm trying some challenges from a page I found. I'm trying to extract a date from a string so that I can put it in a column. This is the input they give me:
"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "ADVANCED_TRIP", 1, 2, 0, 0, "740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
First I look for the element "ADVANCED_TRIP". For each identifier I find in its string, I must create a new line with "TRIP_ID" as the element name and keep the date, as mentioned above. This is the result of my attempt:
"131594", "", "BIDGROUP", 1, 0, 0, 2, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
The correct output should be the following:
"131594", "", "BIDGROUP", 1, 0, 0, 5, "", 0:00, 0:00, 01JAN2009, 01JAN2009, 01JAN2009, 01JAN2009, false, 0,
"131594", "AWARD", "UNTOUCHABLE", 1, 1, 0, 1, "", 0:00, 0:00, 10JUN2014, 13JUN2014 23:59, 01JAN2009, 01JAN2009, false, 100,
"131594", "AWARD", "TRIP_ID", 1, 2, 0, 0, "7400", 0:00, 0:00, 25Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 3, 0, 0, "7054", 0:00, 0:00, 06Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 4, 0, 0, "7377", 0:00, 0:00, 22Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
"131594", "AWARD", "TRIP_ID", 1, 5, 0, 0, "6961", 0:00, 0:00, 30Jun2014, 01JAN2009, 01JAN2009, 01JAN2009, false, 15,
The only thing I don't understand is how to extract the date attached to each "TRIP_ID" identifier and place it in its respective column, which is the eleventh. For example, in my output I have: "7400", 0:00, 0:00, 01JAN2009, 01JAN2009
Where it should be: "7400", 0:00, 0:00, 25Jun2014, 01JAN2009
This is the code I wrote:
import sys
lines = []
for line in sys.stdin:
    lines.append(line.strip())
output_lines = []
for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        for i, trip_id in enumerate(trip_ids):
            trip_id = trip_id.strip('"')
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_lines.append(output_line)
    else:
        output_lines.append(elements)
for output_line in output_lines:
    print(", ".join(output_line))
Does anyone have an idea how I can continue?
>Solution :
You are on the right track. To extract the date associated with each "TRIP_ID", you can use a regular expression that matches the date format within the string. You can try this:
import sys
import re
lines = []
for line in sys.stdin:
    lines.append(line.strip())
output_lines = []
for line in lines:
    elements = line.split(", ")
    if elements[2] == '"ADVANCED_TRIP"':
        elements[2] = '"TRIP_ID"'
        trip_ids = elements[7].split(",")
        # Extract the dates (e.g., 25Jun2014) from the trip field only,
        # so the placeholder dates later in the line are never picked up
        dates = re.findall(r'\d{2}[A-Za-z]{3}\d{4}', elements[7])
        for i, (trip_id, date) in enumerate(zip(trip_ids, dates)):
            trip_id = trip_id.strip('"')
            # Keep only the 4-character identifier in the eighth column
            output_line = elements[:7] + [f'"{trip_id[:4]}"'] + elements[8:]
            output_line[4] = str(int(output_line[4]) + i)
            output_line[10] = date  # Replace the placeholder date with the extracted date
            output_lines.append(output_line)
    else:
        output_lines.append(elements)
for output_line in output_lines:
    print(", ".join(output_line))
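If you want to check the extraction step on its own, here is a minimal sketch that splits one token into its identifier and date. It assumes every token is exactly a 4-digit identifier followed by a DDMonYYYY date, as in your sample input; the names `TOKEN_RE` and `split_trip_token` are made up for this illustration.

```python
import re

# Assumption: a token is a 4-digit trip id followed by a date such as 25Jun2014
TOKEN_RE = re.compile(r'(\d{4})(\d{2}[A-Za-z]{3}\d{4})')

def split_trip_token(token):
    """Return (trip_id, date) for a token such as '740025Jun2014'."""
    match = TOKEN_RE.fullmatch(token.strip('"'))
    if match is None:
        # Fail loudly instead of silently producing a wrong column value
        raise ValueError(f"unexpected trip token: {token!r}")
    return match.group(1), match.group(2)

# The eighth field of the ADVANCED_TRIP line from the question
field = '"740025Jun2014,705406Jun2014,737722Jun2014,696130Jun2014"'
pairs = [split_trip_token(t) for t in field.strip('"').split(",")]
print(pairs)
```

Matching the whole token with `fullmatch` means a malformed entry raises an error instead of shifting the dates out of step with the identifiers, which `zip` would otherwise hide.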
Hope this helps! And good luck with Python!