I have a list of strings in different formats. Some examples:
a=['08/58/13ND','08/58/16ND','08/58/18ND','114/15ND','04/2010/PB','AB/23/2016/CHE']
In each string, the last group of digits is the year. Some strings already have the year in four-digit form, such as 2010 and 2016, and I want to leave those as they are. In the other strings the year has only two digits (13, 16, 18, 15, etc.) and is followed directly by letters.
I want those years in YYYY format.
Expected Output:
['08/58/2013/ND','08/58/2016/ND','08/58/2018/ND','114/2015/ND','04/2010/PB','AB/23/2016/CHE']
>Solution :
You could try as follows:
import re
a = ['08/58/13ND','08/58/16ND','08/58/18ND','114/15ND','04/2010/PB','AB/23/2016/CHE']
pattern = r'(\d{2})(?=[A-Z]+$)'
b = [re.sub(pattern,r'20\1/', x) for x in a]
print(b)
Output:
['08/58/2013/ND', '08/58/2016/ND', '08/58/2018/ND', '114/2015/ND', '04/2010/PB', 'AB/23/2016/CHE']
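Explanation of r'(\d{2})(?=[A-Z]+$)':
(\d{2}) is a capturing group for exactly 2 digits. It matches only if it is followed by one or more characters in [A-Z] that run to the end of the string ($). The positive lookahead (?=[A-Z]+$) enforces this without consuming the letters, so they stay in the result. Strings such as '04/2010/PB' are left unchanged because their digits are followed by / rather than by letters.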
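To see the lookahead at work, a quick check along these lines may help. It is only an illustrative sketch: the two sample strings are taken from the list above, and the variable names are arbitrary.

```python
import re

pattern = re.compile(r'(\d{2})(?=[A-Z]+$)')

for s in ['08/58/13ND', '04/2010/PB']:
    m = pattern.search(s)
    # The span covers only the two digits; the trailing letters are
    # checked by the lookahead but are not part of the match.
    print(s, '->', (m.group(1), m.span()) if m else None)

# 08/58/13ND -> ('13', (6, 8))
# 04/2010/PB -> None
```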
Explanation of r'20\1/':
\1 references the capturing group above. The replacement prepends 20 (assuming ALL your years are >= 2000) and appends /.
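If some years might fall before 2000, one possible variation is to pass a function to re.sub instead of a fixed replacement string. The sketch below is not part of the answer above: the helper name expand_year and the pivot parameter are hypothetical, and the cut-off of 50 is an arbitrary assumption you would need to adjust for your data. It also adds a lookbehind for / so that a four-digit year directly followed by letters (e.g. '08/58/2013ND') would not be altered by mistake.

```python
import re

# Two digits that directly follow "/" and are followed only by
# uppercase letters up to the end of the string.
YEAR_SUFFIX = re.compile(r'(?<=/)(\d{2})(?=[A-Z]+$)')

def expand_year(code, pivot=50):
    """Hypothetical helper: expand a trailing two-digit year.

    Two-digit values >= pivot become 19YY, all others become 20YY.
    The pivot of 50 is an assumption, not something given in the question.
    """
    def repl(match):
        yy = int(match.group(1))
        century = 1900 if yy >= pivot else 2000
        return f'{century + yy}/'
    return YEAR_SUFFIX.sub(repl, code)

a = ['08/58/13ND', '08/58/16ND', '08/58/18ND', '114/15ND', '04/2010/PB', 'AB/23/2016/CHE']
print([expand_year(x) for x in a])
# ['08/58/2013/ND', '08/58/2016/ND', '08/58/2018/ND', '114/2015/ND', '04/2010/PB', 'AB/23/2016/CHE']
```

With the default pivot, expand_year('08/58/99ND') would give '08/58/1999/ND'.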