I have a list of strings that I want to filter based on a given year. For example, from the list below I only want the strings whose year is above 2018, plus the strings that don't contain a year. My solution works; I just need a better way to do this.
data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/',
    '/soccer/zimbabwe/premier-soccer-league-2018/results/',
    '/soccer/zimbabwe/premier-soccer-league-2017/results/']
My script:
import re
for i in data:
    match = re.match(r".*([1-3][0-9]{3})", i)
    if match is not None:
        if match.group(1) > '2018':
            print(i)
    else:
        print(i)
Expected output:
data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/']
>Solution :
Instead of printing each match, append the matching strings to a list (result in the code below). You can do it like this:
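import re
result = []
for i in data:
    # The greedy .* makes the group capture the last four digits in the string.
    match = re.match(r'.*(\d{4})', i)
    if match:
        # Keep strings whose year is strictly greater than 2018.
        if int(match.group(1)) > 2018:
            result.append(i)
    else:
        # No year found, so keep the string.
        result.append(i)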
Output:
['/soccer/zimbabwe/premier-soccer-league/results/',
'/soccer/zimbabwe/premier-soccer-league-2020/results/',
'/soccer/zimbabwe/premier-soccer-league-2019/results/']
EDIT:
An approach without an explicit loop, using filter:
def is_match(s):
    match = re.match(r'.*(\d{4})', s)
    return match is None or int(match.group(1)) > 2018

result = list(filter(is_match, data))
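
The same idea can also be written as a list comprehension. The following is only a sketch, not a replacement for the code above: the names YEAR_RE, year_of and keep are made up for illustration, and the pattern assumes the year always appears as "-YYYY" immediately before a "/", which is stricter than the \d{4} pattern used earlier and may not fit other URL layouts.

```python
import re

data = [
    '/soccer/zimbabwe/premier-soccer-league/results/',
    '/soccer/zimbabwe/premier-soccer-league-2020/results/',
    '/soccer/zimbabwe/premier-soccer-league-2019/results/',
    '/soccer/zimbabwe/premier-soccer-league-2018/results/',
    '/soccer/zimbabwe/premier-soccer-league-2017/results/']

# Assumption: the year is written as "-YYYY" right before a slash.
YEAR_RE = re.compile(r'-(\d{4})/')

def year_of(path):
    """Return the year found in path as an int, or None if there is none."""
    match = YEAR_RE.search(path)
    return int(match.group(1)) if match else None

def keep(path, min_year):
    """Keep paths without a year, or with a year strictly above min_year."""
    year = year_of(path)
    return year is None or year > min_year

result = [path for path in data if keep(path, 2018)]
print(result)
```

Passing the threshold as a parameter (min_year here) avoids hard-coding 2018 in the predicate, so the same helper can be reused for other cutoffs.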