I need a regex that matches the first year in a line of free text, wherever it appears in the string.
The year is 4 digits and begins with 20 or 21 (e.g. 2030 or 2199).
It should NOT match part of a longer number such as 20304050.
Here is some JS code I have written, with its output below. As you can see, each regex works for some cases but none works for all.
NB: the final version of this won't be JS, so I don't want solutions that require additional code, just a pure regular expression. I could live with a result that includes the trailing full stop and truncate it to 4 chars. Thanks
const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667'
]
console.log(' regx1', 'regx2,', 'regx3,', 'input string')
values.forEach((value, index) => {
  value = value.trim()
  const regex1 = /2[01]{1}[0-9]{2}/
  const regex2 = /2[01]{1}[0-9]{2}[^0-9]{1}/
  const regex3 = /2[01]{1}[0-9]{2}[^0-9]{1}/
  const year1 = (value.match(regex1) || [])[0] || ' '
  const year2 = (value.match(regex2) || [])[0] || ' '
  const year3 = (value.match(regex3) || [])[0] || ' '
  console.log(`${index + 1}) ${year1}, ${year2}, ${year3}, "${value}",`)
})
This code outputs:
regx1 regx2, regx3, input string
1) 2025, , , "2025",
2) 2150, 2150 , 2150 , "2150 is the year to match",
3) 2030, 2030 , 2030 , "the year is 2030 see ref 2099662",
4) 2140, 2140 , 2140 , "Should match the year here YEAR_2140 even though it has non numric chars preceeding it",
5) 2140, , , "Should match the year at the end of a string like this - 2140",
6) 2099, 2140., 2140., "ref 2099662 the year is 2140. And there is another sentence",
7) 2099, , , "ref 2099662 the end of the string is the year 2140",
8) 2055, , , "There is no year here 2055667",
Solution:
(?:\b|\W)?(?<year>2(?:0|1)\d\d)\b
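A note on how this pattern behaves, since it affects which part of the result to read. The leading `(?:\b|\W)?` is optional, so it never prevents a match; it only lets the match swallow one non-word character before the year, which is why the full match can come back as ` 2030` with a leading space. The trailing `\b` is what rejects `2099662`. The clean value is in the `year` group (capture group 1 in engines without named groups). A minimal sketch, assuming a JavaScript engine with named groups; `yearPattern` and `firstYear` are hypothetical names, not part of the answer:

```javascript
// Same pattern as the solution, without the g flag so match() returns
// only the first hit together with its capture groups.
const yearPattern = /(?:\b|\W)?(?<year>2[01]\d\d)\b/;

// firstYear is a hypothetical helper: returns the captured year or null.
function firstYear(text) {
  const m = text.match(yearPattern);
  return m ? m.groups.year : null;
}

console.log(firstYear('the year is 2030 see ref 2099662'));          // 2030
console.log(firstYear('YEAR_2140 has non-numeric chars before it')); // 2140
console.log(firstYear('There is no year here 2055667'));             // null
```

Reading the group rather than the full match avoids any trimming or truncating afterwards.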
const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667'
];
// The g flag is required by matchAll(); it returns every match, not just the first.
const reg = /(?:\b|\W)?(?<year>2[01]\d\d)\b/g;
for (const v of values) {
  // Each match serializes as [fullMatch, year]; fullMatch may include a leading non-word character.
  const matches = Array.from( v.matchAll( reg ) );
  const matchJson = JSON.stringify( matches );
  addRow( v, matchJson );
}
// Append one table row showing the input string and its matches.
function addRow( x, y ) {
  const tbody = document.getElementById('rows');
  const tr = tbody.insertRow(-1);
  const tdInput = tr.insertCell();
  const tdMatch = tr.insertCell();
  tdInput.textContent = x;
  tdMatch.textContent = y;
}
table {
  border: 1px outset #ccc;
}
th,
td {
  border: 1px inset #ccc;
  padding: 0.5em;
}
<table>
  <thead>
    <tr>
      <th>Input string</th>
      <th>Match</th>
    </tr>
  </thead>
  <tbody id="rows">
  </tbody>
</table>
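One edge case the pattern above does not cover: because its prefix is optional, a year-like group at the end of a longer number (for example `9982140`) still matches. If that matters, digit lookarounds are a possible alternative. This is a sketch, not the answer's method; it assumes the target engine supports lookbehind, and the second pattern is a fallback for engines that do not, with the year in capture group 1. The names `strictYear`, `fallbackYear`, and `firstStrictYear` are hypothetical.

```javascript
// Year must not be preceded or followed by another digit.
const strictYear = /(?<!\d)2[01]\d\d(?!\d)/;

// Fallback without lookbehind: consume one non-digit (or the start of the
// string) before the year and read the year from capture group 1.
const fallbackYear = /(?:^|\D)(2[01]\d\d)(?!\d)/;

// Hypothetical helper: returns the first standalone year or null.
function firstStrictYear(text) {
  const m = text.match(strictYear);
  return m ? m[0] : null;
}

console.log(firstStrictYear('ref 2099662 the year is 2140. And more')); // 2140
console.log(firstStrictYear('ref 9982140'));                            // null
console.log(firstStrictYear('20304050'));                               // null
```

With `strictYear` the full match is exactly the four digits, so no group lookup or truncation is needed.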