Regular expression to extract a year from anywhere within a free text string

I need to write a regex that will match the first year in a line of free text, wherever it appears in the string.

The year will be 4 digits and will begin with 20 or 21 (e.g. 2030 or 2199).

It should NOT match longer numbers like 20304050.

Here is some JS code I have written, with its output below. As you can see, each regex works for some cases, but none works for all of them.

NB: the final version of this won't be JS, so I don't want solutions that require additional code, just a pure regular expression. That said, I could live with a result that includes the extra full stop and truncate it to 4 characters. Thanks.

const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667'
]
console.log('   regx1', 'regx2,', 'regx3,', 'input string')

values.forEach((value, index) => {
  value = value.trim()
  const regex1 = /2[01]{1}[0-9]{2}/
  const regex2 = /2[01]{1}[0-9]{2}[^0-9]{1}/
  const regex3 = /2[01]{1}[0-9]{2}[^0-9]{1}/

  const year1 = (value.match(regex1) || [])[0] || '     '
  const year2 = (value.match(regex2) || [])[0] || '     '
  const year3 = (value.match(regex3) || [])[0] || '     '

  console.log(`${index + 1}) ${year1}, ${year2}, ${year3}, "${value}",`)
})

This code outputs:

   regx1 regx2, regx3, input string
1) 2025,      ,      , "2025",
2) 2150, 2150 , 2150 , "2150 is the year to match",
3) 2030, 2030 , 2030 , "the year is 2030 see ref 2099662",
4) 2140, 2140 , 2140 , "Should match the year here YEAR_2140 even though it has non numric chars preceeding it",
5) 2140,      ,      , "Should match the year at the end of a string like this - 2140",
6) 2099, 2140., 2140., "ref 2099662 the year is 2140. And there is another sentence",
7) 2099,      ,      , "ref 2099662 the end of the string is the year 2140",
8) 2055,      ,      , "There is no year here 2055667",
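
The two failure modes in this table can be reproduced in isolation. The following is a minimal sketch, with illustrative names (`unguarded`, `needsTrailingChar`, `first`) that are not part of the code above. The first pattern has no check on neighbouring digits, so it picks up the first four digits of a longer number. The second consumes one trailing non-digit character, so it cannot match a year at the very end of the string, and it returns the extra character when it does match.

```javascript
// Equivalent to regex1 (the {1} quantifiers are redundant).
const unguarded = /2[01][0-9]{2}/;
// Equivalent to regex2: must be followed by one non-digit character.
const needsTrailingChar = /2[01][0-9]{2}[^0-9]/;

// Return the full text of the first match, or null when nothing matches.
const first = (re, s) => (s.match(re) || [])[0] ?? null;

// No guard on the right: matches the start of the longer number 2055667.
const r1 = first(unguarded, 'There is no year here 2055667');
// Nothing follows the year, so the required trailing character is missing.
const r2 = first(needsTrailingChar, 'a string like this - 2140');
// Matches, but the trailing full stop is part of the result.
const r3 = first(needsTrailingChar, 'ref 2099662 the year is 2140. And');

console.log(r1, r2, r3);
```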

Solution:

(?:\b|\W)?(?<year>2(?:0|1)\d\d)\b

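This pattern works for me on the sample inputs. The year is captured in the named group year; the full match can include one leading non-word character (such as a space), so read the group rather than the whole match: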

const values = [
  '2025',
  '2150 is the year to match',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140 even though it has non numric chars preceeding it',
  'Should match the year at the end of a string like this - 2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667'
];

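// matchAll requires the g flag; the year itself is in the named group "year".
const reg = /(?:\b|\W)?(?<year>2[01]\d\d)\b/g;

for (const v of values) {
  // Collect every match in the string; the first entry is the first year.
  const matches = Array.from( v.matchAll( reg ) );

  // Serialize each match as [fullMatch, year] for display in the table.
  const matchJson = JSON.stringify( matches );
  addRow( v, matchJson );
}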

function addRow( x, y ) {
  const tbody = document.getElementById('rows');
  const tr    = tbody.insertRow(-1);

  const tdInput = tr.insertCell();
  const tdMatch = tr.insertCell();

  tdInput.textContent = x;
  tdMatch.textContent = y;
}

table {
  border: 1px outset #ccc;
}

th,
td {
  border: 1px inset #ccc;
  padding: 0.5em;
}

<table>
  <thead>
    <tr>
      <th>Input string</th>
      <th>Match</th>
    </tr>
  </thead>
  <tbody id="rows">
  </tbody>
</table>
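
One caveat with the pattern above: because its leading group is optional, nothing stops a match from starting in the middle of a longer number whose last four digits look like a year (for example 992025). As a hedged alternative, not the answer's own method, digit-only lookarounds express the requirement more directly. This sketch assumes the target engine supports lookbehind (modern JavaScript, PCRE, .NET, Java and Python do). For engines without any lookaround support, such as RE2, the second pattern captures the year in group 1 instead. The names `yearRe`, `yearReNoLookaround`, `firstYear` and `firstYearNoLookaround` are illustrative.

```javascript
// A year is 20xx or 21xx with no digit immediately before or after it.
const yearRe = /(?<!\d)2[01]\d{2}(?!\d)/;

// Variant without lookarounds: the year is in capture group 1,
// and the full match may include one neighbouring non-digit on each side.
const yearReNoLookaround = /(?:^|\D)(2[01]\d{2})(?:\D|$)/;

// Return the first year in the text, or null if there is none.
function firstYear(text) {
  const m = text.match(yearRe);
  return m ? m[0] : null;
}

function firstYearNoLookaround(text) {
  const m = text.match(yearReNoLookaround);
  return m ? m[1] : null;
}

const samples = [
  '2025',
  'the year is 2030 see ref 2099662',
  'Should match the year here YEAR_2140',
  'ref 2099662 the year is 2140. And there is another sentence',
  'ref 2099662 the end of the string is the year 2140',
  'There is no year here 2055667',
  'id 992025', // a year-like tail of a longer number: should not match
];

const results = samples.map(firstYear);
const resultsNoLookaround = samples.map(firstYearNoLookaround);
console.log(results, resultsNoLookaround);
```

Both patterns use a non-global regex with `match`, so only the first year in each string is returned, which is what the question asks for.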