I have a feature that compares user-entered data with the original data in an image, and we use Google Vision OCR to extract the text.
The OCR returns its result per block,
so you get an array like this:
const result = [
  {
    text: 'This is the first block'
  },
  {
    text: 'This is the second block'
  },
  {
    text: 'Created on 20 September 2021'
  },
];
My question is: how can I get the date (20 September 2021) so that I can compare it with the data that has been entered?
I tried some logic with looping and regex, but I couldn't finish it because I still need to learn regex, and to be honest, I have already spent a day on it.
One complication is that the images I need to compare are not consistent in how they show the date:
maybe the date is in a block of its own (no other text, only the date),
or the date is separated by spaces (20 September 2021),
or the date is separated by dashes (20-September-2021),
or the date is separated by slashes (20/September/2021),
or the month is written as a number (20-09-2021).
The main point is that the date structure is not always the same.
In this case, I am comparing the entered data with a certificate image.
Once I have the date, I will convert it to a consistent format using momentjs().format().
I think that's all, thank you.
>Solution :
Based on your expected inputs, here is a RegExp that should work. It looks for:
- 1 or 2 digits (the day)
- a space, -, or /
- either 3 to 9 word characters (the month name) or 1 or 2 digits (the month number)
- a space, -, or /
- 2 to 4 digits (the year)
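// Day, separator, month (name or number), separator, year.
const regex = /\d{1,2}(-| |\/)(\w{3,9}|\d{1,2})(-| |\/)\d{2,4}/;
const inputs = ['some random text: 20 September 2020', '20/September/2020', '20-September-20', '20/09/2020', '20 Sep 20', '20-09-2020'];
for (const input of inputs) {
  const match = input.match(regex);
  // match() returns null when no date is found, so guard before indexing.
  console.log(match ? match[0] : 'no date found');
}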
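
Applied to the OCR result from the question, the same pattern can be run over each block until one of them matches. This is a minimal sketch, assuming each block is an object with a `text` string as shown in the question. `DATE_REGEX` and `findDateInBlocks` are hypothetical names used for illustration, not part of Google Vision or any library.

```javascript
// Same pattern as in the answer: day, separator, month, separator, year.
const DATE_REGEX = /\d{1,2}(-| |\/)(\w{3,9}|\d{1,2})(-| |\/)\d{2,4}/;

// Hypothetical helper: returns the first date-like substring found in the
// OCR blocks, or null when no block contains one.
function findDateInBlocks(blocks) {
  for (const block of blocks) {
    // Assumption: blocks without a string `text` are treated as empty.
    const text = typeof block.text === 'string' ? block.text : '';
    const match = text.match(DATE_REGEX);
    if (match) {
      return match[0];
    }
  }
  return null;
}

const ocrResult = [
  { text: 'This is the first block' },
  { text: 'This is the second block' },
  { text: 'Created on 20 September 2021' },
];

console.log(findDateInBlocks(ocrResult)); // "20 September 2021"
```

Returning null instead of throwing lets the caller decide what to do when a certificate image has no recognizable date.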

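Once a date string has been matched, it still needs to be turned into one consistent format before comparing. The question plans to use momentjs for this. The sketch below uses only built-in JavaScript so that it runs without dependencies. It assumes the day always comes first (as in every example in the question) and that two-digit years belong to the 2000s. `MONTHS` and `normalizeDate` are hypothetical names.

```javascript
const MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
  'august', 'september', 'october', 'november', 'december'];

// Hypothetical helper: converts a matched date such as "20-Sep-21" into
// "YYYY-MM-DD", or returns null if the parts do not form a valid date.
function normalizeDate(dateText) {
  const parts = dateText.trim().split(/[-\/ ]/);
  if (parts.length !== 3) return null;
  const [dayPart, monthPart, yearPart] = parts;

  // Accept only 2-digit or 4-digit years.
  if (!/^(\d{2}|\d{4})$/.test(yearPart)) return null;
  // Assumption: a two-digit year such as "21" means 2021.
  const year = yearPart.length === 2 ? 2000 + Number(yearPart) : Number(yearPart);

  let month;
  if (/^\d{1,2}$/.test(monthPart)) {
    month = Number(monthPart);
  } else {
    // Accept full names and abbreviations of at least three letters.
    const key = monthPart.toLowerCase();
    if (key.length < 3) return null;
    month = MONTHS.findIndex((name) => name.startsWith(key)) + 1; // 0 if unknown
  }
  if (month < 1 || month > 12) return null;

  const day = Number(dayPart);
  if (!Number.isInteger(day)) return null;

  // Round-trip through Date to reject impossible dates such as 31 February.
  const date = new Date(Date.UTC(year, month - 1, day));
  if (date.getUTCFullYear() !== year || date.getUTCMonth() !== month - 1 ||
      date.getUTCDate() !== day) {
    return null;
  }

  const pad = (n) => String(n).padStart(2, '0');
  return `${year}-${pad(month)}-${pad(day)}`;
}

console.log(normalizeDate('20 September 2021')); // "2021-09-20"
console.log(normalizeDate('20/09/2021'));        // "2021-09-20"
console.log(normalizeDate('20 Sep 20'));         // "2020-09-20"
```

The normalized string can then be compared directly with the entered value, or passed on to a date library for further formatting.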