Find text matching a pattern in a Google Lens response using regex

I am trying to extract a learner's license number from the Google Lens response for an uploaded image, but my regex is not working.
The license numbers appear in the following patterns:

KL 14 /0000007/2023

KL14 /0000007/2023

KL 14/0000007/2023

KL 14 /0000007/ 2023

etc.

That is, there may or may not be spaces between the parts.

My regex is KL [0-9]{1}./.[0-9]{1}./.[0-9]{1}.
but it is not working.

My code:
`from lxml.html import soupparser
import re
import os
import requests

folder_dir = os.getcwd()
for images in os.listdir(folder_dir):
    try:
        # check if the image ends with png, jpg, or jpeg
        if images.endswith((".png", ".jpg", ".jpeg")):

            # route all requests through the local intercepting proxy
            proxy = '127.0.0.1:8080'
            os.environ['http_proxy'] = proxy
            os.environ['HTTP_PROXY'] = proxy
            os.environ['https_proxy'] = proxy
            os.environ['HTTPS_PROXY'] = proxy
            os.environ['REQUESTS_CA_BUNDLE'] = "C:\\Users\\User\\Desktop\\cacert.pem"

            print("-------------------------------------------------------------------------------------")
            print(images)
            print("\n")

            # upload the image to Google Lens
            captchaurl = 'https://lens.google.com/upload?ep=ccm&s=csp&st=1653142987619'
            encoded_image = {'encoded_image': open(images, 'rb')}
            burp0cap_headers = {"Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1",
                                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                                "Origin": "null",
                                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                                "Sec-Gpc": "1", "Sec-Fetch-Site": "none",
                                "Sec-Fetch-Mode": "navigate", "Sec-Fetch-User": "?1",
                                "Sec-Fetch-Dest": "document", "Accept-Encoding": "gzip, deflate",
                                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            rlens = requests.post(captchaurl, files=encoded_image, headers=burp0cap_headers,
                                  allow_redirects=True)
            DATA000 = str(rlens.content)
            # print(DATA000)

            # the upload response redirects via a meta refresh tag; extract its target URL
            root = soupparser.fromstring(DATA000)
            result_url = root.xpath('//meta[@http-equiv="refresh"]/@content')
            result_url = str(result_url[0])
            url2 = result_url.split('URL=')
            finalurl = str(url2[1])
            # print(finalurl)

            # fetch the results page
            burp1cap_headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36",
                "Accept-Encoding": "gzip, deflate",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                "Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Origin": "null",
                "Sec-Gpc": "1", "Sec-Fetch-Site": "none", "Sec-Fetch-Mode": "navigate",
                "Sec-Fetch-User": "?1", "Sec-Fetch-Dest": "document",
                "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"}
            r2 = requests.get(finalurl, headers=burp1cap_headers)
            r3 = str(r2.text)

            r4 = r3.replace('"', '')
            # print(r4)

            # search the response text for the license number
            phoneNumRegex2 = re.compile(r'KL *[0-9]{1}.*\/.*[0-9]{1}.*\/.*[0-9]+')

            mo = phoneNumRegex2.search(str(r4))
            print(mo.group())

    except Exception as e:
        print(e)`

The response from Google Lens is:

"text:0:e90nKYDCi5I\u003d"],6,[]]],[[],3]]]],[]],[[],null,null,"en",[[["FORM 3 [See Rule 3(a) a","LEARNER’S L","Application No… 394442223","Learner’s Licence","KL 14 /0002707/2023","Issue Date…..","1. Name","SATHEESAN U","2. Father’s Name","CHOUKAR K","Date of Birth","07-03-1984"]],"Ad7f3FjZKr2A8ovUoig+fwJqhVKxG6sbvcjciTQV+KzOBTZf2VGydPYtpIkEMPU6sQyWL+Ad8/Vjl0/OV0izP/oXCluFA2xNbzAktl3KxaOVnfyvyS3kTwHv",[1678139279,21105500

The response contains something like the above.
I need to extract the learner's license number from it.

The output is None.

I have attached a sample image.

>Solution :

This regex considers whitespace between any of the elements:

KL\s*\d+\s*/\s*\d+\s*/\s*\d+

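\s* means zero or more whitespace characters. The digits are matched with \d+, which means one or more digits. Your regex used [0-9]{1}, which matches exactly one digit, and ., which matches exactly one character of any kind rather than optional whitespace, so multi-digit groups such as 0000007 could not match.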
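
As a minimal sketch of how this pattern could be used in Python, the snippet below runs it against the four spacing variants from the question and against a shortened excerpt of the Lens response. The names `LICENSE_RE` and `find_license` are hypothetical helpers introduced for illustration only. The helper returns None when nothing matches instead of calling `.group()` on a failed search.

```python
import re

# Pattern from the answer: optional whitespace between every part.
LICENSE_RE = re.compile(r"KL\s*\d+\s*/\s*\d+\s*/\s*\d+")


def find_license(text):
    """Return the first license-like substring in text, or None if absent."""
    match = LICENSE_RE.search(text)
    return match.group() if match else None


# The four spacing variants listed in the question.
samples = [
    "KL 14 /0000007/2023",
    "KL14 /0000007/2023",
    "KL 14/0000007/2023",
    "KL 14 /0000007/ 2023",
]
for sample in samples:
    print(find_license(sample))

# A shortened excerpt of the Lens response shown in the question.
response = '"Learner\u2019s Licence","KL 14 /0002707/2023","Issue Date"'
print(find_license(response))

# No match: the helper returns None rather than raising AttributeError.
print(find_license("no license number here"))
```

Checking for a failed match before calling `.group()` matters because `re.search` returns None when nothing matches, and `None.group()` raises an AttributeError.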

Regex101 playground/explanation
