Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

BigQuery – Alternative method to Positive Lookahead for RegExes

I’ve written a RegEx pattern that identifies alpha-characters that are immediately followed by a numeric character, with the intention that it would used in BigQuery’s REGEXP_EXTRACT function.

Here’s the pattern: ([A-Z]|[a-z])*(?=[0-9])

However, due to BigQuery’s use of RE2 expression library, the Positive Lookahead function does not work. What’s an alternative method of identifying the numeric character without including it in the extracted string/match?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Use case:

To extract the first 1 or 2 alpha-characters of a UK postcode, e.g.

  • NW9 9KL
  • M1 0TE
  • ph3 2ee
  • N10 10KE

>Solution :

You can use

REGEXP_EXTRACT(col, '^[A-Za-z]+')

The ^[A-Za-z]+ regex matches

  • ^ – start of string
  • [A-Za-z]+ – one or more letters.

Also, if you MUST check for a digit right after the initial letters, you can use a

REGEXP_EXTRACT(col, '^([A-Za-z]+)[0-9]')

The ^([A-Za-z]+)[0-9] regex matches and captures into Group 1 the initial letters, and then just matches a digit (with [0-9]). The REGEXP_EXTRACT function returns the captured substring if there is a capturing group.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading