Regex: match repeating patterns but not the last occurrence

I would like to extract the following pattern:

  1. Starts with letters (Subgroup 1); and then

  2. followed by digits of any length (Subgroup 2);

  3. followed by letters of any length (Subgroup 3);

  4. repeating 2 & 3 any number of times, but leaving out the last occurrence.

I am using https://regexr.com/ to test.

Here are some sample strings and my expected output.

String: FAF46ABC7787AAAA  =>   Desired output: FAF46ABC7787

String: FAF46ABC7787      =>   Desired output: FAF46ABC

String: FAF46ABC          =>   Desired output: FAF46

String: FAF46             =>   Desired output: FAF

String: FAF               =>   Desired output: FAF

String: FAF46 GG(Not CC)  =>   Desired output: FAF

String: FAF46.doc         =>   Desired output: FAF

I tested the following, but none of them work:

  1. Lookahead method, as suggested by "Python regex matching all but last occurrence"

1a. ^([a-zA-Z]+)([0-9]*[a-zA-Z]*)(?=[0-9]+|[a-zA-Z]+)

1b. ^([a-zA-Z]+)(([0-9])*([a-zA-Z])*)(?=[0-9]+|[a-zA-Z]+)

  2. Capture all subgroups and exclude the last occurrence in a loop

2a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*

  3. Using the replace method

3a. (^(?:[a-zA-Z]+[0-9]*)(?:[a-zA-Z]+[0-9]*)*)([a-zA-Z]+|[0-9]+) and replace by $1

  4. Exclude the ending occurrence by using a non-capturing group

4a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9]+|[a-zA-Z]+)$

4b. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:([0-9]+|[a-zA-Z]+))$

4c. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9a-zA-Z]+)$

4d. ^([a-zA-Z]+)(([0-9]*?)([a-zA-Z]*?))*(?:[0-9a-zA-Z]+)$

I also tried switching between greedy and lazy quantifiers to see if any miracles would happen, but no luck.

I thought this would be an easy task, but it is obviously harder than I expected.

I would appreciate any kind of help.

Please note that I do not have extended regex available, in case that would be needed to make this work.
Thank you.

>Solution :

You can search using this regex:

^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*

and replace with $1
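
As a quick check, here is a minimal Python sketch of that search-and-replace run against the sample strings from the question. It assumes Python's `re` module, where the replacement `$1` is written `\1`; `strip_last_run` is a hypothetical helper name, not something from the original post.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*")


def strip_last_run(text: str) -> str:
    """Keep capture group 1 only, i.e. everything before the last run."""
    # Python writes the "$1" replacement as r"\1".
    return PATTERN.sub(r"\1", text)


# Sample strings and desired outputs, copied from the question.
SAMPLES = {
    "FAF46ABC7787AAAA": "FAF46ABC7787",
    "FAF46ABC7787": "FAF46ABC",
    "FAF46ABC": "FAF46",
    "FAF46": "FAF",
    "FAF": "FAF",
    "FAF46 GG(Not CC)": "FAF",
    "FAF46.doc": "FAF",
}

for source, expected in SAMPLES.items():
    result = strip_last_run(source)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{source!r:22} -> {result!r:16} {status}")
```

Other regex flavours may treat `\b` or the replacement syntax slightly differently, so it is worth re-running the samples in the target tool.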

RegEx Details:

  • ^: Start
  • (: Start capture group #1
    • [a-zA-Z]+: Match 1+ letters
    • [0-9a-zA-Z]*?: Match 0 or more letters or digits (non-greedy, so the group stops as early as the rest of the pattern allows)
  • ): End 1st capture group
  • (?:: Start non-capture group
    • [0-9]+: Match 1+ digits
    • |: OR
    • [A-Z]*: Match 0 or more uppercase letters (use [a-zA-Z]* here if a trailing lowercase run should also be dropped)
  • ): End non-capture group
  • \b: Word boundary
  • .*: Match anything remaining
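
If a single pattern is not a hard requirement, the same result can probably be reached without backtracking tricks: take the leading alphanumeric run, split it into letter runs and digit runs, and drop the last one. The sketch below is a different technique from the regex above, not a restatement of it; it assumes Python, and `drop_last_run` is a hypothetical helper name. Returning an empty string when the input does not start with a letter is also an assumption.

```python
import re


def drop_last_run(text: str) -> str:
    """Return the leading letter/digit runs of text, minus the last run."""
    # Leading part: must start with a letter, then letters or digits only.
    head = re.match(r"[a-zA-Z][0-9a-zA-Z]*", text)
    if head is None:
        return ""  # assumed behaviour when there is no leading letter

    # Split into alternating runs, e.g. "FAF46ABC" -> ["FAF", "46", "ABC"].
    runs = re.findall(r"[a-zA-Z]+|[0-9]+", head.group())

    # Drop the last run, but always keep the initial letters.
    if len(runs) > 1:
        runs.pop()
    return "".join(runs)


for sample in ("FAF46ABC7787AAAA", "FAF46ABC", "FAF", "FAF46.doc"):
    print(sample, "->", drop_last_run(sample))
```

Because the runs are split explicitly, this version treats uppercase and lowercase letters the same way.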
