Regex match repeating patterns but not last occurrence

I would like to extract the following patterns:

  1. starts with one or more letters (Subgroup 1); and then

  2. followed by numbers of any length (Subgroup 2);

  3. followed by letters of any length (Subgroup 3);

  4. repeating 2 & 3 any number of times.

I am using https://regexr.com/ to test.

Here are some sample strings and my expected output.

String: FAF46ABC7787AAAA  =>   Desired output: FAF46ABC7787

String: FAF46ABC7787      =>   Desired output: FAF46ABC

String: FAF46ABC          =>   Desired output: FAF46

String: FAF46             =>   Desired output: FAF

String: FAF               =>   Desired output: FAF

String: FAF46 GG(Not CC)  =>   Desired output: FAF

String: FAF46.doc         =>   Desired output: FAF
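
To pin down the rule behind these samples, here is a minimal plain-Python sketch (not a single-regex answer). It assumes the rule is: take the leading token that starts with a letter, split it into alternating runs of letters and digits, and drop the last run unless it is the only one. The function name `drop_last_run` is hypothetical.

```python
import re

def drop_last_run(text):
    """Return the leading letter/digit token of text without its final run."""
    # The token must start with a letter and may continue with letters/digits.
    head = re.match(r"[a-zA-Z][0-9a-zA-Z]*", text)
    if head is None:
        return ""
    # Split the token into alternating runs of letters and digits.
    runs = re.findall(r"[a-zA-Z]+|[0-9]+", head.group(0))
    # A lone run is kept as it is; otherwise the last run is dropped.
    return runs[0] if len(runs) == 1 else "".join(runs[:-1])

# Sample strings and desired outputs, copied from the list above.
samples = {
    "FAF46ABC7787AAAA": "FAF46ABC7787",
    "FAF46ABC7787": "FAF46ABC",
    "FAF46ABC": "FAF46",
    "FAF46": "FAF",
    "FAF": "FAF",
    "FAF46 GG(Not CC)": "FAF",
    "FAF46.doc": "FAF",
}

for text, expected in samples.items():
    print(f"{text!r:20} -> {drop_last_run(text)!r} (expected {expected!r})")
```

A reference function like this can be useful for checking candidate patterns against more inputs than the seven listed here.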

I tested the following, but none of them work:

  1. Lookahead method suggested by "Python regex matching all but last occurrence"

1a. ^([a-zA-Z]+)([0-9]*[a-zA-Z]*)(?=[0-9]+|[a-zA-Z]+)

1b. ^([a-zA-Z]+)(([0-9])*([a-zA-Z])*)(?=[0-9]+|[a-zA-Z]+)

  2. Capture all subgroups and exclude the last occurrence by loop

2a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*

  3. Using replace method

3a. (^(?:[a-zA-Z]+[0-9]*)(?:[a-zA-Z]+[0-9]*)*)([a-zA-Z]+|[0-9]+) and replace by $1

  4. Exclude ending occurrence by using non-capturing group

4a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9]+|[a-zA-Z]+)$

4b. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:([0-9]+|[a-zA-Z]+))$

4c. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9a-zA-Z]+)$

4d. ^([a-zA-Z]+)(([0-9]*?)([a-zA-Z]*?))*(?:[0-9a-zA-Z]+)$
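
One way to see where an attempt goes wrong is to run it against every sample and list the mismatches. The sketch below does that for attempt 1a using Python's `re` module, which is an assumption since the question was tested on regexr.com. The helper name `failures` is hypothetical. The lookahead in 1a only requires that some letter or digit follows, so the engine can satisfy it by giving back a single character rather than a whole run.

```python
import re

# Sample inputs and desired outputs, copied from the question.
SAMPLES = [
    ("FAF46ABC7787AAAA", "FAF46ABC7787"),
    ("FAF46ABC7787", "FAF46ABC"),
    ("FAF46ABC", "FAF46"),
    ("FAF46", "FAF"),
    ("FAF", "FAF"),
    ("FAF46 GG(Not CC)", "FAF"),
    ("FAF46.doc", "FAF"),
]

# Attempt 1a from the list above.
ATTEMPT_1A = r"^([a-zA-Z]+)([0-9]*[a-zA-Z]*)(?=[0-9]+|[a-zA-Z]+)"

def failures(pattern):
    """Return (input, got, expected) for every sample the pattern gets wrong."""
    bad = []
    for text, expected in SAMPLES:
        m = re.search(pattern, text)
        got = m.group(0) if m else None
        if got != expected:
            bad.append((text, got, expected))
    return bad

for text, got, expected in failures(ATTEMPT_1A):
    print(f"{text!r}: got {got!r}, expected {expected!r}")
```

The same `failures` helper can be pointed at any of the other attempts by passing a different pattern string.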

I also tried switching between greedy and lazy quantifiers to see if a miracle would happen, but no luck.

I thought this would be an easy task, but it is obviously harder than I expected.

I would appreciate any kind of help.

Please note that I do not have extended regex available, in case the solution would require it.
Thank you.

>Solution :

You can search using this regex:

^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*

and replace with $1

RegEx Details:

  • ^: Start
  • (: Start capture group #1
    • [a-zA-Z]+: Match 1+ letters
    • [0-9a-zA-Z]*?: Match 0 or more letters or digits (non-greedy)
  • ): End 1st capture group
  • (?:: Start non-capture group
    • [0-9]+: Match 1+ digits
    • |: OR
    • [A-Z]*: Match 0 or more uppercase letters
  • ): End non-capture group
  • \b: Word boundary
  • .*: Match anything remaining
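
As a quick check, the search-and-replace above can be sketched in Python. This assumes Python's `re` module, where the replacement is written `\1` rather than `$1`. The function name `strip_last_run` is hypothetical.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*")

def strip_last_run(text):
    """Replace the whole match with capture group 1."""
    # Python's re.sub uses \1 (or \g<1>) where the answer writes $1.
    return PATTERN.sub(r"\1", text)

# Sample strings and desired outputs, copied from the question.
samples = {
    "FAF46ABC7787AAAA": "FAF46ABC7787",
    "FAF46ABC7787": "FAF46ABC",
    "FAF46ABC": "FAF46",
    "FAF46": "FAF",
    "FAF": "FAF",
    "FAF46 GG(Not CC)": "FAF",
    "FAF46.doc": "FAF",
}

for text, expected in samples.items():
    result = strip_last_run(text)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{text!r:20} -> {result!r} [{status}]")
```

Note that the non-capturing group uses `[A-Z]*`, so a trailing run of lowercase letters is not dropped (for example, `faf46abc` comes back unchanged). If lowercase input matters, widening that class to `[a-zA-Z]*` looks like one way to cover it.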