I would like to extract the following pattern:
- starts with letters (Subgroup 1);
- followed by digits of any length (Subgroup 2);
- followed by letters of any length (Subgroup 3);
- with Subgroups 2 and 3 repeating any number of times.
I am using https://regexr.com/ to test.
Here are some sample strings and my expected output:
String: FAF46ABC7787AAAA => Desired output: FAF46ABC7787
String: FAF46ABC7787 => Desired output: FAF46ABC
String: FAF46ABC => Desired output: FAF46
String: FAF46 => Desired output: FAF
String: FAF => Desired output: FAF
String: FAF46 GG(Not CC) => Desired output: FAF
String: FAF46.doc => Desired output: FAF
I tried the following, but none of them work:
- Lookahead method suggested by "Python regex matching all but last occurrence"
1a. ^([a-zA-Z]+)([0-9]*[a-zA-Z]*)(?=[0-9]+|[a-zA-Z]+)
1b. ^([a-zA-Z]+)(([0-9])*([a-zA-Z])*)(?=[0-9]+|[a-zA-Z]+)
- Capture all subgroups and exclude last occurrence by loop
2a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*
- Using replace method
3a. (^(?:[a-zA-Z]+[0-9]*)(?:[a-zA-Z]+[0-9]*)*)([a-zA-Z]+|[0-9]+)
and replace with $1
- Exclude ending occurrence by using non-capturing group
4a. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9]+|[a-zA-Z]+)$
4b. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:([0-9]+|[a-zA-Z]+))$
4c. ^([a-zA-Z]+)(([0-9]*)([a-zA-Z]*))*(?:[0-9a-zA-Z]+)$
4d. ^([a-zA-Z]+)(([0-9]*?)([a-zA-Z]*?))*(?:[0-9a-zA-Z]+)$
I also tried switching between greedy and lazy quantifiers to see if that would help, but no luck.
I thought this would be an easy task, but it is clearly harder than I expected.
I would appreciate any kind of help.
Please note that I do not have extended regex available, in case the solution depends on it.
Thank you.
>Solution :
You can search using this regex:
^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*
and replace with $1
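As a quick check outside regexr, the search-and-replace above can be sketched in Python. This is only an illustration under a few assumptions: the helper name `strip_last_group` is made up for this example, and Python's `re.sub` writes the backreference as `\1` where regexr uses `$1`. The sample strings are the ones from the question.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"^([a-zA-Z]+[0-9a-zA-Z]*?)(?:[0-9]+|[A-Z]*)\b.*")


def strip_last_group(text: str) -> str:
    """Hypothetical helper: keep capture group 1, drop the rest of the match."""
    # Python spells the replacement \1 (or \g<1>) rather than $1.
    return PATTERN.sub(r"\1", text)


# Sample strings and desired outputs taken from the question.
samples = {
    "FAF46ABC7787AAAA": "FAF46ABC7787",
    "FAF46ABC7787": "FAF46ABC",
    "FAF46ABC": "FAF46",
    "FAF46": "FAF",
    "FAF": "FAF",
    "FAF46 GG(Not CC)": "FAF",
    "FAF46.doc": "FAF",
}

for text, expected in samples.items():
    result = strip_last_group(text)
    print(f"{text!r} -> {result!r} (expected {expected!r})")
```

The lazy `*?` inside group 1 is what makes this work: the engine grows group 1 one character at a time and stops at the first point where a single run of digits or uppercase letters reaches a word boundary.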
RegEx Details:
- ^ : Start
- ( : Start capture group #1
- [a-zA-Z]+ : Match 1+ letters
- [0-9a-zA-Z]*? : Match 0 or more letters or digits (non-greedy)
- ) : End 1st capture group
- (?: : Start non-capture group
- [0-9]+ : Match 1+ digits
- | : OR
- [A-Z]* : Match 0 or more uppercase letters
- ) : End non-capture group
- \b : Word boundary
- .* : Match anything remaining
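The same breakdown can be kept next to the pattern itself by using Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments outside character classes. This is a sketch for illustration only; the name `VERBOSE_PATTERN` is an assumption, and the second call shows a consequence of `[A-Z]*` being uppercase-only that may or may not matter for your data.

```python
import re

# Same pattern as above, laid out with one token per line.
VERBOSE_PATTERN = re.compile(
    r"""
    ^                   # start
    (                   # start capture group 1
      [a-zA-Z]+         #   1+ letters
      [0-9a-zA-Z]*?     #   0 or more letters or digits (non-greedy)
    )                   # end capture group 1
    (?:                 # start non-capture group (the part that is dropped)
      [0-9]+            #   1+ digits
      |                 #   OR
      [A-Z]*            #   0 or more uppercase letters
    )                   # end non-capture group
    \b                  # word boundary
    .*                  # anything remaining
    """,
    re.VERBOSE,
)

print(VERBOSE_PATTERN.sub(r"\1", "FAF46ABC7787"))  # FAF46ABC

# The trailing alternative only accepts uppercase letters, so a final
# lowercase run is absorbed into group 1 instead of being dropped.
print(VERBOSE_PATTERN.sub(r"\1", "FAF46abc"))  # FAF46abc
```

If trailing lowercase letters should also be removed, changing `[A-Z]*` to `[a-zA-Z]*` looks like the natural adjustment, though that variant is not part of the original answer.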