I am trying to create a regex that finds ticker symbols in bodies of text. However, I am struggling to get one that does everything I need.
Example:
This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.
So in this case I would like to match the following from the above:
- test
- LIKE
- YOU
- FIND
- STOck
So, as you can see, I am trying to match everything after a "$" sign (not including the $), and if it follows the $ then I don’t care about case. I also want to match anything that is in ALL CAPS and between 3 and 6 characters long. Finally, I want to leave some room for mistakes such as $STock, where (in this case) only the first two letters after the $ sign are capitals, but I would still like to match the whole thing up to the next space.
I originally had \b[A-Z]{3,6}\b but that matches pretty much every word.
I tried to mix the above with something like: \$[^3-6\s]\S* but that includes the $ and also ignores any ALL CAPS without a dollar sign.
>Solution :
Would you please try the following:
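import re
s = 'This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.'
# (?<=\$)\w+ : word characters preceded by "$" (the "$" itself is not included)
# [A-Z]{3,6} : a run of 3 to 6 capital letters
print(re.findall(r'(?<=\$)\w+|[A-Z]{3,6}', s))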
Output:
['test', 'LIKE', 'YOU', 'FIND', 'STOck']
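One caveat: [A-Z]{3,6} on its own can also match inside longer words. For example, it would pick SIMULA and TED out of SIMULATED. If that matters, a possible variant adds word boundaries around the second alternative, assuming that tickers written without a $ must be standalone all-caps words. The names TICKER_RE and find_tickers and the second sample string are made up for illustration:

```python
import re

# Hypothetical helper, not part of the original answer.
# Alternative 1: (?<=\$)\w+      word characters right after a "$" (the "$" is not included)
# Alternative 2: \b[A-Z]{3,6}\b  a standalone all-caps word of 3 to 6 letters
TICKER_RE = re.compile(r'(?<=\$)\w+|\b[A-Z]{3,6}\b')

def find_tickers(text):
    """Return all ticker-like tokens found in text, in order of appearance."""
    return TICKER_RE.findall(text)

s = 'This is a $test to show what I would LIKE to match. If $YOU look below you will FIND the list of simulated tickers ($STOck symbols) I would like to match.'
print(find_tickers(s))
# ['test', 'LIKE', 'YOU', 'FIND', 'STOck']

# A 9-letter all-caps word is skipped entirely instead of being split up.
print(find_tickers('SIMULATED trades in $aapl and MSFT'))
# ['aapl', 'MSFT']
```

Keep in mind that \w also matches digits and underscores, so something like $100 would be returned as 100. If that is a problem, [A-Za-z]+ could be used after the lookbehind instead.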