How to decompose Twitter hashtags into words?

I’m trying to decompose Twitter hashtags in order to extract the words that compose them. I’m having trouble finding a regular expression that does this satisfactorily, mainly because of the authors’ "excessive creativity" in capitalization.

Some examples:

#itsAHashtag -> ['its', 'a', 'hashtag']
#GlazersOutNOW -> ['glazers', 'out', 'now']
#COVIDIsNotOver -> ['covid', 'is', 'not', 'over']

Is there any library that does this kind of decomposition?

>Solution :

Based on the samples you provided, this regex should work for you:

(?:[A-Z]+|[a-zA-Z][a-z]+?)(?=[A-Z]|$)

Check this demo

And let me know if this works. I will add explanation if it works well.
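
For reference, here is a minimal sketch of how the regex above could be applied in Python. The helper name `split_hashtag` is made up for illustration, and lowercasing the results is an assumption based on the expected output in the question. The pattern reads as: match either a run of capitals (`[A-Z]+`) or one letter followed by lowercase letters (`[a-zA-Z][a-z]+?`), but only where the next character is a capital or the end of the string.

```python
import re

# The pattern from the answer, compiled once for reuse.
HASHTAG_WORD = re.compile(r"(?:[A-Z]+|[a-zA-Z][a-z]+?)(?=[A-Z]|$)")


def split_hashtag(tag):
    """Split a camel-cased hashtag into lowercase words (hypothetical helper)."""
    # Drop the leading '#' if present, then collect every match in order.
    words = HASHTAG_WORD.findall(tag.lstrip("#"))
    # Lowercase to match the output format requested in the question.
    return [word.lower() for word in words]


for tag in ["#itsAHashtag", "#GlazersOutNOW", "#COVIDIsNotOver"]:
    print(tag, "->", split_hashtag(tag))
```

This reproduces the three samples from the question. It relies entirely on capitalization, so an all-lowercase tag comes back as a single word, and tags containing digits are not handled by this pattern and would likely need a different approach.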
