Using flex to identify variable name without repeating characters

I’m not fully sure how to word my question, so sorry for the rough title.

I am trying to create a pattern that can identify variable names with the following constraints:

  • Must begin with a letter
  • First letter may be followed by any combination of letters, numbers, and hyphens
  • First letter may be followed by nothing
  • The variable name must not be entirely X’s ([xX]+ is a separate identifier in this grammar)

So for example, these would all be valid:

  • Avariable123
  • Bee-keeper
  • Y
  • E-3

But the following would not be valid:

  • XXXX
  • X
  • 3variable
  • 5

I am able to meet the first three requirements with my current identifier, but I am really struggling to change it so that it doesn’t pick up variables that are entirely the letter X.

Here is what I have so far:

[a-z][a-z0-9\-]* {return (NAME);}

Can anyone suggest a way of editing this to avoid variables that are made up of just the letter X?

Solution:

The easiest way to handle that sort of requirement is to have one pattern that matches the exceptional strings and a second pattern, placed after it in the file, that matches the general case:

[xX]+                    { /* matches all-x tokens */ }
[[:alpha:]][[:alnum:]-]* { /* handle identifiers */ }

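This works because lex (and almost all lex derivatives) always prefers the longest match and, when two patterns match the same longest token, selects the one that appears first in the file. An all-X string is matched at the same length by both patterns, so the earlier [xX]+ rule wins; anything like XXY or X-1 is matched at greater length by the identifier pattern, so it is still treated as an identifier.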
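As a rough sketch, the two rules could sit in a complete scanner along these lines. The token code XS, the numeric values of both codes, the whitespace and catch-all rules, and the small main() driver are all assumptions added for illustration; only the two patterns and the NAME token come from the question and answer.

```lex
%option noyywrap

%{
#include <stdio.h>
/* Hypothetical token codes; a real project would usually take these
   from the header generated by its parser. */
enum { XS = 258, NAME };
%}

%%
[xX]+                     { return XS;   /* listed first, so it wins ties */ }
[[:alpha:]][[:alnum:]-]*  { return NAME; /* every other identifier */ }
[ \t\n]+                  { /* skip whitespace */ }
.                         { fprintf(stderr, "unexpected character: %s\n", yytext); }
%%

/* Illustrative driver: print each token's kind and text. */
int main(void) {
    int tok;
    while ((tok = yylex()) != 0)
        printf("%-5s %s\n", tok == XS ? "XS" : "NAME", yytext);
    return 0;
}
```

This should build with something like `flex scanner.l && cc lex.yy.c -o scanner`. Given the input `XXXX Bee-keeper XXXY`, the first token is reported as XS and the other two as NAME, because for XXXY the identifier rule matches four characters while [xX]+ matches only three.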

Of course, you need to know what you want to do with the exceptional symbol. If you just want to accept it as some token type, there’s no problem; you just do that. If, on the other hand, the intention was to break it into subtokens, perhaps individual letters, then you’ll have to use yyless(), and you might want to switch to a new lexing state in order to avoid repeatedly matching the same long sequence of Xs. But maybe that doesn’t matter in your case.
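If splitting the run into individual letters is what you want, one possible shape for the yyless() approach is sketched below. The exclusive start condition XRUN and the token code X_LETTER are hypothetical names, and the whitespace rule, catch-all rule, and driver are again assumptions for illustration.

```lex
%option noyywrap
%x XRUN

%{
#include <stdio.h>
/* Hypothetical token codes for illustration only. */
enum { X_LETTER = 258, NAME };
%}

%%
[xX]+                     { yyless(0); BEGIN(XRUN); /* push the run back, rescan it letter by letter */ }
[[:alpha:]][[:alnum:]-]*  { return NAME; }
[ \t\n]+                  { /* skip whitespace */ }
.                         { fprintf(stderr, "unexpected character: %s\n", yytext); }

<XRUN>[xX]                { return X_LETTER; /* one token per letter */ }
<XRUN>[^xX]               { yyless(0); BEGIN(INITIAL); /* run is over; give the character back */ }
%%

/* Illustrative driver: print each token's kind and text. */
int main(void) {
    int tok;
    while ((tok = yylex()) != 0)
        printf("%-8s %s\n", tok == X_LETTER ? "X_LETTER" : "NAME", yytext);
    return 0;
}
```

Calling yyless(0) returns the whole match to the input, and switching to XRUN before rescanning is what prevents the same [xX]+ rule from matching it again in an endless loop. A simpler variant would be `yyless(1)` with a return in the first rule and no start condition, at the cost of rematching the remaining Xs after every letter.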

See the flex manual for more details and examples.
