Java regex libraries generating incorrect results?

I have a sample program that uses two different Java regex libraries, Generex and Xeger.

The program generates strings from a regex pattern. The pattern I use is:

^([0-9]{5,6}-)?[^-]+$

The following is my sample program.

import com.mifmif.common.regex.Generex;

import nl.flotsam.xeger.Xeger;

public class PatternGenerator {

    public static void main(String[] args) {

        Xeger x = new Xeger("^([0-9]{5,6}-)?[^-]+$");

        for (int i = 0; i < 3; i++) {
            System.out.println("Xeger: " + x.generate());
        }

        Generex g = new Generex("^([0-9]{5,6}-)?[^-]+$");

        for (int i = 0; i < 3; i++) {
            System.out.println("Generex:" + g.random());
        }
    }
}

I receive the following output.

Xeger: ^"믟ꍥ잲$'涢$$
Xeger: ^츣$()'氷,%*$,䷝(궞ᴸ+$娐⮁$$+")%予&,$
Xeger: ^4#+妡,䯒+醁꣡(킒)($
Generex:^㬹)$댮$+$(((ⷠ(玖㐳+它$$$
Generex:^蝙$
Generex:^3/ⸯ꫰$$$(&$

Unfortunately, the output is not readable. If I give the same regex to an online generator, I get different output.
For example, https://www.browserling.com/tools/text-from-regex gives the following output:

LUPK*WqG)e8Od_LYtKq;Wp:N+&sy>]sGSt[&sj>r|6HQBr)|W<IDy'CeY
96817-ie;Y~Mb@673#Y2e:vlGXDz5\AjyLE4hdqpu;^sqY7ziyYCF,,A5]}n;@4.\4\~`
590766-yAVPh1,fe&>uc*WA2s,
T1'K.skX~[e#$dK'SubJ
06278->THw_YTnH`n"?Jf1n}"v<<xy1SCeQ/WF%G(tZ(VD_J,t1YrQ,TZ@{k

In my Maven pom.xml I use the following Generex and Xeger dependencies:

    <dependency>
      <groupId>com.github.mifmif</groupId>
      <artifactId>generex</artifactId>
      <version>1.0.2</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.github.krraghavan/xeger -->
    <dependency>
      <groupId>com.github.krraghavan</groupId>
      <artifactId>xeger</artifactId>
      <version>1.0.0-RELEASE</version>
    </dependency>

I would appreciate help understanding why the output of my program is unreadable.

Solution:

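If we take your expression ^([0-9]{5,6}-)?[^-]+$ apart:

^ – beginning of line. Every string in your output starts with a literal ^, which suggests both libraries emit the anchor as an ordinary character instead of interpreting it.
([0-9]{5,6}-)? – an optional block of 5 or 6 digits followed by a hyphen. The generators understand this part; because it is optional, not every generated string will contain it.
[^-]+ – one or more of any character except a hyphen. This permits tens of thousands of Unicode characters, of which only a few hundred are readable to you, so the proportion of readable characters in a random sample is small. If you look closely, your output does contain some readable characters. The online tool's samples contain only printable ASCII, so it appears to draw from a much smaller alphabet.
$ – end of line. As with ^, each string in your output ends with a literal $.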
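To put a rough number on how small the readable share of [^-] is, here is a minimal sketch that uses only java.util.regex (not Generex or Xeger). It counts how many single characters in the Basic Multilingual Plane the class accepts and how many of those are printable ASCII. The class and method names are illustrative, and treating printable ASCII as "readable" is an assumption; the libraries may also not pick characters uniformly.

```java
import java.util.regex.Pattern;

// Counts how many single characters a character class accepts, to show how
// small the readable share of [^-] is. Names here are illustrative only.
final class ReadableShare {

    // Number of non-surrogate BMP characters in [from, to] accepted by the pattern.
    static int countMatching(Pattern p, int from, int to) {
        int count = 0;
        for (int c = from; c <= to; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // a lone surrogate is not a complete character
            }
            if (p.matcher(String.valueOf((char) c)).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern notHyphen = Pattern.compile("[^-]");
        int all = countMatching(notHyphen, 0x0000, 0xFFFF);
        int ascii = countMatching(notHyphen, 0x20, 0x7E); // printable ASCII range
        System.out.printf("accepted: %d, printable ASCII: %d (%.2f%%)%n",
                all, ascii, 100.0 * ascii / all);
    }
}
```

The printable ASCII share comes out well under one percent, which is consistent with the mostly unreadable samples in the question.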

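You might wish to replace [^-]+ with an explicit class of the characters you actually want, for example letters, digits, underscore, and space:

^([0-9]{5,6}-)?[A-Za-z0-9_ ]+$

Shorthand classes such as \w and \s may not be supported by every generator library, so an explicit class is the safer choice. If the literal ^ and $ in the generated strings are unwanted, drop the anchors from the pattern as well.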
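If changing the pattern is not an option, a hand-rolled generator is another way to get readable strings. The sketch below does not use Generex or Xeger; it builds strings directly from a readable alphabet and uses the original pattern from the question only to check the result. The class name, the ALPHABET constant, and the maxTailLength parameter are hypothetical choices for illustration.

```java
import java.util.Random;
import java.util.regex.Pattern;

// Hand-rolled sketch: generates readable strings that still satisfy the
// pattern from the question. Not part of Generex or Xeger.
final class ReadableGenerator {

    // The pattern from the question, used here only to validate the output.
    static final Pattern ORIGINAL = Pattern.compile("^([0-9]{5,6}-)?[^-]+$");

    // Hypothetical readable alphabet; adjust as needed, but keep '-' out of it.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_ ";

    // maxTailLength must be at least 1.
    static String generate(Random rnd, int maxTailLength) {
        StringBuilder sb = new StringBuilder();
        if (rnd.nextBoolean()) {                   // optional ([0-9]{5,6}-) prefix
            int digits = 5 + rnd.nextInt(2);       // 5 or 6 digits
            for (int i = 0; i < digits; i++) {
                sb.append((char) ('0' + rnd.nextInt(10)));
            }
            sb.append('-');
        }
        int tail = 1 + rnd.nextInt(maxTailLength); // [^-]+ needs at least one character
        for (int i = 0; i < tail; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 3; i++) {
            String s = generate(rnd, 20);
            System.out.println(s + "  matches original: " + ORIGINAL.matcher(s).matches());
        }
    }
}
```

Because the alphabet never contains a hyphen, every generated string satisfies the original pattern while staying within readable characters.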