Java Regex -> how to make WHOLE lookahead lazy

I am tackling a problem that probably has an easy solution, but I just can’t think of it…
I’ve got an XML file as input; for the sake of testing, I’ve put one structure below.

My goal: given an input String s that equals the value of the ‘name’ attribute of a <con:testSuite> element (let’s choose SUITE2),

I want to match the value of ‘name’ in each <con:testCase> element, but only for testCase elements inside the chosen testSuite element.

My regex(regex101 testing):

(?<=<con:testSuite[\d\D]{1,60}name=\"SUITE2\")(?:[\d\D]*?<con:testCase[\d\D]*?name=\")(.*?)(?:\")(?=[\d\D]*?</con:testSuite)

In Java:
Pattern.compile("(?<=<con:testSuite[\\d\\D]{1,60}name=\\\"SUITE2\\\")(?:[\\d\\D]*?<con:testCase[\\d\\D]*?name=\\\")(.*?)(?:\\\")(?=[\\d\\D]*?</con:testSuite)")

Now this regex returns just the value of name in the first testCase. If I remove the first lazy quantifier in the first non-capturing group, it returns just the last one. As I read it, though, without the lazy quantifier I would expect it to match the name value of every testCase.

…Anyway, I moved on. Since this is by no means production code, just my utility tool, and I can guess the maximum number of characters between checkpoints in the XML, I’ve chosen to move the non-capturing group into the lookbehind (and give it a bounded length, of course):

(?<=<con:testSuite[\d\D]{1,60}name=\"SUITE2\"[\d\D]{1,60000}<con:testCase[\d\D]{1,50000}name=\")(.*?)(?:\")(?=[\d\D]*?</con:testSuite)

Now this does the magic in terms of finding all the values, but it still has one issue: the lookahead. I’ve got a lazy [\d\D]*? there, yet it ignores the first occurrence of </con:testSuite and matches up to the last possible one, so it does not satisfy my condition that the values come only from the one chosen con:testSuite element… and my Xmas-mooded mind just cannot fight this 🙂

Sorry for the long post, any help is appreciated <3

-for SUITE2 chosen, desired matches[]=["55555","66666","77777","88888"]

-testing xml structure:

<con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE1" disabled="true">
  <con:testCase seOnErrors="true" name="44444" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="33333" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="22222" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="11111" searchProp>sdfsddsfsdsd
  </con:testCase>
</con:testSuite>
<con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE2" disabled="true">
  <con:testCase seOnErrors="true" name="55555" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="66666" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="77777" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="88888" searchProp>sdfsddsfsdsd
  </con:testCase>
</con:testSuite>
<con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE3" disabled="true">
  <con:testCase seOnErrors="true" name="99999" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="0000" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="11221122" searchProp>sdfsddsfsdsd
  </con:testCase>
  <con:testCase seOnErrors="true" name="33443344" searchProp>sdfsddsfsdsd
  </con:testCase>
</con:testSuite>

Solution:

Your regex attempt is certainly formidable, but this question is a poster child for when not to use regex. XPath is the right tool.

See XPath below:

//con:testSuite[@name='SUITE2']/con:testCase/@name

Try it out yourself at https://www.freeformatter.com/xpath-tester.html
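
Since the question is about Java, here is a minimal sketch of evaluating that expression with the JDK's built-in javax.xml.xpath API. The class name TestCaseNames, the helper namesInSuite, and the trimmed SAMPLE document are hypothetical, and the namespace URI http://www.example.com is a placeholder that has to match whatever the real file declares for the con prefix. The suite name is passed as an XPath variable ($suite) rather than concatenated into the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TestCaseNames {

    // Placeholder namespace URI (assumption); use the one your XML declares for "con".
    static final String CON_NS = "http://www.example.com";

    // Trimmed, hypothetical sample document used only for this sketch.
    static final String SAMPLE =
          "<con:test xmlns:con='http://www.example.com'>"
        + "<con:testSuite name='SUITE1'><con:testCase name='44444'/></con:testSuite>"
        + "<con:testSuite name='SUITE2'><con:testCase name='55555'/>"
        + "<con:testCase name='66666'/></con:testSuite>"
        + "<con:testSuite name='SUITE3'><con:testCase name='99999'/></con:testSuite>"
        + "</con:test>";

    // Returns the name attribute of every testCase directly inside the chosen testSuite.
    static List<String> namesInSuite(String xml, String suiteName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this the con: prefix cannot be resolved
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Map the "con" prefix used in the expression to the namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "con".equals(prefix) ? CON_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            // Resolve $suite to the requested suite name.
            xpath.setXPathVariableResolver(
                q -> "suite".equals(q.getLocalPart()) ? suiteName : null);

            NodeList attrs = (NodeList) xpath.evaluate(
                "//con:testSuite[@name=$suite]/con:testCase/@name",
                doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                names.add(attrs.item(i).getNodeValue());
            }
            return names;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesInSuite(SAMPLE, "SUITE2")); // prints [55555, 66666]
    }
}
```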

Just make sure to namespace the XML data properly:

<?xml version="1.0" encoding="UTF-8"?>
<con:test xmlns:con="http://www.example.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="./example1.xsd">
   <con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE1" disabled="true">
      <con:testCase seOnErrors="true" name="44444" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="33333" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="22222" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="11111" searchProp="">sdfsddsfsdsd</con:testCase>
   </con:testSuite>
   <con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE2" disabled="true">
      <con:testCase seOnErrors="true" name="55555" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="66666" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="77777" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="88888" searchProp="">sdfsddsfsdsd</con:testCase>
   </con:testSuite>
   <con:testSuite id="dd1107cb-2f4c-47bd-8af5-f64e0d20354b" name="SUITE3" disabled="true">
      <con:testCase seOnErrors="true" name="99999" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="0000" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="11221122" searchProp="">sdfsddsfsdsd</con:testCase>
      <con:testCase seOnErrors="true" name="33443344" searchProp="">sdfsddsfsdsd</con:testCase>
   </con:testSuite>
</con:test>
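
On the original regex question: a lookahead is only an assertion, so a lazy quantifier inside it cannot limit what the rest of the pattern matches. (?=[\d\D]*?</con:testSuite) succeeds for any testCase that has some closing tag after it, including those in later suites, because the wide lookbehind can reach back across suite boundaries to the SUITE2 opening tag. If the tool has to stay regex-only, one option is to match in two steps: first isolate the chosen suite with a consuming lazy match, then collect the names inside that block. The sketch below assumes double-quoted attributes, no > inside attribute values, and no nested testSuite elements; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuiteRegex {

    // Hypothetical test input mirroring the structure from the question.
    static final String SAMPLE =
          "<con:testSuite id=\"a\" name=\"SUITE1\" disabled=\"true\">\n"
        + "  <con:testCase seOnErrors=\"true\" name=\"44444\" searchProp>x\n"
        + "  </con:testCase>\n"
        + "</con:testSuite>\n"
        + "<con:testSuite id=\"b\" name=\"SUITE2\" disabled=\"true\">\n"
        + "  <con:testCase seOnErrors=\"true\" name=\"55555\" searchProp>x\n"
        + "  </con:testCase>\n"
        + "  <con:testCase seOnErrors=\"true\" name=\"66666\" searchProp>x\n"
        + "  </con:testCase>\n"
        + "</con:testSuite>\n"
        + "<con:testSuite id=\"c\" name=\"SUITE3\" disabled=\"true\">\n"
        + "  <con:testCase seOnErrors=\"true\" name=\"99999\" searchProp>x\n"
        + "  </con:testCase>\n"
        + "</con:testSuite>\n";

    // Matches the name attribute inside a testCase opening tag.
    static final Pattern TEST_CASE =
        Pattern.compile("<con:testCase\\b[^>]*\\bname=\"([^\"]*)\"");

    static List<String> namesInSuite(String xml, String suiteName) {
        List<String> names = new ArrayList<>();

        // Step 1: consume the chosen suite. Because [\s\S]*? is part of the match
        // itself (not a lookaround), being lazy makes it stop at the first closing tag.
        Pattern suite = Pattern.compile(
            "<con:testSuite\\b[^>]*\\bname=\"" + Pattern.quote(suiteName)
            + "\"[^>]*>([\\s\\S]*?)</con:testSuite>");
        Matcher sm = suite.matcher(xml);
        if (!sm.find()) {
            return names; // no suite with that name
        }

        // Step 2: collect testCase names only inside the isolated block.
        Matcher cm = TEST_CASE.matcher(sm.group(1));
        while (cm.find()) {
            names.add(cm.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesInSuite(SAMPLE, "SUITE2")); // prints [55555, 66666]
    }
}
```

This keeps each pattern simple and avoids the large bounded lookbehind, but it remains a text-level approximation; the XPath approach above is the more robust choice for real XML.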