jq: iterate over regex matches in a string

I’m reworking some JSON with jq, trying to extract some strings from a larger description and move them into an array of related controls.

Here’s my input JSON:

{"description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel). Related controls: CA-2, CA-7, CM-3, CM-5, CM-8, MA-2, IR-4, RA-5, SA-10, SA-1x, SI-1x"}

The output I want is:

{"description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel).",
"relatedControls": ["CA-2", "CA-7", "CM-3", "CM-5", "CM-8", "MA-2", "IR-4", "RA-5", "SA-10", "SA-1x", "SI-1x"]}

I’ve worked out something I think is pretty close, but it produces a separate object for each match instead of a single object with an array of controls like I wanted.

jq '. | {description: .description | sub(" Related controls:.*";""), relatedControls: .description | scan("[A-Z]{2}-\\d[0-9x]?") }'

Here’s the whole thing on one line so it’s easy to test:

echo '{"description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel). Related controls: CA-2, CA-7, CM-3, CM-5, CM-8, MA-2, IR-4, RA-5, SA-10, SA-1x, SI-1x"}' | jq '. | {description: .description | sub(" Related controls:.*";""), relatedControls: .description | scan("[A-Z]{2}-\\d[0-9x]?") }'

jq wizards… what am I missing to get the output I’m after?

Solution:

You could split the string with the / operator at " Related controls: ", then split the second part again at ", ":

.description / " Related controls: "
| {description: .[0], relatedControls: (.[1] / ", ")}
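
To try the split approach from a shell, here is a minimal sketch. The shortened input and the variable names `input` and `split_result` are made up for illustration; only the jq filter comes from the answer.

```shell
# Shortened, made-up input with the same " Related controls: " layout.
input='{"description": "Restart the system. Related controls: CA-2, SA-10, SI-1x"}'

# Dividing a string by a string splits it on that separator and yields an array.
split_result=$(printf '%s\n' "$input" | jq -c '
  .description / " Related controls: "
  | {description: .[0], relatedControls: (.[1] / ", ")}
')

printf '%s\n' "$split_result"
```

Note that this relies on every description containing the " Related controls: " marker. If it is missing, .[1] is null, so the second split would likely need a guard.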

Alternatively, here’s another approach using capture and scan with your regular expressions:

.description
| capture("(?<description>.*) Related controls: (?<relatedControls>.*)")
| .relatedControls |= [scan("[A-Z]{2}-\\d[0-9x]?")]

Output:

{
  "description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel).",
  "relatedControls": [
    "CA-2",
    "CA-7",
    "CM-3",
    "CM-5",
    "CM-8",
    "MA-2",
    "IR-4",
    "RA-5",
    "SA-10",
    "SA-1x",
    "SI-1x"
  ]
}
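
As for why the original attempt produced several objects: scan emits one result per match, and an object constructor emits one object for each result of its value expressions. Wrapping the scan in [ ] collects the matches into one array, which looks like the smallest change to the original filter. A sketch, again with a shortened made-up input and hypothetical variable names (`per_match`, `collected`):

```shell
input='{"description": "Restart the system. Related controls: CA-2, SA-10, SI-1x"}'

# Without brackets: one object per scan match. The second jq slurps
# the resulting stream and counts how many objects were emitted.
per_match=$(printf '%s\n' "$input" | jq -c '
  {description: (.description | sub(" Related controls:.*"; "")),
   relatedControls: (.description | scan("[A-Z]{2}-\\d[0-9x]?"))}
' | jq -s 'length')

# With brackets: all matches are collected into a single array,
# so only one object comes out.
collected=$(printf '%s\n' "$input" | jq -c '
  {description: (.description | sub(" Related controls:.*"; "")),
   relatedControls: [.description | scan("[A-Z]{2}-\\d[0-9x]?")]}
')

printf 'objects without brackets: %s\n' "$per_match"
printf '%s\n' "$collected"
```

The parentheses around each value are not strictly part of the fix; they are there to make the grouping explicit.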