I’m trying to find every URL that matches the regex cwe.mitre.org.*.html and remove its .html extension, without changing any other URL.
Example input:
https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html
Expected output:
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html
Is there a way to do this in sed or another tool?
I’ve tried sed -Ei 's/cwe.mitre.org.*.html/<REPLACEMENT>/g' file.txt, but that doesn’t work. Is there a way for <REPLACEMENT> to be a regular expression? The sed manual doesn’t seem to suggest so.
EDIT: I was wrong about the sed manual. It does cover this; see section "5.7 Back-references and Subexpressions" of https://www.gnu.org/software/sed/manual/sed.html.
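The attempted command already uses -E (extended regular expressions). In that mode a group is written as plain ( ) without backslashes, and \1 in the replacement refers back to it. A minimal sketch of the same idea in ERE syntax, run on the sample input from the question (the variable names are only for illustration):

```shell
#!/bin/sh
# Sample input copied from the question.
input='https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html'

# -E: ERE syntax, so (...) is a capture group without backslashes.
# \1 re-inserts the text matched by the group, which excludes ".html".
# The dots are escaped so they only match a literal ".".
result=$(printf '%s\n' "$input" | sed -E 's/(cwe\.mitre\.org.*)\.html/\1/g')
printf '%s\n' "$result"
```

The google.com line is left alone because it never contains cwe.mitre.org, so the pattern does not match it.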
Solution:
$ sed 's/\(cwe\.mitre\.org.*\)\.html/\1/' file
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html
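One caveat, assuming a file might contain more than one URL on the same line: .* is greedy, so the group runs on to the last .html on the line, even if that belongs to a different URL. A sketch comparing the pattern above with a narrower one that uses [^[:space:]]* so the match cannot cross whitespace (the input line is hypothetical):

```shell
#!/bin/sh
# Hypothetical line with two URLs, used only to show the greedy-match pitfall.
line='see https://cwe.mitre.org/data/definitions/377.html and http://google.com/404.html'

# Pattern from the answer: .* is greedy and stretches to the LAST ".html".
greedy=$(printf '%s\n' "$line" | sed 's/\(cwe\.mitre\.org.*\)\.html/\1/')

# Narrower pattern: [^[:space:]]* cannot cross a space, so the match
# stays inside the current URL; g handles several such URLs per line.
narrow=$(printf '%s\n' "$line" | sed 's/\(cwe\.mitre\.org[^[:space:]]*\)\.html/\1/g')

printf 'greedy: %s\n' "$greedy"
printf 'narrow: %s\n' "$narrow"
```

The greedy version strips .html from the google.com URL and leaves the CWE link untouched; the narrow version does the opposite, which is what the question asks for.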
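The \(...\) group captures everything from cwe.mitre.org up to (but not including) the last .html on the line, and \1 in the replacement writes that text back, so only the extension is dropped. Search for "sed capture groups" for more detail.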
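The question edits file.txt in place with -i, while the answer prints to stdout. A sketch of the in-place variant on a temporary file, assuming a sed that accepts an attached backup suffix (-i.bak works with both GNU and BSD sed; a bare -i with no suffix is GNU-specific):

```shell
#!/bin/sh
# Temporary file standing in for file.txt from the question.
tmp=$(mktemp)
printf '%s\n' \
  'https://cwe.mitre.org/data/definitions/377.html' \
  'http://google.com/404.html' > "$tmp"

# -i.bak rewrites "$tmp" in place and keeps the original as "$tmp.bak".
sed -i.bak 's/\(cwe\.mitre\.org[^[:space:]]*\)\.html/\1/g' "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"

# Clean up the temporary file and its backup.
rm -f "$tmp" "$tmp.bak"
```

Keeping a .bak copy is a cautious default when running a regex over real files; drop the suffix (GNU sed) once the output looks right.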