Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Substr instr help needed in sql oracle

i am trying to fetch some text from HTML string from a table. the text are like this and i am trying to get the output

ID,TEXT
1,<font face="Microsoft Sans Serif" size="8" color="#000000">ABC 123</font>
2,<font face="Microsoft Sans Serif" size="8" color="#000000">1 ETG</font>
3,<font face="Microsoft Sans Serif" size="8" color="#000000">Gatelys PERC</font>
4,<font face="Microsoft Sans Serif" size="8" color="#000000">Gml TK 144</font>

ID,TEXT
1,ABC 123
2,1 ETG
3,Gatelys PERC
4,Gml TK 144

I have tried this but not getting right output.

select ID,regexp_substr(TEXT, '[^>]+',1,2) from TABLE
where TEXT is not null;

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

If your HTML is always valid XHTML then you should use XML functions to parse the string:

SELECT id,
       XMLQUERY('*/text()' PASSING XMLTYPE(text) RETURNING CONTENT) AS text
FROM   table_name;

Which, for the sample data:

CREATE TABLE table_name (id, text) AS
  Select 1, '<font face="Microsoft Sans Serif" size="8" color="#000000">ABC 123</font>' FROM DUAL UNION ALL
  Select 2, '<font face="Microsoft Sans Serif" size="8" color="#000000">1 ETG</font>' FROM DUAL UNION ALL
  Select 3, '<font face="Microsoft Sans Serif" size="8" color="#000000">Gatelys PERC</font>' FROM DUAL UNION ALL
  Select 4, '<font face="Microsoft Sans Serif" size="8" color="#000000">Gml TK 144</font>' FROM DUAL UNION ALL
  Select 5, '<font attr=">">XYZ</font>' FROM DUAL;

Outputs:

ID TEXT
1 ABC 123
2 1 ETG
3 Gatelys PERC
4 Gml TK 144
5 XYZ

If you want to naively remove all tags using string functions then you can use:

SELECT id,
       REGEXP_REPLACE(text, '<.*?>') AS text
FROM   table_name;

Which, for your sample data, outputs:

ID TEXT
1 ABC 123
2 1 ETG
3 Gatelys PERC
4 Gml TK 144
5 ">XYZ

Note, it will fail if you have attributes inside a tag that contain a > character.

If you want a more comprehensive query then:

SELECT id,
       REGEXP_REPLACE(text, q'{<([^"']+?|".*?"|'.*?')+?>}') AS text
FROM   table_name;

Which outputs the same as the XML query.

fiddle

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading