
Regex in .NET does not seem to work correctly

I want to strip HTML from a string with a regular expression. This regex works everywhere else, but it does not work in .NET, and I don't understand why.

using System;
                    
public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "");
        Console.WriteLine(res);
    }
}

Solution:

You’re missing the correct Regex option:

var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline);

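You need this because the tag in your HTML contains a newline (\n). By default, . matches every character except \n, so <.*?> can never reach the closing > on the next line and nothing is replaced. RegexOptions.Singleline makes . match newline characters as well.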

Docs blurb:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). For more information, see the "Single-line Mode" section in the Regular Expression Options article.
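
To make the difference concrete, here is a minimal sketch that runs the question's input through the pattern with and without the option. The TagStripper class and its StripTags method are hypothetical names introduced only for this illustration. The last call shows a common alternative, the negated character class <[^>]*>, which needs no option because [^>] already matches \n.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class TagStripper
{
    // Removes anything shaped like <...>, including tags that span several lines.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<.*?>", "", RegexOptions.Singleline);
    }
}

public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";

        // Default options: '.' stops at '\n', so the two-line tag never matches
        // and the input comes back unchanged.
        Console.WriteLine(Regex.Replace(text, "<.*?>", "") == text); // True

        // Singleline: '.' also matches '\n', so the whole tag is removed.
        Console.WriteLine(TagStripper.StripTags(text)); // FOO  BAR

        // Alternative without any option: [^>] matches every character except '>',
        // newlines included.
        Console.WriteLine(Regex.Replace(text, "<[^>]*>", "")); // FOO  BAR
    }
}
```

Note that a regex of this shape only removes text that looks like a tag. It is a sketch for simple input like the string above, not a general HTML parser.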
