I want to strip HTML from a string with a regular expression. This regex works everywhere else, but it does not work in .NET, and I don't understand why.
using System;

public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "");
        Console.WriteLine(res);
    }
}
>Solution :
You’re missing the correct Regex option:
var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline);
You need this because your HTML contains a newline (\n). By default, . matches every character except \n, so <.*?> cannot match a tag that spans two lines. RegexOptions.Singleline makes . match newline characters as well.
Docs blurb:
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). For more information, see the "Single-line Mode" section in the Regular Expression Options article.
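For comparison, here is a minimal sketch of three ways to get the same result on the string from the question: passing RegexOptions.Singleline, turning the option on inside the pattern with the inline (?s) flag, and replacing the dot with the negated character class [^>], which matches newlines regardless of options. The class and method names (TagStripDemo, WithSingleline, WithInlineFlag, WithNegatedClass) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class TagStripDemo
{
    // Option passed as an argument: '.' now also matches '\n'.
    public static string WithSingleline(string input) =>
        Regex.Replace(input, "<.*?>", "", RegexOptions.Singleline);

    // Same option enabled inside the pattern via the inline (?s) flag.
    public static string WithInlineFlag(string input) =>
        Regex.Replace(input, "(?s)<.*?>", "");

    // No option needed: [^>] matches any character except '>',
    // and that includes newlines.
    public static string WithNegatedClass(string input) =>
        Regex.Replace(input, "<[^>]*>", "");

    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";

        // Without Singleline the tag spans a newline, so nothing is removed.
        Console.WriteLine(Regex.Replace(text, "<.*?>", "") == text); // True

        // Each variant removes the tag and leaves "FOO  BAR" (two spaces).
        Console.WriteLine(WithSingleline(text));
        Console.WriteLine(WithInlineFlag(text));
        Console.WriteLine(WithNegatedClass(text));
    }
}
```

The negated-class variant avoids depending on any regex option, which may be convenient if the same pattern is shared with other regex engines.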