Faster method to remove non-letter characters from string

I want to remove all characters from a string except Unicode letters.

I am considering this code:

public static string OnlyLetters(string text)
{
    return new string (text.Where(c => Char.IsLetter(c)).ToArray());
}

But maybe a Regex would be faster?

public static string OnlyLetters(string text)
{
    // verbatim string (@"..."): \p is not a valid escape in a regular C# string literal
    Regex rgx = new Regex(@"[^\p{L}]");
    return rgx.Replace(text, "");
}

Could you check these two snippets and suggest which one I should choose?

>Solution :

If you want to know which horse is faster, race them.

Hand-written loops are often fast, so let’s add this approach to the race:

    private static string ManualReplace(string value) {
      // let's allocate memory only once - value.Length characters
      StringBuilder sb = new StringBuilder(value.Length);

      foreach (char c in value)
        if (char.IsLetter(c))
          sb.Append(c);

      return sb.ToString();
    }
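For a quick sanity check, the method can be dropped into a small console program. This is only a sketch: the class name `Demo` and the sample string are assumptions made for illustration, not part of the original question.

```csharp
using System;
using System.Text;

public static class Demo
{
    // Same loop as above: keep only the characters that char.IsLetter accepts.
    public static string ManualReplace(string value)
    {
        // Pre-size the builder to value.Length so it never has to grow.
        StringBuilder sb = new StringBuilder(value.Length);

        foreach (char c in value)
            if (char.IsLetter(c))
                sb.Append(c);

        return sb.ToString();
    }

    public static void Main()
    {
        // Hypothetical sample input: spaces, digits and punctuation are dropped,
        // while non-ASCII letters such as 'ö' are kept.
        string cleaned = ManualReplace("Hello, Wörld! 123");
        Console.WriteLine(cleaned); // HelloWörld
    }
}
```

Note that `char.IsLetter(char)` works on single UTF-16 code units, so letters outside the Basic Multilingual Plane (stored as surrogate pairs) are removed by this loop.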

Races:

      // fixed seed (123) so that the generated text is the same on every run
      Random random = new Random(123);

      // let's compile the regex
      Regex rgx = new Regex(@"[^\p{L}]", RegexOptions.Compiled);
      string result = null; // <- keeps the compiler happy

      string text = string.Concat(Enumerable
        .Range(1, 10_000_000)
        .Select(_ => (char)random.Next(32, 128)));

      Stopwatch sw = new Stopwatch();

      // warm-up: let .NET JIT-compile the IL, fill caches, allocate memory, etc.;
      // only the last iteration is timed
      int warming = 5;

      for (int i = 0; i < warming; ++i) {
        if (i == warming - 1)
          sw.Start(); 

        // result = new string(text.Where(c => char.IsLetter(c)).ToArray());

        result = rgx.Replace(text, "");

        // result = string.Concat(text.Where(c => char.IsLetter(c)));

        // result = ManualReplace(text);

        if (i == warming - 1)
          sw.Stop();
      }

      Console.WriteLine($"{sw.ElapsedMilliseconds}");

Run this several times, uncommenting one candidate at a time, and you’ll get the results. Mine (.NET 6, Release build) are:

new string    : 120 ms
rgx.Replace   : 350 ms
string.Concat : 150 ms
Manual        :  80 ms

So we have a winner: the manual replace. Among the others, new string(text.Where(c => char.IsLetter(c)).ToArray()) is the fastest, string.Concat is slightly slower, and Regex.Replace is the slowest, more than four times slower than the manual loop in this run (350 ms vs. 80 ms).
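Timings are only comparable if every candidate returns the same string, so it can be worth checking that before racing them. The following is a minimal sketch under stated assumptions: the `Candidates` class, the `All` dictionary and the sample input are hypothetical names and data chosen for illustration. Also note that the benchmark text above contains only characters 32..127, so the ranking on other kinds of text may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class Candidates
{
    // Compiled once and reused, as in the benchmark.
    private static readonly Regex NonLetters =
        new Regex(@"[^\p{L}]", RegexOptions.Compiled);

    public static string ManualReplace(string value)
    {
        StringBuilder sb = new StringBuilder(value.Length);

        foreach (char c in value)
            if (char.IsLetter(c))
                sb.Append(c);

        return sb.ToString();
    }

    // The four approaches from the benchmark, keyed by the labels used in the results.
    public static readonly Dictionary<string, Func<string, string>> All =
        new Dictionary<string, Func<string, string>>
        {
            ["new string"] = s => new string(s.Where(c => char.IsLetter(c)).ToArray()),
            ["rgx.Replace"] = s => NonLetters.Replace(s, ""),
            ["string.Concat"] = s => string.Concat(s.Where(c => char.IsLetter(c))),
            ["Manual"] = ManualReplace,
        };

    public static void Main()
    {
        // Hypothetical mixed input: ASCII, digits, punctuation, accented and non-Latin letters.
        string sample = "abc 123 - Ünïcödé, Привет! 你好?";
        string expected = ManualReplace(sample);

        foreach (KeyValuePair<string, Func<string, string>> pair in All)
        {
            // Compare each candidate against the manual loop.
            string verdict = pair.Value(sample) == expected ? "same result" : "DIFFERENT";
            Console.WriteLine(pair.Key + ": " + verdict);
        }
    }
}
```

Both `char.IsLetter` and `\p{L}` are based on the Unicode letter categories, which is why the candidates are expected to agree; the check simply makes that assumption visible before any timing is trusted.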
