Split text into a list of words containing only letters


I’m trying to split a text into a list of words, keeping only the words that consist entirely of letters.
I tried the pattern [^a-zA-Z]+ as shown below:

var regex = new Regex(@"[^a-zA-Z]+", RegexOptions.Singleline | RegexOptions.Compiled);
var words = regex.Split(text).Where(w => !string.IsNullOrEmpty(w));

When the input is This is a t3st, it returns ["This", "is", "a", "t", "st"], but I’m looking for ["This", "is", "a"]: a word such as t3st that contains a non-letter should be dropped entirely, not split into pieces.

I implemented it in this way:

 var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
                 .Where(str => str.All(char.IsLetter))
                 .ToList();

However, I’m looking for a regex solution.

>Solution :

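I don’t know C# in particular, but this should work. Use it to match against the string rather than to split it, and enable free-spacing (extended) mode so that the whitespace and comments in the pattern are ignored: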

(?<=^| )          # Preceded by the start of the string or a space
  (?:             # Group that consumes one character:
    (?=[a-zA-Z])  #   assert that the next character is a letter...
    .             #   ...then match that character
  )+              # 1 or more times
(?= |$)           # Followed by a space or the end of the string
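
For reference, here is a minimal C# sketch of how a pattern like this might be applied. It is an illustration under stated assumptions, not a tested drop-in: the class name LetterWords and the method name Extract are hypothetical, and the pattern is collapsed to a single line ([a-zA-Z]+ in place of the lookahead-plus-dot group) so that no free-spacing option is needed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are not from the original post.
public static class LetterWords
{
    // A run of ASCII letters that is bounded on both sides by a space
    // or by the start/end of the string.
    private static readonly Regex WordPattern =
        new Regex(@"(?<=^| )[a-zA-Z]+(?= |$)", RegexOptions.Compiled);

    // Collects the matches instead of splitting, so a token such as
    // "t3st" is skipped as a whole rather than broken into "t" and "st".
    public static List<string> Extract(string text)
    {
        return WordPattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList();
    }
}
```

Calling LetterWords.Extract("This is a t3st") should return This, is, a. Note that [a-zA-Z] covers ASCII letters only, whereas char.IsLetter in the LINQ version also accepts other Unicode letters; replacing the character class with \p{L} would bring the two closer.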
