I have a string with Unicode characters inside it, and I am trying to extract each one from the overall string and save them to a list/array.
This is the overall string:
"test &#128311; test &#128153; test &#128313;"
I want the following list:
1. 🔷 2. 💙 3. 🔹
Right now I am trying the following:
string[] emojiSeparators = new string[] { "&#", ";" };
string[] resultEmojis;
resultEmojis = noHtmlEmoji.Split(
emojiSeparators, StringSplitOptions.RemoveEmptyEntries);
But the words "test" are also being added to the list.
I only want the Unicode characters saved to my list, so that I can iterate over them and process each one.
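For reference, here is a minimal sketch of what the Split call above returns. It assumes the input holds numeric HTML entities of the form &#NNN; (the class and method names are made up for illustration). Split only cuts the string at the separators, so the text between the entities is kept as well:

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: same separators and options as in the question.
    public static string[] SplitOnEntityDelimiters(string input)
    {
        string[] emojiSeparators = { "&#", ";" };
        return input.Split(emojiSeparators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Assumed input: the emojis written as decimal character references.
        string noHtmlEmoji = "test &#128311; test &#128153; test &#128313;";

        // Brackets make the leading/trailing spaces of each piece visible.
        foreach (string part in SplitOnEntityDelimiters(noHtmlEmoji))
        {
            Console.WriteLine($"[{part}]");
        }
        // Prints:
        // [test ]
        // [128311]
        // [ test ]
        // [128153]
        // [ test ]
        // [128313]
    }
}
```

Every second entry is plain text, which is why "test" shows up in the result.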
>Solution :
I suggest matching with the help of a regular expression:
using System.Linq;
using System.Text.RegularExpressions;
...
string[] resultEmojis = Regex
    .Matches(noHtmlEmoji, @"&#[1-9][0-9]{5}(?=;)")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();
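Pattern &#[1-9][0-9]{5}(?=;) explained:
&# - the literal characters &#
[1-9] - one digit in the 1..9 range (no leading zero)
[0-9]{5} - exactly 5 digits in the 0..9 range
(?=;) - lookahead: a ; character must follow, but it is not included in the match
Note that the pattern matches only six-digit codes, and each match keeps the &# prefix (for example &#128311).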
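If the goal is to iterate over the characters themselves rather than the raw &#... text, one possible variation is to capture only the digits and convert them with char.ConvertFromUtf32. This is a sketch under assumptions, not part of the original answer: the class and method names are hypothetical, the input is assumed to contain decimal references, and the pattern is relaxed to 1-7 digits so shorter codes also match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmojiEntityDemo
{
    // Hypothetical helper: returns the decimal code point of every
    // numeric reference of the form &#NNN; found in the input.
    public static int[] ExtractCodePoints(string input)
    {
        return Regex
            .Matches(input, @"&#([0-9]{1,7});")   // group 1 holds only the digits
            .Cast<Match>()
            .Select(match => int.Parse(match.Groups[1].Value))
            .ToArray();
    }

    public static void Main()
    {
        // Assumed input: the emojis written as decimal character references.
        string noHtmlEmoji = "test &#128311; test &#128153; test &#128313;";

        foreach (int codePoint in ExtractCodePoints(noHtmlEmoji))
        {
            // ConvertFromUtf32 throws ArgumentOutOfRangeException for values
            // that are not valid Unicode code points, so validate untrusted input.
            string emoji = char.ConvertFromUtf32(codePoint);
            Console.WriteLine($"{codePoint} = U+{codePoint:X} = {emoji}");
        }
    }
}
```

For the sample string, ExtractCodePoints returns 128311, 128153 and 128313, which correspond to U+1F537, U+1F499 and U+1F539.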
