I’ve been reading about code injection using Unicode sequences and have been using a tool from Dotnetsafer to locate such sequences in a codebase I’ve inherited. This sequence \uD83D\uDCCC keeps coming up:
An example:
appears as: [588] __builder5.AddMarkupContent(51, "??");
actual : [588] __builder5.AddMarkupContent(51, "\uD83D\uDCCC");
What is this sequence? Why would the code be injecting it into HTML?
EDIT 1: I’ve looked up the sequence and the only thing remotely useful that I’ve found is https://unicode.scarfboy.com/?s=D83D+DCCC
>Solution :
Those are the UTF-16 code units that encode the Unicode character U+1F4CC (the pushpin emoji 📌).
How could you have found out?
1. Look up U+D83D and U+DCCC and find out that they are not actual Unicode characters, but a high and a low surrogate respectively, which UTF-16 uses as a pair to encode a single code point above U+FFFF.
2. Google for "D83D DCCC" and find this page, which explicitly lists those as the UTF-16 encoding of the pushpin emoji.
Actually, come to think of it, you could just skip step #1 😉
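If you'd rather verify this yourself than trust a lookup page, the surrogate arithmetic is short enough to do by hand. Here is a minimal sketch in Python (not C#, just because it is quick to run); `combine_surrogates` is a helper name I made up for illustration, not part of any library:

```python
import unicodedata


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"{high:#06x} is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"{low:#06x} is not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDCCC)
print(f"U+{code_point:X}")                    # U+1F4CC
print(unicodedata.name(chr(code_point)))      # PUSHPIN

# Cross-check in the other direction: encode the character as UTF-16
# (big-endian, no BOM) and look at the raw code units.
print(chr(code_point).encode("utf-16-be").hex())  # d83ddccc
```

The same check works for any other \uD8xx\uDCxx pair the scanner reports: if the pair decodes to an ordinary emoji or symbol, it is just a character outside the Basic Multilingual Plane written in escaped form.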