I am trying to remove the backslashes (\) in my dataset. However, a simple string.replace() call also removes the backslashes that belong to Unicode escape sequences, and I don’t want that. I tried using re.sub("\\[^u]", " ", "\Not wanted backslashes\ unicode: \u2019\u2026"), but that also replaces the first character of the word that follows the backslash.
Is there any way to only replace the backslash?
Thanks in advance
>Solution :
Easy. Use a negative lookahead:
\\(?!u)
This pattern will match any backslash NOT followed by a u. But you can do even better, with a negative lookahead for a Unicode escape pattern:
\\(?!u[0-9A-Fa-f]{4})
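This pattern will match any backslash NOT followed by a u and exactly four hexadecimal digits, so a backslash in front of an ordinary word that happens to start with u (for example \users) is still matched. Because a lookahead consumes no characters, only the backslash itself gets replaced.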
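As a rough sketch of how the two patterns could be used with re.sub in Python: the function name strip_stray_backslashes and the constants SIMPLE and STRICT are made up for illustration, and the example assumes you want to delete the backslash (empty replacement) rather than substitute a space as in the question. Raw strings are used so the backslashes are literal characters, as they would be in data read from a file.

```python
import re

# Hypothetical names, for illustration only.
# SIMPLE: a backslash not followed by "u".
SIMPLE = re.compile(r"\\(?!u)")
# STRICT: a backslash not followed by "u" plus four hex digits.
STRICT = re.compile(r"\\(?!u[0-9A-Fa-f]{4})")


def strip_stray_backslashes(text, strict=True):
    """Remove backslashes that do not start a \\uXXXX escape sequence."""
    pattern = STRICT if strict else SIMPLE
    # The lookahead consumes nothing, so only the backslash is removed.
    return pattern.sub("", text)


sample = r"\Not wanted backslashes\ unicode: \u2019\u2026"
print(strip_stray_backslashes(sample))
# Not wanted backslashes unicode: \u2019\u2026

# The two patterns differ on words that merely start with "u":
print(strip_stray_backslashes(r"\users \u2019", strict=False))  # \users \u2019
print(strip_stray_backslashes(r"\users \u2019"))                # users \u2019
```

If you would rather keep a space where the backslash was, pass " " instead of "" to sub.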
To learn more: Positive & Negative Lookahead with Examples – Regex Tutorial