Normalize string from webpage
Trying to normalize the string "PartII\xa0I \x96 FINANCIAL\n INFORMATION". In general, all that should be left (once non-UTF-8 characters are excluded) are letters, numbers and dots. Therefore, the expected output is "PartII FINANCIAL INFORMATION". The text comes from this SEC form.

Solutions tried, where text is the string:

text.encode('utf-8', errors='ignore').decode('utf-8')
unicodedata.normalize(decoding, text)

Solution: …
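
For context on why the two attempts leave the string unchanged: in a Python str, "\xa0" (no-break space, U+00A0) and "\x96" (a C1 control character; byte 0x96 is an en dash in Windows-1252) are both valid Unicode code points, so encoding to UTF-8 succeeds and errors='ignore' has nothing to drop. unicodedata.normalize also expects a form name such as "NFKC" as its first argument, not an encoding. One possible approach, as a minimal sketch rather than the accepted answer, is to normalize with NFKC and then keep only a whitelist of characters. The function name normalize_text and the exact whitelist (ASCII letters, digits, dots, whitespace) are assumptions made for illustration.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical helper: keep only ASCII letters, digits, dots and single spaces."""
    # NFKC maps compatibility characters to plain equivalents,
    # e.g. the no-break space U+00A0 becomes an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    # Replace anything that is not a letter, digit, dot or whitespace with a space
    # (this removes the "\x96" control character).
    text = re.sub(r"[^A-Za-z0-9.\s]", " ", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    return " ".join(text.split())


raw = "PartII\xa0I \x96 FINANCIAL\n INFORMATION"
print(normalize_text(raw))  # PartII I FINANCIAL INFORMATION
```

With the input exactly as quoted, the lone "I" after the no-break space is an ordinary letter and therefore survives, so the result is "PartII I FINANCIAL INFORMATION" rather than the stated expected output. Removing it would need an extra rule that the question does not specify.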