Normalize string from webpage

October 3, 2022

I am trying to normalize the string "PartII\xa0I \x96 FINANCIAL\n INFORMATION". In general, once the non-UTF-8 characters are excluded, only letters, numbers and dots should remain, so the expected output is "PartII FINANCIAL INFORMATION". The text comes from this SEC form.

Solutions tried, where text is the string:

text.encode('utf-8', errors='ignore').decode('utf-8')
unicodedata.normalize(decoding, text)

Solution:

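Encode to ASCII instead of UTF-8. With errors='ignore', every character outside the ASCII range is dropped, whereas encoding to UTF-8 drops nothing because \xa0 and \x96 are valid Unicode characters: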

text.encode('ascii', errors='ignore').decode('utf-8')

If you also need to remove \n, use:

text.replace('\n', "").encode('ascii', errors='ignore').decode('utf-8')
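The one-liners above drop \xa0 outright, which glues "PartII" and "I" together, and they can leave a double space where \x96 used to be. A minimal sketch of a fuller cleanup follows, assuming spaces between words should be kept; the helper name normalize_text is hypothetical and not part of the original answer. On the sample string it returns "PartII I FINANCIAL INFORMATION", so the lone "I" survives even though the expected output in the question omits it.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical helper: keep ASCII letters, digits, dots and single spaces."""
    # NFKD folds compatibility characters, so the non-breaking space \xa0
    # becomes an ordinary space instead of being dropped outright.
    text = unicodedata.normalize("NFKD", text)
    # Drop whatever is still non-ASCII (here the \x96 control character).
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace anything that is not a letter, digit, dot or whitespace.
    text = re.sub(r"[^A-Za-z0-9.\s]", " ", text)
    # Collapse runs of whitespace, including \n, into one space.
    return " ".join(text.split())


sample = "PartII\xa0I \x96 FINANCIAL\n INFORMATION"
print(normalize_text(sample))  # PartII I FINANCIAL INFORMATION
```

Running NFKD before the ASCII step is a design choice: it converts look-alike characters to their plain equivalents where Unicode defines one, so less text is lost than with encode/decode alone.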

by MR
