One column of my dataset looks like this:
0 10,000+
1 500,000+
2 5,000,000+
3 50,000,000+
4 100,000+
Name: Installs, dtype: object
I want to convert these ‘xxx,yyy,zzz+’ strings to integers.
First I tried this:
df['Installs'] = pd.to_numeric(df['Installs'])
and I got this error:
ValueError: Unable to parse string "10,000" at position 0
Then I tried to remove the ‘+’ and ‘,’ characters like this:
df['Installs'] = df['Installs'].str.replace('+','',regex = True)
df['Installs'] = df['Installs'].str.replace(',','',regex = True)
but nothing changed!
How can I convert these strings to integers?
>Solution :
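'+' on its own is not a valid regex: it is a quantifier with nothing before it to repeat. Instead, strip every non-digit character and then convert: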
df['Installs'] = pd.to_numeric(df['Installs'].str.replace(r'\D', '', regex=True))
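For reference, here is a small self-contained sketch of that approach. The sample DataFrame is rebuilt from the five values shown in the question, and the names as_regex and as_literal are only illustrative. It also shows an alternative that may be easier to read: passing regex=False so that '+' and ',' are treated as literal characters and need no escaping.

```python
import pandas as pd

# Sample data rebuilt from the values shown in the question.
df = pd.DataFrame(
    {"Installs": ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+"]}
)

# Option 1: remove every non-digit character with a regex, then convert.
as_regex = pd.to_numeric(df["Installs"].str.replace(r"\D", "", regex=True))

# Option 2: remove the literal characters; regex=False means '+' is not
# interpreted as a regex quantifier.
as_literal = pd.to_numeric(
    df["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
)

df["Installs"] = as_regex
print(df["Installs"].tolist())
# [10000, 500000, 5000000, 50000000, 100000]
```

Both options give the same integers for this data. Note that \D would also drop a minus sign or a decimal point, so the literal-replacement variant is the safer choice if the column could ever contain such values.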