I have a CSV file that looks like this:
import pandas as pd
table = {'column1': [1, 2, 3],
         'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']}
df = pd.DataFrame(table)
I am trying to access the values stored in "column2", but pandas reports the column's dtype as object, so if I print df['column2'][0][0] I get '(' instead of 0.2.
How can I change the data type from "object" to numeric values?
I tried this:
pd.to_numeric(df['column2'][0])
but it raises a ValueError, because the cell holds a whole tuple literal rather than a single number.
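A minimal sketch that reproduces the problem with the question's sample data: each cell in column2 is a plain Python string, so indexing it returns a character, and pd.to_numeric rejects it because it is not a single number. The variable names cell and error are only for illustration.

```python
import pandas as pd

# Same sample data as in the question: every cell in column2 is a string
df = pd.DataFrame({'column1': [1, 2, 3],
                   'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']})

cell = df['column2'][0]
print(df['column2'].dtype)  # typically object: the cells are Python str objects
print(type(cell))           # <class 'str'>
print(cell[0])              # ( : indexing a string gives its first character

# pd.to_numeric only parses strings that represent one number,
# so a whole tuple literal makes it raise ValueError
error = None
try:
    pd.to_numeric(cell)
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```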
Solution:
Neither eval nor ast.literal_eval works directly, because the string NaN means nothing to Python without context (of course it stands for np.nan, but neither function knows that): eval raises a NameError and ast.literal_eval raises a ValueError.
So you can temporarily change NaN to None, which is a valid Python literal, apply ast.literal_eval or eval, and then convert the Nones back to np.nan:
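To see both failures concretely, here is a small sketch on one of the sample strings; the names s, eval_error and literal_error are just illustrative.

```python
import ast

s = '(0.1, NaN, 1)'

# eval treats NaN as a variable name; no such variable exists, so NameError
eval_error = None
try:
    eval(s)
except NameError as exc:
    eval_error = exc

# literal_eval accepts only literals (numbers, strings, tuples, None, ...);
# the bare name NaN is not a literal, so it raises ValueError
literal_error = None
try:
    ast.literal_eval(s)
except ValueError as exc:
    literal_error = exc

print(type(eval_error).__name__)     # NameError
print(type(literal_error).__name__)  # ValueError
```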
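import ast
import numpy as np
# Swap NaN for None (a valid literal), parse each string, then map None back to np.nan
df['column2'] = df['column2'].str.replace('NaN', 'None').apply(ast.literal_eval).apply(lambda x: tuple(np.nan if val is None else val for val in x))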
or, equivalently, with eval:
df['column2'] = df['column2'].str.replace('NaN', 'None').apply(eval).apply(lambda x: tuple(np.nan if val is None else val for val in x))
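Both of these hand the strings to Python's own parser. If you would rather avoid that entirely, a different technique (not part of the answer above) is to parse each string by hand: strip the parentheses, split on commas and call float() on each piece, which works because float() itself accepts the string 'NaN'. This is a minimal sketch assuming every cell is a flat, comma-separated tuple of numbers; parse_tuple is a hypothetical helper name, and every value comes out as a float (0 and 1 become 0.0 and 1.0).

```python
import math
import pandas as pd

def parse_tuple(text):
    """Turn a string like '(0.1, NaN, 1)' into a tuple of floats (hypothetical helper)."""
    # Drop surrounding whitespace and parentheses, then split on commas
    parts = text.strip().strip('()').split(',')
    # float() accepts 'NaN' and ignores leading/trailing spaces in each piece
    return tuple(float(p) for p in parts)

df = pd.DataFrame({'column1': [1, 2, 3],
                   'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']})
df['column2'] = df['column2'].apply(parse_tuple)

print(df['column2'][0][0])              # 0.2
print(math.isnan(df['column2'][2][1]))  # True
```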
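A shorter version of the eval approach is to replace NaN with np.nan and pass the NumPy module to eval explicitly as context:
df['column2'] = df['column2'].str.replace('NaN', 'np.nan').apply(lambda s: eval(s, {'np': np}))
This avoids the ast module, but keep in mind that eval runs whatever code the string contains, so only use it on data you trust.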
In [98]: df['column2'][0][0]
Out[98]: 0.2
In [100]: type(df['column2'][0])
Out[100]: tuple
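If what you ultimately need is numeric dtypes rather than tuples inside an object column, one option (an extension beyond the answer above) is to spread each tuple over its own column. A minimal sketch, assuming column2 already holds equal-length tuples as after the conversion above; the column names a, b and c are placeholders.

```python
import numpy as np
import pandas as pd

# Assumed starting point: column2 already holds tuples
df = pd.DataFrame({'column1': [1, 2, 3],
                   'column2': [(0.2, 0.02, np.nan), (0.0, 0.03, 0), (0.1, np.nan, 1)]})

# One column per tuple position; reuse df.index so rows stay aligned.
# 'a', 'b', 'c' are placeholder names.
parts = pd.DataFrame(df['column2'].tolist(), index=df.index, columns=['a', 'b', 'c'])

# Replace the object column with the new numeric columns
df = df.drop(columns='column2').join(parts)
print(df.dtypes)  # column1 int64; a, b, c float64 (NaN forces float)
```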