How do I replace missing values with NaN

May 6, 2022

I am using the IMDB dataset for machine learning, and it contains a lot of missing values, which are entered as '\N'. In the startYear column, which holds each movie's release year, I want to convert the values to integers, but I can't do that right now because of these entries. I could drop them, but I want to see why they're missing first. I have tried several things without success.
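As a side note, if the data is loaded from IMDb's tab-separated dump files, where \N marks a null field, pandas can treat that marker as missing while reading, which also makes the missing rows easy to inspect before deciding to drop them. A minimal sketch, assuming a hypothetical three-row excerpt in place of the real file (the rows are invented for illustration):

```python
import io

import pandas as pd

# Hypothetical excerpt in a tab-separated IMDb-style layout; '\N' marks a
# missing value. The rows are made up for illustration only.
tsv = io.StringIO(
    "tconst\ttitleType\tstartYear\n"
    "tt0000001\tshort\t1894\n"
    "tt0000002\tmovie\t\\N\n"
    "tt0000003\ttvSeries\t2019\n"
)

# na_values makes the parser read the literal string '\N' as NaN,
# so startYear parses as a numeric column instead of an object column.
df = pd.read_csv(tsv, sep="\t", na_values="\\N")

# Look at the rows with a missing year before deciding whether to drop them.
missing = df[df["startYear"].isna()]
print(missing)

# The nullable integer dtype stores years as ints and missing ones as <NA>.
df["startYear"] = df["startYear"].astype("Int64")
print(df.dtypes)
```

Using "Int64" rather than plain int avoids the error you get when casting a column that still contains NaN to a regular NumPy integer type.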

This is my latest attempt:

Solution:

Here is a way to do it without using replace:

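import pandas as pd
import numpy as np

# '\\N' in Python source is the two-character string \N that IMDb uses for missing values
df_basics = pd.DataFrame({'startYear':['\\N']*78760+[2017]*18267 + [2018]*18263+[2016]*17837+[2019]*17769+['1996 ','1993 ','2000 ','2019 ','2029 ']})
print(df_basics.startYear.value_counts())
# Select the rows holding '\N' and overwrite them with NaN (np.nan; the np.NaN alias was removed in NumPy 2.0)
df_basics.loc[df_basics.startYear == '\\N', 'startYear'] = np.nan
# dropna=False so that the NaN count appears in the output
print(df_basics.startYear.value_counts(dropna=False))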

Output of the second print:

NaN      78760
2017     18267
2018     18263
2016     17837
2019     17769
1996         1
1993         1
2000         1
2019         1
2029         1
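Since the end goal was integer years, one possible follow-up (a different technique from the .loc assignment above, not part of the original answer) is to let pd.to_numeric do the replacement: with errors="coerce", anything that is not a number, including '\N', becomes NaN. The sketch below assumes a small stand-in for df_basics; note that the output above lists 2019 twice because the integer 2019 and the string '2019 ' (with a trailing space) are different values, so stripping whitespace first matters.

```python
import pandas as pd

# Small stand-in for df_basics mixing the '\N' marker, plain ints,
# and strings with trailing spaces such as '1996 '.
df_basics = pd.DataFrame({"startYear": ["\\N", 2017, 2018, "1996 ", "2029 "]})

# Cast everything to str so .str.strip() can remove stray whitespace.
as_text = df_basics["startYear"].astype(str).str.strip()

# errors="coerce" turns any entry that is not a number (here '\N') into NaN.
years = pd.to_numeric(as_text, errors="coerce")

# "Int64" (capital I) is pandas' nullable integer dtype: ints plus <NA>.
df_basics["startYear"] = years.astype("Int64")
print(df_basics["startYear"].value_counts(dropna=False))
```

Be aware that errors="coerce" silently converts every malformed entry to NaN, not just '\N', so comparing the NaN count before and after the conversion is a cheap way to check that nothing unexpected was lost.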

missing-data

byMR

