I have a pandas data frame:
import numpy as np
import pandas as pd
X = pd.DataFrame({'col1': [1, 2],
                  'col2': [4, 5]})
I have a replacement dictionary:
dict_replace = {
'col1': {1:'a', 2:'b'},
'col2': {4:'c', 5:'d'}
}
I can easily replace the values in X using:
X = X.replace(dict_replace)
Resulting in:
X = pd.DataFrame({'col1': ['a','b'],
'col2': ['c','d']})
However, if X contains a value that is not in dict_replace for the respective column, I want it replaced with np.nan. With replace, such values are left unchanged.
For example, a data frame:
X = pd.DataFrame({'col1': [1,2,3],
'col2': [4,5,7]})
Should look like:
X = pd.DataFrame({'col1': ['a','b',np.nan],
'col2': ['c','d',np.nan]})
What are some ways I can do this without having to iterate?
>Solution :
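You are looking for pandas.Series.map, which turns any value that has no key in the dictionary into NaN. It is a Series method, so it works on one column at a time, but you can run it over the whole data frame with apply (this assumes every column of X has an entry in dict_replace; otherwise the lookup raises a KeyError):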
X = X.apply(lambda col: col.map(dict_replace[col.name]))
Output:
>>> X
  col1 col2
0    a    c
1    b    d
2  NaN  NaN
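If X might also contain columns that have no entry in dict_replace, the one-liner above fails on the dictionary lookup. The following is a minimal sketch of one way to guard against that. The helper name map_or_nan and the extra column col3 are made up for illustration, and leaving unmapped columns unchanged is an assumption about the desired behaviour, not something stated in the question:

```python
import numpy as np
import pandas as pd

dict_replace = {
    'col1': {1: 'a', 2: 'b'},
    'col2': {4: 'c', 5: 'd'},
}

def map_or_nan(df, mapping):
    """Map each column through its own dict; values without a key become NaN.

    Columns with no entry in `mapping` are returned unchanged
    (an assumption made for this sketch).
    """
    return df.apply(
        lambda col: col.map(mapping[col.name]) if col.name in mapping else col
    )

# col3 is a hypothetical column with no replacement dictionary
X = pd.DataFrame({'col1': [1, 2, 3],
                  'col2': [4, 5, 7],
                  'col3': [10, 20, 30]})

result = map_or_nan(X, dict_replace)
print(result)
```

Here col1 and col2 are mapped as before, with 3 and 7 becoming NaN, while col3 passes through untouched. If you would rather have unmapped columns become all NaN, mapping them through an empty dict would do that instead.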