In the following code, I have a DataFrame with two rows and a Series with two values.
I would like to assign the Series values to a column of my DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
print(df)
s = pd.Series(np.random.randn(2), index=["four", "five"])
df.loc[:, 0] = s
print(df)
However, the Series and the DataFrame don't have the same index, which results in NaNs in the DataFrame.
0
one NaN
two NaN
In order to have my values in the column, I can simply use the .values attribute of s.
df.loc[:, 0] = s.values
I would like to understand the logic behind getting NaNs in the first case.
>Solution :
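Before assigning a Series to a column, pandas aligns the Series on the DataFrame's index: each value is matched to a row by its label, not by its position.
This lets you assign data even when some labels are missing or the labels are in a different order.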
For example:
df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
s = pd.Series([2, 1], index=["two", "one"]) # notice the different order
df.loc[:, 0] = s
print(df)
0
one 1
two 2
You can preview the result of this alignment with reindex:
s = pd.Series(np.random.randn(2), index=["four", "five"])
s.reindex(df.index)
one NaN
two NaN
dtype: float64
Using .values or to_numpy() converts the Series to a NumPy array. An array has no index, so no alignment is performed and the values are assigned by position.