import pandas as pd

dfl = [pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": "foo",
    }
)]
for df in dfl:
    df = df[["A", "B"]]
print(dfl)
I expected the output to contain only columns "A" and "B", since I thought I was modifying the DataFrame in place (df = ...). However, I got:
[     A          B    C    D
0  1.0 2013-01-02  1.0  foo
1  1.0 2013-01-02  1.0  foo
2  1.0 2013-01-02  1.0  foo
3  1.0 2013-01-02  1.0  foo]
What is the reason, and how can I select (not drop) some columns from each DataFrame in that list in place?
>Solution :
Because df is just a name that the for loop binds to each element of the list in turn. Doing df = df[['A','B']] rebinds that name to a new DataFrame; it does not replace the existing element of the list. A similar example:
ll = [[1]]
for l in ll:
    l = None
print(ll)
# output: [[1]]
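The distinction is between rebinding a name and mutating the object it refers to. A minimal sketch with plain lists (the variable names are only illustrative) shows that rebinding leaves the list untouched, while mutating the object is visible through the list:

```python
nested = [[1], [2]]

for item in nested:
    item = None           # rebinds the loop name only; the list is unchanged

after_rebind = [list(x) for x in nested]   # snapshot taken after the first loop

for item in nested:
    item.append(0)        # mutates the object that the list also references

print(after_rebind)       # [[1], [2]]
print(nested)             # [[1, 0], [2, 0]]
```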
To replace the element, assign back into the list by index:
for i, d in enumerate(dfl):
    dfl[i] = d[['A','B']]
print(dfl)
Out:
[     A          B
0  1.0 2013-01-02
1  1.0 2013-01-02
2  1.0 2013-01-02
3  1.0 2013-01-02]
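Two variations, offered as a sketch rather than the one right way. A list comprehension with slice assignment replaces every element while keeping the same list object. Alternatively, DataFrame.drop(..., inplace=True) mutates each DataFrame object itself, so the plain "for df in ..." loop works. The make_frames helper and the keep list are hypothetical names used only to rebuild the question's data:

```python
import pandas as pd

def make_frames():
    # Hypothetical helper: rebuilds the one-element list from the question
    return [pd.DataFrame({
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": "foo",
    })]

keep = ["A", "B"]

# Option 1: replace every element, keeping the same list object
dfl = make_frames()
dfl[:] = [d[keep] for d in dfl]

# Option 2: mutate each DataFrame object, so the loop variable is enough
dfl2 = make_frames()
for df in dfl2:
    df.drop(columns=df.columns.difference(keep), inplace=True)

print(list(dfl[0].columns))
print(list(dfl2[0].columns))
```

Option 2 is phrased as a drop internally, but it is driven by the list of columns to keep, so the caller still states a selection.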