Consider the following code:
import pandas as pd
import random
data = {'col1': [random.randint(0, 100) for _ in range(5)],
        'col2': [random.randint(0, 100) for _ in range(5)]}
df = pd.DataFrame(data)
lista = []
df['test'] = None
for index, row in df.iterrows():
    lista.append([random.randint(0, 100) for _ in range(5)])
    df.at[index, 'test'] = lista
    print(index, lista)
display(df)
Why does every row of the final output show the complete, final list? Since df.at updates the value at a single index, and my index advances one row at a time (using iterrows), why is the output [[first list], [second list], [third list], [fourth list], [fifth list]] in all rows?
My desired output is:
test
[[first list]]
[[first list], [second list]]
[[first list], [second list], [third list]]
[[first list], [second list], [third list], [fourth list]]
[[first list], [second list], [third list], [fourth list], [fifth list]]
Solution:
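You’re seeing the undesired result because every cell stores a reference to the same list object, lista. Each append mutates that one object, so all rows end up showing its final contents. Instead, store a copy of the list in each cell of the dataframe: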
import random
import pandas as pd
data = {
    "col1": [random.randint(0, 100) for _ in range(5)],
    "col2": [random.randint(0, 100) for _ in range(5)],
}
df = pd.DataFrame(data)
lista = []
df["test"] = None
for index, row in df.iterrows():
    lista.append([random.randint(0, 100) for _ in range(5)])
    df.at[index, "test"] = lista.copy()  # <-- copy a list here!
    print(index, lista)
print(df)
Prints:
col1 col2 test
0 90 55 [[47, 70, 98, 43, 95]]
1 74 16 [[47, 70, 98, 43, 95], [4, 86, 69, 84, 51]]
2 45 90 [[47, 70, 98, 43, 95], [4, 86, 69, 84, 51], [81, 74, 33, 100, 77]]
3 77 56 [[47, 70, 98, 43, 95], [4, 86, 69, 84, 51], [81, 74, 33, 100, 77], [34, 47, 85, 4, 74]]
4 49 43 [[47, 70, 98, 43, 95], [4, 86, 69, 84, 51], [81, 74, 33, 100, 77], [34, 47, 85, 4, 74], [70, 29, 25, 47, 99]]
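The same behaviour can be reproduced without pandas, which may make the cause easier to see. The following is only a minimal sketch: the names aliased and copied are made up for illustration and stand in for the dataframe column.

```python
# Plain-Python sketch of the aliasing behind the question (no pandas needed).
# `aliased` and `copied` are hypothetical names standing in for the column.
lista = []
aliased = []  # stores a reference to the same list object each time
copied = []   # stores an independent snapshot each time

for i in range(3):
    lista.append([i])
    aliased.append(lista)        # like df.at[index, "test"] = lista
    copied.append(lista.copy())  # like df.at[index, "test"] = lista.copy()

# Every entry in `aliased` is the very same object, so all show the final state.
print(all(item is lista for item in aliased))  # True
print([len(item) for item in aliased])         # [3, 3, 3]

# Each entry in `copied` kept the length the list had at that iteration.
print([len(item) for item in copied])          # [1, 2, 3]

# list.copy() is shallow: the inner lists are still shared with `lista`.
print(copied[0][0] is lista[0])                # True
```

Because the copy is shallow, mutating one of the inner lists later would still show up in every row that contains it. If that matters, copy.deepcopy from the standard library is one option.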