I am struggling to understand how pd.concat works when the input is a dictionary.
Let’s say we have the following pandas dataframe –
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])
Then, we do the following concatenation operation –
z = pd.concat({"z": df}, axis=1)
print(z)
The output comes out to be –
      z
   Name Age
0   tom  10
1  nick  15
2  juli  14
It seems like the key z was stacked on top of the dataframe df. But this doesn’t make sense as the axis specified was 1 and therefore, the stacking (if that’s what occurred) should’ve been across columns.
>Solution :
It actually makes sense: since you concatenate along the columns (axis=1), pandas needs a way to differentiate the concatenated columns, so the dictionary keys become the outer level of a column MultiIndex.
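To see what the key did in the question's example, one can inspect the columns of the result. This is only a minimal sketch that rebuilds the question's df and looks at the resulting column labels:

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])

z = pd.concat({"z": df}, axis=1)

# The columns are now two-level labels: (dictionary key, original column name)
print(z.columns.tolist())   # [('z', 'Name'), ('z', 'Age')]
print(z.columns.nlevels)    # 2

# Selecting the key gives back the original columns
print(z["z"])
```

So the key is not stacked as an extra row of data; it is a label sitting above the original column names.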
Here is a more meaningful example:
out = pd.concat({'left': df.add_prefix('left_'),
                 'middle': df.add_prefix('middle_'),
                 'right': df.add_prefix('right_')},
                axis=1)
       left               middle                 right
  left_Name left_Age middle_Name middle_Age right_Name right_Age
0       tom       10         tom         10        tom        10
1      nick       15        nick         15       nick        15
2      juli       14        juli         14       juli        14
This is equivalent to passing the new names to keys:
out = pd.concat([df.add_prefix('left_'),
                 df.add_prefix('middle_'),
                 df.add_prefix('right_')],
                keys=['left', 'middle', 'right'],
                axis=1)
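A quick way to convince yourself of that equivalence is to build both versions and compare them. The sketch below assumes the same df as in the question; the names frames, from_dict and from_keys are just illustrative:

```python
import pandas as pd

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])

# Hypothetical helper dict holding the three prefixed copies
frames = {'left': df.add_prefix('left_'),
          'middle': df.add_prefix('middle_'),
          'right': df.add_prefix('right_')}

# Dictionary input: the keys label each block of columns
from_dict = pd.concat(frames, axis=1)

# List input plus an explicit keys= argument
from_keys = pd.concat(list(frames.values()),
                      keys=list(frames.keys()),
                      axis=1)

# Both results carry the same two-level column labels
print(from_dict.columns.tolist() == from_keys.columns.tolist())

# The outer level holds the keys, so one block can be selected by its key
print(from_dict.columns.get_level_values(0).unique().tolist())
print(from_dict['middle'])
```

Selecting from_dict['middle'] returns only the middle_Name and middle_Age columns, which is the practical benefit of having the keys as an outer level.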
If you were concatenating on axis=0 (rows), then concat would instead add the keys as an outer level of the row index:
out = pd.concat({'top': df,
                 'middle': df,
                 'bottom': df},
                axis=0)
          Name  Age
top    0   tom   10
       1  nick   15
       2  juli   14
middle 0   tom   10
       1  nick   15
       2  juli   14
bottom 0   tom   10
       1  nick   15
       2  juli   14
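With the keys in the row index, each block can be pulled back out by its key. The following is a rough sketch, again assuming the question's df; the level names 'part' and 'row' are made up for illustration and are passed through concat's names parameter:

```python
import pandas as pd

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])

# names= labels the index levels: the keys level and the original row index
# ('part' and 'row' are illustrative names, not anything pandas requires)
out = pd.concat({'top': df, 'middle': df, 'bottom': df},
                axis=0, names=['part', 'row'])

print(list(out.index.names))   # ['part', 'row']

# Select one block by its key
print(out.loc['middle'])

# Or take the same original row from every block
print(out.xs(1, level='row'))
```

Here out.loc['middle'] gives back the three rows of df, and out.xs(1, level='row') gives the 'nick' row once per key.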