Duplicating columns in pandas dataframe

January 4, 2022

I'm looking for a way to duplicate every column in a dataframe, naming each duplicate after its original column with '_2' appended.

Example:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)

d2 = {'col1':[1,2],'col1_2':[1,2],'col2':[3,4],'col2_2':[3,4]}
end_df = pd.DataFrame(data=d2)

Thanks.

Solution:

NB: this answer also generalizes the process to any number of repeats.

You can build the dataframe without an explicit loop by using the repeat method of the columns index: selecting with the repeated labels returns each column once per repetition.

You can then set the column names programmatically with a list comprehension.

For 2 repeats:

end_df = start_df[start_df.columns.repeat(2)]
end_df.columns = [f'{a}{b}' for a in start_df for b in ('', '_2')]

Output:

   col1  col1_2  col2  col2_2
0     1       1     3       3
1     2       2     4       4
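To make the two steps easier to follow, here is a minimal self-contained sketch that exposes the intermediate repeated index and compares the result with the frame from the question. The names repeated, expected and matches are introduced here for illustration only, and the sketch assumes the column labels are unique.

```python
import pandas as pd

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Index.repeat repeats each label consecutively: col1, col1, col2, col2
repeated = start_df.columns.repeat(2)

# Selecting with the repeated labels returns each column once per label
end_df = start_df[repeated]

# Rename: the first copy keeps the original name, the second gets '_2'
end_df.columns = [f'{a}{b}' for a in start_df for b in ('', '_2')]

# The frame the question asks for, built by hand for comparison
expected = pd.DataFrame({'col1': [1, 2], 'col1_2': [1, 2],
                         'col2': [3, 4], 'col2_2': [3, 4]})
matches = end_df.equals(expected)
```

Iterating over start_df in the comprehension yields its column labels, which is why the new names line up with the repeated columns in order.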

Generalization:

n = 5

end_df = start_df[start_df.columns.repeat(n)]
end_df.columns = [f'{a}{b}' for a in start_df
                            for b in [''] + [f'_{x}' for x in range(2, n + 1)]]

Output for n=5:

   col1  col1_2  col1_3  col1_4  col1_5  col2  col2_2  col2_3  col2_4  col2_5
0     1       1       1       1       1     3       3       3       3       3
1     2       2       2       2       2     4       4       4       4       4
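If you need this in more than one place, the generalization could be wrapped in a small function. This is only a sketch: repeat_columns is a hypothetical helper name, not a pandas function, and it assumes the column labels are unique and that none of the generated names (such as 'col1_2') already exists in the dataframe.

```python
import pandas as pd

def repeat_columns(df, n):
    """Return a new dataframe with every column of df repeated n times.

    Hypothetical helper: assumes unique column labels and n >= 1.
    """
    # Select each column n times in a row
    out = df[df.columns.repeat(n)]
    # First copy keeps its name, later copies get _2, _3, ..., _n
    suffixes = [''] + [f'_{i}' for i in range(2, n + 1)]
    out.columns = [f'{col}{suffix}' for col in df.columns for suffix in suffixes]
    return out

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
end_df = repeat_columns(start_df, 3)
```

The original start_df is left unchanged, because the selection produces a new dataframe before the columns are renamed.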