Duplicating columns in pandas dataframe

I’m looking for a way to duplicate every column in a dataframe, with each duplicate named after the original column plus a ‘_2’ suffix.

Example:

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)

d2 = {'col1':[1,2],'col1_2':[1,2],'col2':[3,4],'col2_2':[3,4]}
end_df = pd.DataFrame(data=d2)

Thanks.

Solution:

NB: this answer also shows how to generalize the process to any number of copies.

You don't need a loop to build the dataframe: the columns index has a repeat method, so you can simply select the repeated columns.

Then you can set the column names programmatically with a list comprehension.

For 2 repeats:

end_df = start_df[start_df.columns.repeat(2)]
end_df.columns = [f'{a}{b}' for a in start_df for b in ('', '_2')]

output:

   col1  col1_2  col2  col2_2
0     1       1     3       3
1     2       2     4       4
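
To see why this works, it can help to look at the intermediate index. The following is a minimal, self-contained sketch that assumes pandas is imported as pd and rebuilds start_df from the question; the variable name repeated is only for illustration.

```python
import pandas as pd

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Index.repeat repeats each label consecutively, keeping the copies adjacent
repeated = start_df.columns.repeat(2)
print(list(repeated))

# Selecting with duplicated labels returns each column once per occurrence
end_df = start_df[repeated]

# Iterating over a DataFrame yields its column labels, so this pairs
# every original name with '' and then with '_2'
end_df.columns = [f'{a}{b}' for a in start_df for b in ('', '_2')]
print(end_df)
```

Because the copies of each column sit next to each other, the new names can be generated in the same order as the original columns.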

Generalization:

n = 5

end_df = start_df[start_df.columns.repeat(n)]
end_df.columns = [f'{a}{b}' for a in start_df
                            for b in [''] + [f'_{x}' for x in range(2, n + 1)]]

Output for n=5:

   col1  col1_2  col1_3  col1_4  col1_5  col2  col2_2  col2_3  col2_4  col2_5
0     1       1       1       1       1     3       3       3       3       3
1     2       2       2       2       2     4       4       4       4       4
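
If you need this in several places, the generalization could be wrapped in a small helper. This is only a sketch: the function name duplicate_columns and its n parameter are hypothetical, not part of pandas, and it assumes the column labels format sensibly as strings.

```python
import pandas as pd

def duplicate_columns(df, n=2):
    """Return a new dataframe with every column of df repeated n times.

    The first copy keeps the original name; copies 2..n get a '_k' suffix.
    (Hypothetical helper, not a pandas API.)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Repeat each column label n times and select the columns in that order
    out = df[df.columns.repeat(n)].copy()
    # '' for the original, then '_2', '_3', ... up to '_n'
    suffixes = [''] + [f'_{k}' for k in range(2, n + 1)]
    out.columns = [f'{col}{s}' for col in df.columns for s in suffixes]
    return out

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
result = duplicate_columns(start_df, n=3)
print(result)
```

The helper leaves start_df unchanged and returns a new dataframe.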