Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas: join columns if their headers contains a specific string

I have a dataframe as follows:

import pandas as pd
df = pd.DataFrame({"A_1_a.1": [3],"A_1_a.2": [4],"A_1_a.3": [1],
"A_2_a.1": [2],"A_2_a.2": [3], "A_2_a.3": [7],"A_3_a.1": [4],
"A_3_a.2": [2],"A_3_a.3": [9]})

The first number in the header is the ‘relation number’ and the last number in each header is the ‘attribute’ number. I would like to join the columns in a way that I would have the following output. but i don’t know how to proceed.

desired output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  attribute_num   A_1_a   A_2_a  A_3_a
0     1              3      2      4        
1     2              4      3      2
2     3              1      7      9

I have tried the following but I know that it returns an error

df.groupby(df['A_1_a.1', 'A_1_a.2', 'A_1_a.3', 'A_2_a.1',
          'A_2_a.2', 'A_2_a.3','A_3_a.1', 'A_3_a.2','A_3_a.3'].str[2]).reset_index(name='new')

>Solution :

You can split the column index to create a MultiIndex and stack:

df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.map(lambda x: x.split('.'))))
df.stack(level=1).rename_axis((None, 'attribute_num')).reset_index(level=1)

Output:

  attribute_num  A_1_a  A_2_a  A_3_a
0             1      3      2      4
0             2      4      3      2
0             3      1      7      9
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading