I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({"A_1_a.1": [3],"A_1_a.2": [4],"A_1_a.3": [1],
"A_2_a.1": [2],"A_2_a.2": [3], "A_2_a.3": [7],"A_3_a.1": [4],
"A_3_a.2": [2],"A_3_a.3": [9]})
The first number in the header is the ‘relation number’ and the last number in each header is the ‘attribute’ number. I would like to join the columns in a way that I would have the following output. but i don’t know how to proceed.
desired output:
attribute_num A_1_a A_2_a A_3_a
0 1 3 2 4
1 2 4 3 2
2 3 1 7 9
I have tried the following but I know that it returns an error
df.groupby(df['A_1_a.1', 'A_1_a.2', 'A_1_a.3', 'A_2_a.1',
'A_2_a.2', 'A_2_a.3','A_3_a.1', 'A_3_a.2','A_3_a.3'].str[2]).reset_index(name='new')
>Solution :
You can split the column index to create a MultiIndex and stack:
df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.map(lambda x: x.split('.'))))
df.stack(level=1).rename_axis((None, 'attribute_num')).reset_index(level=1)
Output:
attribute_num A_1_a A_2_a A_3_a
0 1 3 2 4
0 2 4 3 2
0 3 1 7 9