How to Store correlation matrix's values in dataframe

October 20, 2022

I have a df. I want to apply a correlation matrix between the columns and store the result values in another dataframe, however, I do not want to keep the upper part of the matrix, which is useless given that it’s duplicate from the other part, I also don’t want the diagonal (1,1,1,1). So i applied a mask, and then append the result to a dataframe, It

my basic df :

	A	B	C	name
0	0.5	0	8	test1
1	5	1	5	test2
2	3	2	6	test3
3	6	3	10	test4
4	18	14	12	test5

And I would like to do the correlation between those parameters and return the result in another dataframe, but only keeping the upper part to not have the duplicate values. The code is :

dft=pd.DataFrame()
for i in df['name']:
    corrmat = df[df['name']==i].corr()
    corr_triu = corrmat.where(~np.tril(np.ones(corrmat.shape)).astype(np.bool))
    corr_triu = corr_triu.stack()
    corr_triu["name","name"]=i
    dft=dft.append(corr_triu,ignore_index=True)

and that’s the result of my code :

	(A,B)	(B,C)	(C,A)	(name,name)
0	0.5	0.2	0.2	test1
1	0.6	0.8	0.3	test2
2	0.7	0.23	0.25	test3
3	0.5	0.9	0.65	test1
4	0.6	0.4	0.12	test2

then I am trying to use .melt in order to restructure the dataframe but I can’t do it :

dft.melt(id_vars=['(name, name)'], 
        var_name="Var1_Var2", 
        value_name="Value")

the error is :

KeyError: "The following ‘id_vars’ are not present in the DataFrame:
[‘(name, name)’]"

but when I looked for the column name got :

Index([('A', 'B'),
       ('A', 'C'),
       ('B', 'C'),
       ('name', 'name')], dtype='object')

It seems weird that the name is not found, I do not get it.

Here is what I would like to get as a result :

	Var1_Var2	Value	name
0	(A,B)	0.2	test1
1	(A,B)	0.8	test2
2	(A,B)	0.23	test3
3	(B,C)	0.9	test1
4	(B,C)	0.4	test2
5	(B,C)	0.4	test3
6	(C,A)	0.4	test1
7	(C,A)	0.4	test2
8	(C,A)	0.4	test3

There is probably an easier way leading to the result but I can’t find it

>Solution :

Tuples in columns are problematic, here is solution for change tuple for scalar name:

dft.columns = ['name' if x == ('name', 'name') else x for x in dft.columns]

df = dft.melt(id_vars='name', 
          var_name="Var1_Var2", 
          value_name="Value")
print (df)
     name Var1_Var2  Value
0   test1    (A, B)   0.50
1   test2    (A, B)   0.60
2   test3    (A, B)   0.70
3   test1    (A, B)   0.50
4   test2    (A, B)   0.60
5   test1    (A, C)   0.20
6   test2    (A, C)   0.80
7   test3    (A, C)   0.23
8   test1    (A, C)   0.90
9   test2    (A, C)   0.40
10  test1    (B, C)   0.20
11  test2    (B, C)   0.30
12  test3    (B, C)   0.25
13  test1    (B, C)   0.65
14  test2    (B, C)   0.12