Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to Store correlation matrix's values in dataframe

I have a df. I want to apply a correlation matrix between the columns and store the result values in another dataframe, however, I do not want to keep the upper part of the matrix, which is useless given that it’s duplicate from the other part, I also don’t want the diagonal (1,1,1,1). So i applied a mask, and then append the result to a dataframe, It

my basic df :

A B C name
0 0.5 0 8 test1
1 5 1 5 test2
2 3 2 6 test3
3 6 3 10 test4
4 18 14 12 test5

And I would like to do the correlation between those parameters and return the result in another dataframe, but only keeping the upper part to not have the duplicate values. The code is :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

dft=pd.DataFrame()
for i in df['name']:
    corrmat = df[df['name']==i].corr()
    corr_triu = corrmat.where(~np.tril(np.ones(corrmat.shape)).astype(np.bool))
    corr_triu = corr_triu.stack()
    corr_triu["name","name"]=i
    dft=dft.append(corr_triu,ignore_index=True)

and that’s the result of my code :

(A,B) (B,C) (C,A) (name,name)
0 0.5 0.2 0.2 test1
1 0.6 0.8 0.3 test2
2 0.7 0.23 0.25 test3
3 0.5 0.9 0.65 test1
4 0.6 0.4 0.12 test2

then I am trying to use .melt in order to restructure the dataframe but I can’t do it :

dft.melt(id_vars=['(name, name)'], 
        var_name="Var1_Var2", 
        value_name="Value")

the error is :

KeyError: "The following ‘id_vars’ are not present in the DataFrame:
[‘(name, name)’]"

but when I looked for the column name got :

Index([('A', 'B'),
       ('A', 'C'),
       ('B', 'C'),
       ('name', 'name')], dtype='object')

It seems weird that the name is not found, I do not get it.

Here is what I would like to get as a result :

Var1_Var2 Value name
0 (A,B) 0.2 test1
1 (A,B) 0.8 test2
2 (A,B) 0.23 test3
3 (B,C) 0.9 test1
4 (B,C) 0.4 test2
5 (B,C) 0.4 test3
6 (C,A) 0.4 test1
7 (C,A) 0.4 test2
8 (C,A) 0.4 test3

There is probably an easier way leading to the result but I can’t find it

>Solution :

Tuples in columns are problematic, here is solution for change tuple for scalar name:

dft.columns = ['name' if x == ('name', 'name') else x for x in dft.columns]

df = dft.melt(id_vars='name', 
          var_name="Var1_Var2", 
          value_name="Value")
print (df)
     name Var1_Var2  Value
0   test1    (A, B)   0.50
1   test2    (A, B)   0.60
2   test3    (A, B)   0.70
3   test1    (A, B)   0.50
4   test2    (A, B)   0.60
5   test1    (A, C)   0.20
6   test2    (A, C)   0.80
7   test3    (A, C)   0.23
8   test1    (A, C)   0.90
9   test2    (A, C)   0.40
10  test1    (B, C)   0.20
11  test2    (B, C)   0.30
12  test3    (B, C)   0.25
13  test1    (B, C)   0.65
14  test2    (B, C)   0.12
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading