Pairwise Cohen's kappa of values in two dataframes

I have two dataframes that look like the toy examples below:

import pandas as pd

data1 = {'subject': ['A', 'B', 'C', 'D'],
         'group': ['red', 'red', 'blue', 'blue'],
         'lists': [[0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]}

data2 = {'subject': ['a', 'b', 'c', 'd'],
         'group': ['red', 'red', 'blue', 'blue'],
         'lists': [[0, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

I would like to calculate the Cohen's kappa score for each pair of subjects. For example, I would like to score subject "A" in df1 against subjects "a", "b", and "c" in df2, and so on for every other subject. Like this:

from sklearn.metrics import cohen_kappa_score
cohen_kappa_score(df1['lists'][0], df2['lists'][0])
cohen_kappa_score(df1['lists'][0], df2['lists'][1])
cohen_kappa_score(df1['lists'][0], df2['lists'][2])
...

Importantly, I would like to represent these pairwise Cohen's kappa scores in a new dataframe whose rows and columns are both the full set of subjects ("A", "B", "C", "D", "a", "b", "c", "d"), so that I can see whether the scores are more consistent between dataframes or within dataframes. I will eventually convert this dataframe into a heatmap.

This post for a similar R problem looks promising, but I don't know how to implement it in Python. Similarly, I have not yet figured out how to adapt this Python solution, which appears similar enough.

>Solution:

Use pdist:

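import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import cohen_kappa_score

# Stack both dataframes and keep one list of ratings per subject
s = (pd.concat([df1, df2])
       .set_index('subject')['lists']
       .rename_axis(None)
     )

# pdist applies cohen_kappa_score to every pair of rows;
# squareform expands the condensed result into a square matrix
out = pd.DataFrame(squareform(pdist(np.vstack(s.to_list()), cohen_kappa_score)),
                   index=s.index, columns=s.index)

print(out)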

Output:

     A    B    C    D    a    b    c    d
A  0.0  0.0  0.0  0.4  0.4 -0.5 -0.5 -0.5
B  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
C  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
D  0.4  0.0  0.0  0.0  1.0  0.4 -0.8  0.4
a  0.4  0.0  0.0  1.0  0.0  0.4 -0.8  0.4
b -0.5  0.0  0.0  0.4  0.4  0.0 -0.5  1.0
c -0.5  0.0  0.0 -0.8 -0.8 -0.5  0.0 -0.5
d -0.5  0.0  0.0  0.4  0.4  1.0 -0.5  0.0
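
One thing to watch in the output above: the diagonal is 0.0 because pdist only evaluates distinct pairs and squareform fills the diagonal with zeros, so those cells are not kappa scores. If the diagonal matters for the heatmap, a plain double loop is one alternative. The following is a minimal sketch, assuming the same toy data as in the question; the name `kappa` is just illustrative. Subjects whose list is constant (here "B" and "C") have no well-defined self-agreement, and what sklearn returns for them may vary by version, so the warnings are silenced rather than relied on.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.metrics import cohen_kappa_score

data1 = {'subject': ['A', 'B', 'C', 'D'],
         'group': ['red', 'red', 'blue', 'blue'],
         'lists': [[0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]}
data2 = {'subject': ['a', 'b', 'c', 'd'],
         'group': ['red', 'red', 'blue', 'blue'],
         'lists': [[0, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]}

# One list of ratings per subject, indexed by subject name
s = (pd.concat([pd.DataFrame(data1), pd.DataFrame(data2)])
       .set_index('subject')['lists'])

subjects = s.index.to_list()
kappa = pd.DataFrame(np.nan, index=subjects, columns=subjects)

with warnings.catch_warnings():
    # Constant lists compared with themselves can trigger a warning
    warnings.simplefilter("ignore")
    for i in subjects:
        for j in subjects:
            # Score every ordered pair, including each subject with itself
            kappa.loc[i, j] = cohen_kappa_score(s[i], s[j])

print(kappa.round(2))
```

The off-diagonal values should match the pdist result, while a subject with a non-constant list, such as "A", scores 1.0 against itself. The loop calls cohen_kappa_score once per ordered pair, which is fine for a handful of subjects but slower than pdist for large inputs.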