Python: Venn diagram from score data

I have the following data:

df =
id testA testB
1  3     NA
1  1     3
2  2     NA
2  NA    1
2  0     0
3  NA    NA
3  1     1

I would like to create a Venn diagram of the number of times that testA and testB appear, testA but not testB, and testB but not testA.

The expected outcome would be the following groups:

Both tests: 3
A but not B: 2
B but not A: 1

Solution:

I am not sure how the index of your DataFrame is set up, or whether you have another index, so I use the default integer index here. I also assumed NA to be np.nan.

In any case, you can try something like the following (starting from the point where your df already exists). First, I recreate your DataFrame. Then I create two sets, setA and setB, which contain the row indices where the data is not NaN. Finally, a Venn diagram is drawn from these two sets.

from matplotlib_venn import venn2
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Recreate the example data; NA is assumed to be np.nan
df = pd.DataFrame({
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# Row indices where each test has a non-NaN score
setA = set(df.index[df["testA"].notna()])
setB = set(df.index[df["testB"].notna()])

# Draw the two-set Venn diagram and display it
venn2([setA, setB], set_labels=("testA", "testB"))
plt.show()

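You then get a Venn diagram with 3 rows in the overlap (both tests), 2 rows in testA only, and 1 row in testB only, which matches the expected groups.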
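
If you want to check the three group sizes without drawing anything, a minimal sketch along the following lines may help. It rebuilds the same example data as in the answer (again assuming NA means np.nan) and counts rows with boolean masks. The names has_a, has_b and counts are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Same example data as in the answer; NA is assumed to be np.nan.
df = pd.DataFrame({
    "testA": [3, 1, 2, np.nan, 0, np.nan, 1],
    "testB": [np.nan, 3, np.nan, 1, 0, np.nan, 1],
})

# Boolean masks: True where a score is present for that test.
has_a = df["testA"].notna()
has_b = df["testB"].notna()

# Count rows in each region of the two-set Venn diagram.
counts = {
    "both": int((has_a & has_b).sum()),
    "a_only": int((has_a & ~has_b).sum()),
    "b_only": int((~has_a & has_b).sum()),
}
print(counts)  # {'both': 3, 'a_only': 2, 'b_only': 1}
```

Rows where both tests are NaN fall outside both circles and are not counted. If you would rather pass sizes than sets, venn2 also accepts a tuple in the order (A only, B only, both), so venn2(subsets=(2, 1, 3)) should draw the same diagram.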
