Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

adding new cloumn to dataframe based on values of columns and values from other dataframe

I have this data frame, df, that has boolean values :

    A  B  C
0   0  1  0
1   0  1  1
2   0  1  1
3   1  0  1
4   0  0  0
5   1  0  0
6   0  0  0
7   0  0  1
8   1  0  0
9   0  0  0
10  1  0  1
11  1  0  1
12  0  1  1
13  1  0  0
14  1  0  0
15  0  1  0
16  1  1  0
17  0  0  1
18  1  0  1
19  1  0  0
20  1  0  1
21  1  1  0
22  1  1  1
23  1  1  1
24  1  0  0
25  1  1  0
26  0  0  1
27  0  1  1
28  0  1  0
29  1  1  0
30  1  0  1
31  0  1  0
32  0  0  1
33  1  1  1
34  0  1  0
35  1  1  0
36  0  1  0
37  0  0  1
38  0  1  1
39  0  1  1

I stored the count of rows as follows :

N = len(df.index) # 40 in this case

Using groupby , I counted each instantiation of df as follows :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    count_series = df.groupby(["A", "B", "C"]).size() #all columns
    new_df = count_series.to_frame(name = 'count').reset_index()
    print(new_df)

The new_df looks like this :

   A  B  C  count
0  0  0  0     3
1  0  0  1     5
2  0  1  0     6
3  0  1  1     6
4  1  0  0     6
5  1  0  1     6
6  1  1  0     5
7  1  1  1     3

Now df row count is N=40 and I want to create a new dataframe ,dfD, that has the same columns as df plus additional column named P(A,B,C) which has the probability of each combination. for example , any row with the values 0,0,0 should have count/N (3/40) which is 0.075
I found these posts but all of them did not help because they are using cases since my df wont just have 3 columns (A,B,C) or just 40 rows. it might be bigger that that
link1 link2
I want something that works with any dataframe of any size

>Solution :

Convert each row into tuple and use groupby

grp = df.apply(tuple, axis=1)
pd.concat([df.groupby(grp).first(),
           grp.groupby(grp).count().div(len(df)).rename("Probs")],
          axis=1).reset_index(drop=True)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading