How to aggregate 3 columns in a DataFrame to get the count and distribution of values in separate columns in Python Pandas?

by MR

December 12, 2022

I have a Pandas DataFrame like the one below:

Data types:

ID – int
TIME – int
TG – int

ID  | TIME     | TG
----|----------|----
111 | 20210101 | 0
111 | 20210201 | 0
111 | 20210301 | 1
222 | 20210101 | 0
222 | 20210201 | 1
333 | 20210201 | 1
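To run the answers below, the sample data can be rebuilt in code. This is a minimal sketch that assumes the DataFrame is named df (the name the answer's code uses) and that plain Python integers are acceptable for all three columns:

```python
import pandas as pd

# Sample data from the question; all three columns hold plain integers
df = pd.DataFrame({
    "ID":   [111, 111, 111, 222, 222, 333],
    "TIME": [20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    "TG":   [0, 0, 1, 0, 1, 1],
})
print(df.dtypes)  # each column should report an integer dtype
```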

I need to aggregate the above DataFrame to find out, for each value in TIME:

how many IDs there are
how many "1" values there are in TG
how many "0" values there are in TG

So I need something like this:

TIME     | num_ID | num_1 | num_0
---------|--------|-------|--------
20210101 | 2      | 0     | 2
20210201 | 3      | 2     | 1
20210301 | 1      | 1     | 0
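Taken literally, each requirement is a separate per-TIME count. The sketch below is a direct (not especially fast) translation of the three points, checked against the desired table with pandas.testing.assert_frame_equal. The names g, expected and result are illustrative only, and dtypes are not compared because integer dtypes can vary between pandas versions:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Sample data from the question
df = pd.DataFrame({
    "ID":   [111, 111, 111, 222, 222, 333],
    "TIME": [20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    "TG":   [0, 0, 1, 0, 1, 1],
})

# Desired output from the question, written as a DataFrame
expected = pd.DataFrame({
    "TIME":   [20210101, 20210201, 20210301],
    "num_ID": [2, 3, 1],
    "num_1":  [0, 2, 1],
    "num_0":  [2, 1, 0],
})

# One count per TIME value for each requirement
g = df.groupby("TIME")
result = (pd.DataFrame({
              "num_ID": g["ID"].size(),                          # rows (IDs) per TIME
              "num_1": g["TG"].apply(lambda s: (s == 1).sum()),  # TG == 1 per TIME
              "num_0": g["TG"].apply(lambda s: (s == 0).sum()),  # TG == 0 per TIME
          })
          .rename_axis("TIME")   # make sure the index is named TIME
          .reset_index())

# Compare values and labels only, not dtypes
assert_frame_equal(result, expected, check_dtype=False)
print(result)
```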

How can I do that in Python Pandas?

Solution:

Use GroupBy.size to count the rows for each TIME value, and crosstab to count the number of 0 and 1 values in TG:

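import pandas as pd

# Row count per TIME value, joined with a TIME x TG count table (columns num_0, num_1)
df1 = (df.groupby('TIME').size().to_frame('num_ID')
         .join(pd.crosstab(df['TIME'], df['TG']).add_prefix('num_'))
         .reset_index())
print(df1)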
       TIME  num_ID  num_0  num_1
0  20210101       2      2      0
1  20210201       3      1      2
2  20210301       1      0      1
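It can help to look at the two pieces separately before they are joined. The sketch below rebuilds the sample df so it runs on its own, then prints the per-TIME row count and the crosstab. One caveat: pd.crosstab only creates columns for TG values that actually occur, so if a dataset has no 1s (or no 0s) the corresponding num_ column would be missing. Reindexing with fill_value=0 is one possible fix; it is an addition, not part of the answer above, and the names sizes, raw_counts, only_zeros and counts are illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID":   [111, 111, 111, 222, 222, 333],
    "TIME": [20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    "TG":   [0, 0, 1, 0, 1, 1],
})

# Piece 1: number of rows per TIME value (becomes num_ID)
sizes = df.groupby("TIME").size()
print(sizes)

# Piece 2: TIME x TG table counting how often each TG value occurs
raw_counts = pd.crosstab(df["TIME"], df["TG"])
print(raw_counts)

# crosstab only creates columns for TG values that occur:
# keeping only the TG == 0 rows leaves no column for 1
only_zeros = df[df["TG"].eq(0)]
zero_cols = pd.crosstab(only_zeros["TIME"], only_zeros["TG"]).columns.tolist()
print(zero_cols)

# Reindexing fixes the column set, filling absent TG values with 0
counts = raw_counts.reindex(columns=[0, 1], fill_value=0).add_prefix("num_")
print(counts)
```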

Another option, if you only need to count the 0 and 1 values, is to add boolean helper columns and sum them with named aggregation in GroupBy.agg:

# Boolean helper columns, then count rows per TIME and sum the True values
df1 = (df.assign(num_0=df['TG'].eq(0),
                 num_1=df['TG'].eq(1))
         .groupby('TIME')
         .agg(num_ID=('TG', 'size'),
              num_1=('num_1', 'sum'),
              num_0=('num_0', 'sum'))
         .reset_index())
print(df1)
       TIME  num_ID  num_1  num_0
0  20210101       2      0      2
1  20210201       3      2      1
2  20210301       1      1      0
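Both snippets above count rows, which matches the number of IDs here only because each ID appears at most once per TIME. If an ID could repeat within the same TIME, named aggregation with nunique counts distinct IDs instead. In this sketch the extra (111, 20210101) row is a hypothetical duplicate added purely to show the difference, and num_rows is an illustrative extra column:

```python
import pandas as pd

# Sample data from the question plus one hypothetical duplicate row (111, 20210101, 0)
df = pd.DataFrame({
    "ID":   [111, 111, 111, 111, 222, 222, 333],
    "TIME": [20210101, 20210101, 20210201, 20210301, 20210101, 20210201, 20210201],
    "TG":   [0, 0, 0, 1, 0, 1, 1],
})

df1 = (df.assign(num_1=df["TG"].eq(1),
                 num_0=df["TG"].eq(0))
         .groupby("TIME")
         .agg(num_ID=("ID", "nunique"),   # distinct IDs per TIME
              num_rows=("ID", "size"),    # plain row count, for comparison
              num_1=("num_1", "sum"),     # rows with TG == 1
              num_0=("num_0", "sum"))     # rows with TG == 0
         .reset_index())
print(df1)
```

With the duplicate row, 20210101 has 3 rows but only 2 distinct IDs, so num_ID and num_rows diverge there.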