Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to use duplicated rows in pandas DF to efficiently create new columns?

I am trying to convert the output of the value_counts() pd method into a new df structure.

pd.DataFrame({'Value': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3},
 'Class': {0: 14, 1: 7, 2: 15, 3: 1, 4: 12, 5: 2, 6: 15},
 'counts': {0: 2065, 1: 1268, 2: 78, 3: 54, 4: 40, 5: 33, 6: 6}})
    Value   Class   counts
0   1       14       2065
1   1       7        1268
2   1       15       78
3   2       1        54
4   2       12       40
5   2       2        33
6   3       15       6

I am trying to consolidate to the unique ids from the Value field (1, 2, 3), and then have the unique numbers from Class field for the new fields. The the class values will be the counts.

I am trying to get something like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     Value    1    2    7    12    14    15 
0    1        0    0    1268 0     2065  78
1    2        54   33   0    40    0     0 
2    3        0    0    0    0     0     6

I know I can brute-force it, but I am working with 10+ million unique "Value" ids, so 50+ million count rows. I am trying to figure out a fast and effective way to do this.

>Solution :

This can be done with a pivot table:

df.pivot(index='Value', columns='Class', values='counts').fillna(0)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading