Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas Groupby and generate "duplicate" columns for each groupby value

I have a vertical data frame that I am looking to make more horizontal by "duplicating" columns for each item in the groupby column.
I have the following data frame:

    pd.DataFrame({'posteam': {0: 'ARI', 1: 'ARI', 2: 'ARI', 3: 'ARI', 4: 'ARI'},
     'offense_grouping': {0: 'personnel_00',
      1: 'personnel_01',
      2: 'personnel_02',
      3: 'personnel_10',
      4: 'personnel_11'},
     'snap_ct': {0: 1, 1: 6, 2: 4, 3: 396, 4: 1441},
     'personnel_epa': {0: 0.1539720594882965,
      1: 0.7805194854736328,
      2: -0.2678736448287964,
      3: 0.1886662095785141,
      4: 0.005721719935536385}})

And in its current state, there are 5 duplicate values in the ‘posteam’ column and 5 different values in the ‘offense_grouping’ column. Ideally, I would like to group by ‘posteam’ (so the team only has one row) and by ‘offense_grouping’. Each ‘offense_grouping’ value is corresponded with ‘snap_ct’ and ‘personnel_epa’ values. I would like the end result of this group to be something like this:

posteam personnel_00_snap_ct personnel_00_personnel_epa personnel_01_snap_ct personnel_01_personnel_epa personnel_02_snap_ct personnel_02_personnel_epa
ARI 1 .1539… 6 .7805… 4 -.2679

And so on. How can this be achieved?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Given the data you provide, the following would give the expected result. But there might be more complex cases in your data.

z = (
    df
    .set_index(['posteam', 'offense_grouping'])
    .unstack('offense_grouping')
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)

# or, alternatively (might be better if you have multiple values
# for some given indices./columns):

z = (
    df
    .pivot_table(index='posteam', columns='offense_grouping', values=['snap_ct', 'personnel_epa'])
    .swaplevel(axis=1)
    .sort_index(axis=1, ascending=[True, False])
)
>>> z
offense_grouping personnel_00               personnel_01                \
                      snap_ct personnel_epa      snap_ct personnel_epa   
posteam                                                                  
ARI                         1      0.153972            6      0.780519   

offense_grouping personnel_02               personnel_10                \
                      snap_ct personnel_epa      snap_ct personnel_epa   
posteam                                                                  
ARI                         4     -0.267874          396      0.188666   

offense_grouping personnel_11                
                      snap_ct personnel_epa  
posteam                                      
ARI                      1441      0.005722 

Then you can join the two levels of columns:

res = z.set_axis([f'{b}_{a}' for a, b in z.columns], axis=1)
>>> res
         snap_ct_personnel_00  personnel_epa_personnel_00  snap_ct_personnel_01  personnel_epa_personnel_01  snap_ct_personnel_02  personnel_epa_personnel_02  snap_ct_personnel_10  personnel_epa_personnel_10  snap_ct_personnel_11  personnel_epa_personnel_11
posteam                                                                                                                                                                                                                                                          
ARI      1                     0.153972                    6                     0.780519                    4                    -0.267874                    396                   0.188666                    1441                  0.005722                  
​```
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading