Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Making a column out of a part of an multiindex column in pandas

I have a df:

df_test = pd.DataFrame.from_dict({('group', ''): {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'A',
  5: 'A',
  6: 'A',
  7: 'A',
  8: 'A',
  9: 'B',
  10: 'B',
  11: 'B',
  12: 'B',
  13: 'B',
  14: 'B',
  15: 'B',
  16: 'B',
  17: 'B',
  18: 'all',
  19: 'all'},
 ('category', ''): {0: 'Amazon',
  1: 'Apple',
  2: 'Facebook',
  3: 'Google',
  4: 'Netflix',
  5: 'Tesla',
  6: 'Total',
  7: 'Uber',
  8: 'total',
  9: 'Amazon',
  10: 'Apple',
  11: 'Facebook',
  12: 'Google',
  13: 'Netflix',
  14: 'Tesla',
  15: 'Total',
  16: 'Uber',
  17: 'total',
  18: 'Total',
  19: 'total'},
 (pd.Timestamp('2020-06-29'), 'last_sales'): {0: 195.0,
  1: 61.0,
  2: 106.0,
  3: 61.0,
  4: 37.0,
  5: 13.0,
  6: 954.0,
  7: 4.0,
  8: 477.0,
  9: 50.0,
  10: 50.0,
  11: 75.0,
  12: 43.0,
  13: 17.0,
  14: 14.0,
  15: 504.0,
  16: 3.0,
  17: 252.0,
  18: 2916.0,
  19: 2916.0},
 (pd.Timestamp('2020-06-29'), 'sales'): {0: 1268.85,
  1: 18274.385000000002,
  2: 19722.65,
  3: 55547.255,
  4: 15323.800000000001,
  5: 1688.6749999999997,
  6: 227463.23,
  7: 1906.0,
  8: 113731.615,
  9: 3219.6499999999996,
  10: 15852.060000000001,
  11: 17743.7,
  12: 37795.15,
  13: 5918.5,
  14: 1708.75,
  15: 166349.64,
  16: 937.01,
  17: 83174.82,
  18: 787625.7400000001,
  19: 787625.7400000001},
 (pd.Timestamp('2020-06-29'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0},
 (pd.Timestamp('2020-07-06'), 'last_sales'): {0: 26.0,
  1: 39.0,
  2: 79.0,
  3: 49.0,
  4: 10.0,
  5: 10.0,
  6: 436.0,
  7: 5.0,
  8: 218.0,
  9: 89.0,
  10: 34.0,
  11: 133.0,
  12: 66.0,
  13: 21.0,
  14: 20.0,
  15: 732.0,
  16: 3.0,
  17: 366.0,
  18: 2336.0,
  19: 2336.0},
 (pd.Timestamp('2020-07-06'), 'sales'): {0: 3978.15,
  1: 12138.96,
  2: 19084.175,
  3: 40033.46000000001,
  4: 4280.15,
  5: 1495.1,
  6: 165548.29,
  7: 1764.15,
  8: 82774.145,
  9: 8314.92,
  10: 12776.649999999996,
  11: 28048.075,
  12: 55104.21000000002,
  13: 6962.844999999999,
  14: 3053.2000000000003,
  15: 231049.11000000002,
  16: 1264.655,
  17: 115524.55500000001,
  18: 793194.8000000002,
  19: 793194.8000000002},
 (pd.Timestamp('2020-07-06'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0},
 (pd.Timestamp('2021-06-28'), 'last_sales'): {0: 96.0,
  1: 56.0,
  2: 106.0,
  3: 44.0,
  4: 34.0,
  5: 13.0,
  6: 716.0,
  7: 9.0,
  8: 358.0,
  9: 101.0,
  10: 22.0,
  11: 120.0,
  12: 40.0,
  13: 13.0,
  14: 8.0,
  15: 610.0,
  16: 1.0,
  17: 305.0,
  18: 2652.0,
  19: 2652.0},
 (pd.Timestamp('2021-06-28'), 'sales'): {0: 5194.95,
  1: 19102.219999999994,
  2: 22796.420000000002,
  3: 30853.115,
  4: 11461.25,
  5: 992.6,
  6: 188143.41,
  7: 3671.15,
  8: 94071.705,
  9: 6022.299999999998,
  10: 7373.6,
  11: 33514.0,
  12: 35943.45,
  13: 4749.000000000001,
  14: 902.01,
  15: 177707.32,
  16: 349.3,
  17: 88853.66,
  18: 731701.46,
  19: 731701.46},
 (pd.Timestamp('2021-06-28'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0},
 (pd.Timestamp('2021-07-07'), 'last_sales'): {0: 45.0,
  1: 47.0,
  2: 87.0,
  3: 45.0,
  4: 13.0,
  5: 8.0,
  6: 494.0,
  7: 2.0,
  8: 247.0,
  9: 81.0,
  10: 36.0,
  11: 143.0,
  12: 56.0,
  13: 9.0,
  14: 9.0,
  15: 670.0,
  16: 1.0,
  17: 335.0,
  18: 2328.0,
  19: 2328.0},
 (pd.Timestamp('2021-07-07'), 'sales'): {0: 7556.414999999998,
  1: 14985.05,
  2: 16790.899999999998,
  3: 36202.729999999996,
  4: 4024.97,
  5: 1034.45,
  6: 163960.32999999996,
  7: 1385.65,
  8: 81980.16499999998,
  9: 5600.544999999999,
  10: 11209.92,
  11: 32832.61,
  12: 42137.44500000001,
  13: 3885.1499999999996,
  14: 1191.5,
  15: 194912.34000000003,
  16: 599.0,
  17: 97456.17000000001,
  18: 717745.3400000001,
  19: 717745.3400000001},
 (pd.Timestamp('2021-07-07'), 'difference'): {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.0,
  18: 0.0,
  19: 0.0}}).set_index(['group','category'])

Which follows this structure:

                    2020-06-29                                       2020-07-05

group   category    last_sales      sales       difference
A       Amazon      195             1268        0    
        Apple       61              18247       0
        Facebook    106             19722       0
        ...

B       Amazon      50              3219        0
        Apple       50              15852       0
        Facebook    75              17743       0
        ... 

I am trying to move sales, last_sales, difference next to category. So at the moment those are columns in each week, I am trying to make them some kind of index where the df would look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

                                        2020-06-29            2020-07-05

group   category    
A       Amazon     last_sales           195                   ...  
                   sales                1268                  ...  
                   difference           0                     ...
        Apple      last_sales           61                    ...
                   sales                18247                 ...
                   difference           0                     ...
        Facebook   last_sales           106                   ...
                   sales                19722                 ...
                   difference           0                     ...
        ...

B       Amazon     last_sales           50                    ...
                   sales                3219                  ...
                   difference           0                     ...
        Apple      last_sales           50                    ...
                   sales                15852                 ...
                   difference           0                     ...
        Facebook   last_sales           75                    ...
                   sales                17743                 ...
                   difference           0                     ...
        ... 

I’ve been trying to achieve this using .unstack(), .reset_index() but with no success, I am not sure how I should proceed with this problem.
The problem I am unable to solves comes from the fact that :

df_test.columns

Returns:

MultiIndex([(2020-06-29 00:00:00, 'last_sales'),
            (2020-06-29 00:00:00,      'sales'),
            (2020-06-29 00:00:00, 'difference'),
            (2020-07-06 00:00:00, 'last_sales'),
            (2020-07-06 00:00:00,      'sales'),
            (2020-07-06 00:00:00, 'difference'),
            (2021-06-28 00:00:00, 'last_sales'),
            (2021-06-28 00:00:00,      'sales'),
            (2021-06-28 00:00:00, 'difference'),
            (2021-07-07 00:00:00, 'last_sales'),
            (2021-07-07 00:00:00,      'sales'),
            (2021-07-07 00:00:00, 'difference')],
           )

So the one part, the date should stay as a column and the other part, the sales, last_sales and difference should become one new column ( or index, I am not sure, but I hope my desired output explains it).

>Solution :

If original ordering is not important, use DataFrame.stack:

df = df_test.stack()
print (df.head())
                           2020-06-29 00:00:00  2020-07-06 00:00:00  \
group category                                                        
A     Amazon   difference                 0.00                 0.00   
               last_sales               195.00                26.00   
               sales                   1268.85              3978.15   
      Apple    difference                 0.00                 0.00   
               last_sales                61.00                39.00   

                           2021-06-28 00:00:00  2021-07-07 00:00:00  
group category                                                       
A     Amazon   difference                 0.00                0.000  
               last_sales                96.00               45.000  
               sales                   5194.95             7556.415  
      Apple    difference                 0.00                0.000  
               last_sales                56.00               47.000  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading