Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sort pandas dataframe headers with two conditions

i have one question: I have a big dataframe with over 1000 columns.

For example as following the heards of the columns: 2019 Material Cost, 2019 Labor Cost, 2019 Overhead Cost, 2020 Material Cost, 2020 Labor Cost, 2020 Overhead Cost, …2035


df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2019 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2019 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2020 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2020 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2021 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2021 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                  })

I want to sort all headers into the following:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

2019 Material Cost, 2020 Material cost, 2021 Material Cost,…,2019 Labor Cost, 2020 Labor Cost, 2021 Labor Cost, … ,2019 Overhead Cost, 2020 Overhead Cost,2021 Overhead Cost

df = pd.DataFrame({'2019 Material cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Material cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Material cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2019 Overhead cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Overhead cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Overhead cost': [11, 8, 10, 6, 6, 5, 9, 12],
                   '2019 Labor cost': [25, 12, 15, 14, 19, 23, 25, 29],
                   '2020 Labor cost ': [5, 7, 7, 9, 12, 9, 9, 4],
                   '2021 Labor cost': [11, 8, 10, 6, 6, 5, 9, 12],
                  })

So i want to have one cost category and sort the years of the category ascending in a following order, then the next category.

Any help here? Thanks in advance

>Solution :

Create two lists, one with the costs and one with the years. Using these lists you can create another list containing all column names (in order).

costs = list(df.columns.str[5:].unique())
years = list(range(2019, 2036))

columns = [str(year) + ' ' + cost for year in years for cost in costs]
df = df.reindex(columns=columns)

For example:

df = pd.DataFrame(np.random.random((10, 10)), columns = ['1 a', '2 a', '3 a', '4 a', '5 a', '1 b', '2 b', '3 b', '4 b', '5 b'])
costs = ['a', 'b']
years = [1, 2, 3, 4, 5]
columns = [str(year) + ' ' + cost for year in years for cost in costs]
df.reindex(columns=columns).columns

Returns

Index(['1 a', '1 b', '2 a', '2 b', '3 a', '3 b', '4 a', '4 b', '5 a', '5 b'], dtype='object')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading