How do I create sub-headers in python dataframe?

March 16, 2022

I have a dataframe say:

Example:


import pandas as pd
df = pd.DataFrame({'Item': ['California', '2012%', '2013%','Arizona','2012%','%2019','Janu%ary'], 
                   'col1': [0,50, 50,0,10,11,14],'col2': [0, 50, 40,0,15,13,15]})
Output=
    Item  col1 col2
  1  California  0    0
  2  2012%  50   50
  3  2013%  40   40
  4  Arizona  0   0
  5  2012%.    10. 15
  6. %2019.    11. 13
  7. Janu%ary.  14. 15

I want the column names like " California" and "Arizona" (the ones that do not have "%" in the column values to be considered as Headers that has to be appended to their respective sub-headers. Like maybe iterate down the rows and find a pattern e.g. without ‘%’ in row means its a header, with ‘%’ means its a sub-header
then for the ‘sub-header’ rows, add the last found ‘header’.

 Expected output=
    Item  col1 col2
  
  1  California 2012%  50   50
  2  California 2013%  40   40
  3  Arizona 2012%.    10. 15
  4  Arizona 2019%.    11. 13
  5 Arizona January%.  14. 15

>Solution :

IIUC, you could use a mask and perform boolean masking/indexing:

# does the name contains '%' (you could use other conditions)
m = df['Item'].str.contains('%')
# mask and ffill the "header", then concatenate
df['Item'] = df['Item'].mask(m).ffill() + ' ' + df['Item']

# drop the former header rows
df = df.loc[m]

output:

               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
4     Arizona 2012%    10    15
5     Arizona 2019%    11    13
6  Arizona January%    14    15

alternative to have a real index:

m = df['Item'].str.contains('%')
df['index'] = df['Item'].mask(m).ffill()

df = df.loc[m].set_index('index')

output:

                Item  col1  col2
index                           
California     2012%    50    50
California     2013%    50    40
Arizona        2012%    10    15
Arizona        2019%    11    13
Arizona     January%    14    15