Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I create sub-headers in python dataframe?

I have a dataframe say:

Example:


import pandas as pd
df = pd.DataFrame({'Item': ['California', '2012%', '2013%','Arizona','2012%','%2019','Janu%ary'], 
                   'col1': [0,50, 50,0,10,11,14],'col2': [0, 50, 40,0,15,13,15]})
Output=
    Item  col1 col2
  1  California  0    0
  2  2012%  50   50
  3  2013%  40   40
  4  Arizona  0   0
  5  2012%.    10. 15
  6. %2019.    11. 13
  7. Janu%ary.  14. 15

I want the column names like " California" and "Arizona" (the ones that do not have "%" in the column values to be considered as Headers that has to be appended to their respective sub-headers. Like maybe iterate down the rows and find a pattern e.g. without ‘%’ in row means its a header, with ‘%’ means its a sub-header
then for the ‘sub-header’ rows, add the last found ‘header’.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

 Expected output=
    Item  col1 col2
  
  1  California 2012%  50   50
  2  California 2013%  40   40
  3  Arizona 2012%.    10. 15
  4  Arizona 2019%.    11. 13
  5 Arizona January%.  14. 15

>Solution :

IIUC, you could use a mask and perform boolean masking/indexing:

# does the name contains '%' (you could use other conditions)
m = df['Item'].str.contains('%')
# mask and ffill the "header", then concatenate
df['Item'] = df['Item'].mask(m).ffill() + ' ' + df['Item']

# drop the former header rows
df = df.loc[m]

output:

               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
4     Arizona 2012%    10    15
5     Arizona 2019%    11    13
6  Arizona January%    14    15
alternative to have a real index:
m = df['Item'].str.contains('%')
df['index'] = df['Item'].mask(m).ffill()

df = df.loc[m].set_index('index')

output:

                Item  col1  col2
index                           
California     2012%    50    50
California     2013%    50    40
Arizona        2012%    10    15
Arizona        2019%    11    13
Arizona     January%    14    15
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading