Follow

Follow

Contact

Home stack multiple columns in a pandas dataframe

Questions

stack multiple columns in a pandas dataframe

byMR

December 30, 2023

I have a pandas data frame and would like to stack 4 columns to 2. So I have this

df = pd.DataFrame({'date':['2023-12-01', '2023-12-05', '2023-12-07'],
'other_col':['a', 'b', 'c'],
'right_count':[4,7,9], 'right_sum':[2,3,5],
'left_count':[1,8,5], 'left_sum':[0,8,4]})

    date    other_col  right_count  right_sum   left_count  left_sum
0   2023-12-01  a          4           2           1         0
1   2023-12-05  b          7           3           8         8
2   2023-12-07  c          9           5           5         4

and would like to get this

    date    other_col   side    count   sum
0   2023-12-01  a       right   4        2
1   2023-12-05  b       right   7        3
2   2023-12-07  c       right   9        5
3   2023-12-01  a       left    1        0
4   2023-12-05  b       left    8        8
5   2023-12-07  c       left    5        4

>Solution :

You can use a custom reshaping with a temporary MultiIndex:

out = (df
   .set_index(['date', 'other_col'])
   .pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
   .rename_axis(columns=['side', None])
   .stack('side').reset_index()
)

Or a melt+pivot:

tmp = df.melt(['date', 'other_col'], var_name='side')
tmp[['side', 'col']] = tmp['side'].str.split('_', n=1, expand=True)

out = (tmp.pivot(index=['date', 'other_col', 'side'],
                 columns='col', values='value')
          .reset_index().rename_axis(columns=None)
      )

Output:

         date other_col   side  count  sum
0  2023-12-01         a   left      1    0
1  2023-12-01         a  right      4    2
2  2023-12-05         b   left      8    8
3  2023-12-05         b  right      7    3
4  2023-12-07         c   left      5    4
5  2023-12-07         c  right      9    5

Or, much easier, using the janitor library and pivot_longer:

# pip install pyjanitor
import janitor

out = df.pivot_longer(index=['date', 'other_col'],
                      names_to=('side', '.value'),
                      names_pattern=r'([^_]+)_([^_]+)')

Output:

         date other_col   side  count  sum
0  2023-12-01         a  right      4    2
1  2023-12-05         b  right      7    3
2  2023-12-07         c  right      9    5
3  2023-12-01         a   left      1    0
4  2023-12-05         b   left      8    8
5  2023-12-07         c   left      5    4

dataframe

byMR

Published December 30, 2023

Add a comment

Leave a ReplyCancel reply

Read more

Questions

Get temp path with scape characters

byMR

December 30, 2023

Questions

Pandas .loc() method using "not" and "in" operators

byMR

December 30, 2023

Questions

the behavior of a break in an extension for g++ is not the same for gcc version 12.2.0 and gcc version 5.4.0

byMR

December 30, 2023

Questions

Why am I having trouble finding an item?

byMR

December 30, 2023

Questions

Why does (nil . nil) evaluate to (nil) instead of just nil in SBCL?

byMR

December 30, 2023

Questions

Why does 'iw reg get' country 00: DFS-UNSET''

byMR

December 30, 2023