Split multiple columns into multiple columns, pandas

May 18, 2022

I have a dataframe

df = pd.DataFrame({'≤8': {1: '3687 55.5', 2: '838 66.5', 3: '8905 66.9'},
 '9–13': {1: '2234 33.6', 2: '419 33.3', 3: '3362 25.2'},
 '14–15': {1: '290 4.4', 2: nan, 3: '473 3.6'},
 '16–17': {1: '194 2.9', 2: nan, 3: '252 1.9'},
 '18–20': {1: '185 2.8', 2: nan, 3: '184 1.4'},
 '≥21': {1: '52 0.8', 2: '0 0.0', 3: '144 1.1'}})

          ≤8       9–13    14–15    16–17    18–20      ≥21
1  3687 55.5  2234 33.6  290 4.4  194 2.9  185 2.8   52 0.8
2   838 66.5   419 33.3      NaN      NaN      NaN    0 0.0
3  8905 66.9  3362 25.2  473 3.6  252 1.9  184 1.4  144 1.1

I want to split all the columns into two columns, so there is an int columns and a float column (note, I don’t care to separate by type, the types are just coincidental). I got as far as splitting the columns, but I can’t figure out how to assign the resulting lists into new columns. I also want to keep it as pythonic/pandonic as possible, so I don’t want to loop over each column individually.

rev = gestation_cols.apply(lambda x: pd.Series([i for i in x.str.split(' ')]))

             ≤8          9–13       14–15       16–17       18–20         ≥21
0  [3687, 55.5]  [2234, 33.6]  [290, 4.4]  [194, 2.9]  [185, 2.8]   [52, 0.8]
1   [838, 66.5]   [419, 33.3]         NaN         NaN         NaN    [0, 0.0]
2  [8905, 66.9]  [3362, 25.2]  [473, 3.6]  [252, 1.9]  [184, 1.4]  [144, 1.1]
3  [1559, 48.6]  [1075, 33.5]  [209, 6.5]  [165, 5.1]  [173, 5.4]   [26, 0.8]

edit: For clarity, I do not want to split a single column or split each column individually. I know I could create new columns one by one, that’s simply bad practice. I want each and every of the columns to be split into two columns.

>Solution :

For a vectorial version you can temporarily stack and use str.split:

df.stack().str.split().unstack()

Output:

             ≤8          9–13       14–15       16–17       18–20         ≥21
1  [3687, 55.5]  [2234, 33.6]  [290, 4.4]  [194, 2.9]  [185, 2.8]   [52, 0.8]
2   [838, 66.5]   [419, 33.3]         NaN         NaN         NaN    [0, 0.0]
3  [8905, 66.9]  [3362, 25.2]  [473, 3.6]  [252, 1.9]  [184, 1.4]  [144, 1.1]

To have multiple columns use the expand=True parameter, you will end up with a MultiIndex that you can then rework as you see fit:

df.stack().str.split(expand=True).unstack()

Output:

      0                                  1                             
     ≤8  9–13 14–15 16–17 18–20  ≥21    ≤8  9–13 14–15 16–17 18–20  ≥21
1  3687  2234   290   194   185   52  55.5  33.6   4.4   2.9   2.8  0.8
2   838   419   NaN   NaN   NaN    0  66.5  33.3   NaN   NaN   NaN  0.0
3  8905  3362   473   252   184  144  66.9  25.2   3.6   1.9   1.4  1.1

reorganizing the order

(df.stack()
   .str.split(expand=True)
   .unstack()
   .swaplevel(axis=1)
   [df.columns]
)

Output:

     ≤8        9–13       14–15      16–17      18–20       ≥21     
      0     1     0     1     0    1     0    1     0    1    0    1
1  3687  55.5  2234  33.6   290  4.4   194  2.9   185  2.8   52  0.8
2   838  66.5   419  33.3   NaN  NaN   NaN  NaN   NaN  NaN    0  0.0
3  8905  66.9  3362  25.2   473  3.6   252  1.9   184  1.4  144  1.1