Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split multiple columns into multiple columns, pandas

I have a dataframe

df = pd.DataFrame({'≤8': {1: '3687 55.5', 2: '838 66.5', 3: '8905 66.9'},
 '9–13': {1: '2234 33.6', 2: '419 33.3', 3: '3362 25.2'},
 '14–15': {1: '290 4.4', 2: nan, 3: '473 3.6'},
 '16–17': {1: '194 2.9', 2: nan, 3: '252 1.9'},
 '18–20': {1: '185 2.8', 2: nan, 3: '184 1.4'},
 '≥21': {1: '52 0.8', 2: '0 0.0', 3: '144 1.1'}})

          ≤8       9–13    14–15    16–17    18–20      ≥21
1  3687 55.5  2234 33.6  290 4.4  194 2.9  185 2.8   52 0.8
2   838 66.5   419 33.3      NaN      NaN      NaN    0 0.0
3  8905 66.9  3362 25.2  473 3.6  252 1.9  184 1.4  144 1.1

I want to split all the columns into two columns, so there is an int columns and a float column (note, I don’t care to separate by type, the types are just coincidental). I got as far as splitting the columns, but I can’t figure out how to assign the resulting lists into new columns. I also want to keep it as pythonic/pandonic as possible, so I don’t want to loop over each column individually.

rev = gestation_cols.apply(lambda x: pd.Series([i for i in x.str.split(' ')]))

             ≤8          9–13       14–15       16–17       18–20         ≥21
0  [3687, 55.5]  [2234, 33.6]  [290, 4.4]  [194, 2.9]  [185, 2.8]   [52, 0.8]
1   [838, 66.5]   [419, 33.3]         NaN         NaN         NaN    [0, 0.0]
2  [8905, 66.9]  [3362, 25.2]  [473, 3.6]  [252, 1.9]  [184, 1.4]  [144, 1.1]
3  [1559, 48.6]  [1075, 33.5]  [209, 6.5]  [165, 5.1]  [173, 5.4]   [26, 0.8]

edit: For clarity, I do not want to split a single column or split each column individually. I know I could create new columns one by one, that’s simply bad practice. I want each and every of the columns to be split into two columns.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

For a vectorial version you can temporarily stack and use str.split:

df.stack().str.split().unstack()

Output:

             ≤8          9–13       14–15       16–17       18–20         ≥21
1  [3687, 55.5]  [2234, 33.6]  [290, 4.4]  [194, 2.9]  [185, 2.8]   [52, 0.8]
2   [838, 66.5]   [419, 33.3]         NaN         NaN         NaN    [0, 0.0]
3  [8905, 66.9]  [3362, 25.2]  [473, 3.6]  [252, 1.9]  [184, 1.4]  [144, 1.1]

To have multiple columns use the expand=True parameter, you will end up with a MultiIndex that you can then rework as you see fit:

df.stack().str.split(expand=True).unstack()

Output:

      0                                  1                             
     ≤8  9–13 14–15 16–17 18–20  ≥21    ≤8  9–13 14–15 16–17 18–20  ≥21
1  3687  2234   290   194   185   52  55.5  33.6   4.4   2.9   2.8  0.8
2   838   419   NaN   NaN   NaN    0  66.5  33.3   NaN   NaN   NaN  0.0
3  8905  3362   473   252   184  144  66.9  25.2   3.6   1.9   1.4  1.1

reorganizing the order

(df.stack()
   .str.split(expand=True)
   .unstack()
   .swaplevel(axis=1)
   [df.columns]
)

Output:

     ≤8        9–13       14–15      16–17      18–20       ≥21     
      0     1     0     1     0    1     0    1     0    1    0    1
1  3687  55.5  2234  33.6   290  4.4   194  2.9   185  2.8   52  0.8
2   838  66.5   419  33.3   NaN  NaN   NaN  NaN   NaN  NaN    0  0.0
3  8905  66.9  3362  25.2   473  3.6   252  1.9   184  1.4  144  1.1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading