I have a dataframe
df = pd.DataFrame({'≤8': {1: '3687 55.5', 2: '838 66.5', 3: '8905 66.9'},
'9–13': {1: '2234 33.6', 2: '419 33.3', 3: '3362 25.2'},
'14–15': {1: '290 4.4', 2: nan, 3: '473 3.6'},
'16–17': {1: '194 2.9', 2: nan, 3: '252 1.9'},
'18–20': {1: '185 2.8', 2: nan, 3: '184 1.4'},
'≥21': {1: '52 0.8', 2: '0 0.0', 3: '144 1.1'}})
≤8 9–13 14–15 16–17 18–20 ≥21
1 3687 55.5 2234 33.6 290 4.4 194 2.9 185 2.8 52 0.8
2 838 66.5 419 33.3 NaN NaN NaN 0 0.0
3 8905 66.9 3362 25.2 473 3.6 252 1.9 184 1.4 144 1.1
I want to split all the columns into two columns, so there is an int columns and a float column (note, I don’t care to separate by type, the types are just coincidental). I got as far as splitting the columns, but I can’t figure out how to assign the resulting lists into new columns. I also want to keep it as pythonic/pandonic as possible, so I don’t want to loop over each column individually.
rev = gestation_cols.apply(lambda x: pd.Series([i for i in x.str.split(' ')]))
≤8 9–13 14–15 16–17 18–20 ≥21
0 [3687, 55.5] [2234, 33.6] [290, 4.4] [194, 2.9] [185, 2.8] [52, 0.8]
1 [838, 66.5] [419, 33.3] NaN NaN NaN [0, 0.0]
2 [8905, 66.9] [3362, 25.2] [473, 3.6] [252, 1.9] [184, 1.4] [144, 1.1]
3 [1559, 48.6] [1075, 33.5] [209, 6.5] [165, 5.1] [173, 5.4] [26, 0.8]
edit: For clarity, I do not want to split a single column or split each column individually. I know I could create new columns one by one, that’s simply bad practice. I want each and every of the columns to be split into two columns.
>Solution :
For a vectorial version you can temporarily stack
and use str.split
:
df.stack().str.split().unstack()
Output:
≤8 9–13 14–15 16–17 18–20 ≥21
1 [3687, 55.5] [2234, 33.6] [290, 4.4] [194, 2.9] [185, 2.8] [52, 0.8]
2 [838, 66.5] [419, 33.3] NaN NaN NaN [0, 0.0]
3 [8905, 66.9] [3362, 25.2] [473, 3.6] [252, 1.9] [184, 1.4] [144, 1.1]
To have multiple columns use the expand=True
parameter, you will end up with a MultiIndex that you can then rework as you see fit:
df.stack().str.split(expand=True).unstack()
Output:
0 1
≤8 9–13 14–15 16–17 18–20 ≥21 ≤8 9–13 14–15 16–17 18–20 ≥21
1 3687 2234 290 194 185 52 55.5 33.6 4.4 2.9 2.8 0.8
2 838 419 NaN NaN NaN 0 66.5 33.3 NaN NaN NaN 0.0
3 8905 3362 473 252 184 144 66.9 25.2 3.6 1.9 1.4 1.1
reorganizing the order
(df.stack()
.str.split(expand=True)
.unstack()
.swaplevel(axis=1)
[df.columns]
)
Output:
≤8 9–13 14–15 16–17 18–20 ≥21
0 1 0 1 0 1 0 1 0 1 0 1
1 3687 55.5 2234 33.6 290 4.4 194 2.9 185 2.8 52 0.8
2 838 66.5 419 33.3 NaN NaN NaN NaN NaN NaN 0 0.0
3 8905 66.9 3362 25.2 473 3.6 252 1.9 184 1.4 144 1.1