Split a column into multiple columns with condition

Advertisements

I have a question about splitting columns into multiple rows at Pandas with conditions.
For example, I tend to do something as follows but takes a very long time using for loop

| Index | Value |
| ----- | ----- |
|   0   | 1     |
|   1   | 1,3   |
|   2   | 4,6,8 |
|   3   | 1,3   |
|   4   | 2,7,9 |

into

| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| ----- | - | - | - | - | - | - | - | - | - |
|   0   | 1 |   |   |   |   |   |   |   |   |
|   1   | 1 |   | 3 |   |   |   |   |   |   |
|   2   |   |   |   | 4 |   | 6 |   | 8 |   |
|   3   | 1 |   | 3 |   |   |   |   |   |   |
|   4   |   | 2 |   |   |   |   | 7 |   | 9 |

I wonder if there are any packages that can help this out rather than to write a for loop to map all indexes.

>Solution :

Assuming the "Value" column contains strings, you can use str.split and pivot like so:

value = df["Value"].str.split(",").explode().astype(int).reset_index()
output = value.pivot(index="index", columns="Value", values="Value")
output = output.reindex(range(value["Value"].min(), value["Value"].max()+1), axis=1)

>>> output

Value    1    2    3    4   5    6    7    8    9
index                                            
0      1.0  NaN  NaN  NaN NaN  NaN  NaN  NaN  NaN
1      1.0  NaN  3.0  NaN NaN  NaN  NaN  NaN  NaN
2      NaN  NaN  NaN  4.0 NaN  6.0  NaN  8.0  NaN
3      1.0  NaN  3.0  NaN NaN  NaN  NaN  NaN  NaN
4      NaN  2.0  NaN  NaN NaN  NaN  7.0  NaN  9.0
Input df:
df = pd.DataFrame({"Value": ["1", "1,3", "4,6,8", "1,3", "2,7,9"]})

Leave a Reply Cancel reply