Split a pandas DataFrame column on the pipe symbol

I have a pandas DataFrame with a single column named Category. I want to split this column into four separate columns named A, B, C, and D, using the double-pipe separator "||".

Sample input: df['Category'] = Operations||Modification||Bank||Bank Process

Sample output:

df['A'] = Operations

df['B'] = Modification

df['C'] = Bank

df['D'] = Bank Process

I have looked at many answers on Stack Overflow, but none of them work for me. I tried the following code:

df[['A', 'B', 'C', 'D']] = df['Category'].str.split("||", expand = True)

But it raises this error:

ValueError: Columns must be same length as key
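To see where the extra columns come from, here is a small sketch using Python's standard `re` module. It assumes that pandas hands a multi-character separator to a regular-expression split, which is its default behavior:

```python
import re

category = "Operations||Modification||Bank||Bank Process"

# Read as a regex, "||" is two empty alternatives joined by "|",
# so it matches the empty string at every position in the text.
regex_parts = re.split("||", category)

# Escaping the pipes makes the pattern match the literal two-character "||".
literal_parts = re.split(r"\|\|", category)

print(len(regex_parts))  # far more than 4 pieces
print(literal_parts)
```

Because the unescaped split yields many more than four pieces, `expand=True` builds many more than four columns, and assigning them to `['A', 'B', 'C', 'D']` fails with the length mismatch.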

Solution:

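Because "||" is longer than one character, pandas treats it as a regular expression by default. In a regex, | means "or", so "||" matches the empty string and splits the text at every position. This produces far more than four columns. To match the literal pipes, escape them: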

df[["A", "B", "C", "D"]] = df["Category"].str.split(r'\|\|', expand=True)
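Here is a self-contained version of the escaped-pattern approach. The one-row frame is hypothetical sample data built from the question. The `n=3` argument is an optional addition that caps the split at four pieces:

```python
import pandas as pd

# Hypothetical sample frame built from the question's example row.
df = pd.DataFrame({"Category": ["Operations||Modification||Bank||Bank Process"]})

# The escaped pattern matches the literal "||". n=3 allows at most three splits,
# so any extra "||" stays inside column D instead of creating a fifth column.
df[["A", "B", "C", "D"]] = df["Category"].str.split(r"\|\|", n=3, expand=True)

print(df[["A", "B", "C", "D"]])
```

Note that `n` only caps the column count. If every row has fewer than three separators, `expand=True` still returns fewer than four columns, and the assignment raises the same `ValueError`.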

Alternatively, you can explicitly turn off regex mode with regex=False (this parameter is available in pandas 1.4.0 and later):

df[["A", "B", "C", "D"]] = df["Category"].str.split("||", expand=True, regex=False)
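On pandas versions without the `regex` parameter, another option is to let `re.escape` build the escaped pattern for you. This is a different technique from the answer above, shown as a sketch. The second row is hypothetical data, and splitting into a separate frame and then joining it is just one possible layout:

```python
import re
import pandas as pd

df = pd.DataFrame({"Category": [
    "Operations||Modification||Bank||Bank Process",
    "Sales||Refund||Card||Card Dispute",  # hypothetical second row
]})

# re.escape("||") produces the regex \|\|, which matches the literal separator.
sep = re.escape("||")

# Split into a separate frame, name its columns, then join back on the index.
parts = df["Category"].str.split(sep, n=3, expand=True)
parts.columns = ["A", "B", "C", "D"]
df = df.join(parts)

print(df)
```

`re.escape` is handy when the separator comes from a variable, because it escapes any regex metacharacters without you having to spell them out.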