Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python, Pandas: What it is the best way to separate string into several columns?

I want to split one string into several columns however I encounter issue with str.split method.

Example:
While performing str.split on column Name which consist such strings it work as expected.

data1 = {'Name': ['Alice 23', 'Philip 12', 'Krish 64', 'John 29']}
df1 = pd.DataFrame(data1)
df1
    Name
0   Alice 23
1   Philip 12
2   Krish 64
3   John 29

performing split:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1[['Name', 'age']] = df1['Name'].str.split(' ', 1, expand=True)
df1

Name    age
0   Alice   23
1   Philip  12
2   Krish   64
3   John    29

All good, as I wanted but if I need to place other separator like || its not working properly.

data2 = {'Name': ['Alice||23', 'Philip||12', 'Krish||64', 'John||29']}
df2 = pd.DataFrame(data)
df2

    Name
0   Alice||23
1   Philip||12
2   Krish||64
3   John||29

performing split…

df2[['Name', 'age']] = df2['Name'].str.split('[||]',1,expand = True)
df3[['Name', 'age']] = df2['Name'].str.split('||',1,expand = True)

results in not what I expected

df2
    Name    age
0   Alice   |23
1   Philip  |12
2   Krish   |64
3   John    |29

df3

Name    age
0       Alice||23
1       Philip||12
2       Krish||64
3       John||29

What’s the reason for this behaviour and how get expected result which is as df1?

>Solution :

The issue you’re hitting is that pandas defaults to assuming your string to split on is a regular expression. In regular expressions the "|" character is a special character that enables you to match either expression on the left or right of that character (e.g. you can match ‘a’ or ‘b’ with the expression '(a|b)'.

In your case, we don’t want to pass a regular expression, so you can pass .str.split(…, regex=False).

>>> df2['Name'].str.split('||', regex=False, expand=True)
        0   1
0   Alice  23
1  Philip  12
2   Krish  64
3    John  29

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading