I want to split one string into several columns however I encounter issue with str.split method.
Example:
While performing str.split on column Name which consist such strings it work as expected.
data1 = {'Name': ['Alice 23', 'Philip 12', 'Krish 64', 'John 29']}
df1 = pd.DataFrame(data1)
df1
Name
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29
performing split:
df1[['Name', 'age']] = df1['Name'].str.split(' ', 1, expand=True)
df1
Name age
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29
All good, as I wanted but if I need to place other separator like || its not working properly.
data2 = {'Name': ['Alice||23', 'Philip||12', 'Krish||64', 'John||29']}
df2 = pd.DataFrame(data)
df2
Name
0 Alice||23
1 Philip||12
2 Krish||64
3 John||29
performing split…
df2[['Name', 'age']] = df2['Name'].str.split('[||]',1,expand = True)
df3[['Name', 'age']] = df2['Name'].str.split('||',1,expand = True)
results in not what I expected
df2
Name age
0 Alice |23
1 Philip |12
2 Krish |64
3 John |29
df3
Name age
0 Alice||23
1 Philip||12
2 Krish||64
3 John||29
What’s the reason for this behaviour and how get expected result which is as df1?
>Solution :
The issue you’re hitting is that pandas defaults to assuming your string to split on is a regular expression. In regular expressions the "|" character is a special character that enables you to match either expression on the left or right of that character (e.g. you can match ‘a’ or ‘b’ with the expression '(a|b)'.
In your case, we don’t want to pass a regular expression, so you can pass .str.split(…, regex=False).
>>> df2['Name'].str.split('||', regex=False, expand=True)
0 1
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29