How to split a string column into two columns by a variable-length space delimiter on its last occurrence

I am trying to find a way to split a string column in Python into two different columns by a space delimiter. I have tried the code below:

df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)

But this only works when the delimiter is a single space, and it splits at the first space rather than the last one. I would like to know whether a column can be split on the last occurrence of a space delimiter whose length varies.
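A quick plain-Python check on a single value illustrates the limitation. This sketch uses the built-in str.split and str.rsplit rather than pandas, purely for illustration: splitting once on a literal space from the left cuts at the first space, and splitting once from the right finds the last space but leaves the remaining spaces attached to the first part.

```python
# Illustrative value with a run of four spaces before the last token.
value = "dd ee    ff"

# Splitting once on a literal space from the left cuts at the FIRST space.
first_split = value.split(' ', 1)
print(first_split)  # ['dd', 'ee    ff']

# Splitting once on a literal space from the right cuts at the LAST space,
# but the other three spaces stay attached to the left part.
last_split = value.rsplit(' ', 1)
print(last_split)   # ['dd ee   ', 'ff']
```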

Example:

If the column value is "aa bb cc", the resulting new column values should be "aa bb" and "cc".

If the column value is "dd ee    ff" (more than one space between "ee" and "ff"), the resulting new column values should be "dd ee" and "ff".

In other words, the column must be split on the last occurrence of the delimiter, and that delimiter can be a run of spaces of any length.

Any help will be much appreciated.

Solution:

You can use this regex to split on:

\s+(?!.*\s)

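This matches a run of one or more whitespace characters (\s also matches tabs, not just spaces) that is not followed by any further whitespace later in the string. Only the last run can satisfy that condition, so each value is split into at most two parts.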
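The pattern can be checked on its own with Python's re module before involving pandas. A minimal sketch; the LAST_WS_RUN name and the extra "single" sample are illustrative additions, not part of the original answer:

```python
import re

# Hypothetical name: matches the last run of whitespace in a string, i.e. one
# or more whitespace characters with no whitespace anywhere after them.
LAST_WS_RUN = re.compile(r'\s+(?!.*\s)')

samples = ["aa bb cc", "dd ee    ff", "single"]
results = [LAST_WS_RUN.split(s) for s in samples]
for s, parts in zip(samples, results):
    print(repr(s), '->', parts)
# 'aa bb cc' -> ['aa bb', 'cc']
# 'dd ee    ff' -> ['dd ee', 'ff']
# 'single' -> ['single']
```

A value that contains no whitespace has no match, so it comes back unsplit as a single part.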

Usage:

import pandas as pd

df = pd.DataFrame({'AB': ['aa bb cc', 'dd ee    ff']})
print(df)
df[['A', 'B']] = df['AB'].str.split(r'\s+(?!.*\s)', expand=True)
print(df)

Output:

            AB
0     aa bb cc
1  dd ee    ff

            AB      A   B
0     aa bb cc  aa bb  cc
1  dd ee    ff  dd ee  ff
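If the delimiter is always whitespace, an alternative that avoids the regex is pandas' Series.str.rsplit with no pat argument: like Python's str.rsplit() with no separator, it treats any run of whitespace as a single delimiter, and n=1 limits it to the last split. This is a swapped-in technique rather than the answer's own; a minimal sketch, assuming pandas is installed:

```python
import pandas as pd

df = pd.DataFrame({'AB': ['aa bb cc', 'dd ee    ff']})

# With no pat, str.rsplit splits on runs of whitespace; n=1 performs only the
# rightmost split, so everything before the last run stays in column A.
df[['A', 'B']] = df['AB'].str.rsplit(n=1, expand=True)
print(df)
```

One practical difference: trailing whitespace is ignored by rsplit, whereas the regex above would match a trailing run and leave an empty second part (for example, "aa bb " splits into "aa bb" and "").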