Extract part of a text and split into two columns

I am trying to extract part of each of the following strings (all of my rows follow the same pattern):

Text
19 hours ago — Catch up on key developments an...
8 hour ago — Catch up on key developments an...
10 minutes ago — Catch up on key developments an...
1 day ago — Catch up on key developments an...

I would like to split the Text column into two columns, one with the part before the — separator and one with the part after it:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...

I did this:

df[['Text1', 'Text2']] = df['Text'].str.extract(r"(\d+ \w+, \d{5})?\s*\—?\s*(.*)", expand=True)

However, it does not seem to work.
If you have experience with re, could you please point out the mistake and suggest a solution? Thanks

Solution:

You can use the pandas.Series.str.split function:

df[['Text1', 'Text2']] = df['Text'].str.split(' — ', n=1, expand=True)

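Passing n=1 limits the split to the first occurrence of the separator, so the result always has at most two parts even if the separator appears again later in the text. Setting expand=True returns the parts as separate DataFrame columns instead of a single column of lists.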
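
As for the mistake in the original pattern: the first group, (\d+ \w+, \d{5}), requires a comma followed by five digits, which none of the rows contain. Because that group is optional, it matches nothing and the final (.*) captures the whole string. Below is a minimal sketch that rebuilds the sample data from the question and shows both the split approach and one possible corrected regex. The pattern assumes every row looks like "<number> <unit> ago" followed by the separator; the names `pattern` and `extracted` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "8 hour ago — Catch up on key developments an...",
        "10 minutes ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
    ]
})

# Approach 1: split once on the separator (including the surrounding spaces)
df[["Text1", "Text2"]] = df["Text"].str.split(" — ", n=1, expand=True)

# Approach 2: a corrected regex for str.extract.
# Assumption: the first part always has the form "<digits> <word> ago".
pattern = r"^(\d+ \w+ ago)\s*—\s*(.*)$"
extracted = df["Text"].str.extract(pattern, expand=True)
extracted.columns = ["Text1", "Text2"]

print(df[["Text1", "Text2"]])
print(extracted)
```

If the rows are not guaranteed to end the first part with "ago", the split approach is probably the safer choice, since it only depends on the separator.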
