How to split text rows into multiple rows given there are many spaces between texts?

I have a dataframe with one column "text":

text
I love cakes    we should make them
Joe is very late            will there be photography?
you should wright code correctly  it is very important

I want to explode those rows wherever there are two or more spaces between pieces of text. The desired output is:

text
I love cakes    
we should make them
Joe is very late            
will there be photography?
you should wright code correctly  
it is very important

I know I can do df["text"].apply(lambda x: x.split("  ")), but I don't want to write a separate split for every possible number of spaces (x.split("  "), x.split("   "), x.split("    "), ...). I want a "2 or more spaces" condition. How could I do that?
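To see why a fixed separator falls short, here is a small stdlib-only sketch. The strings are built to mirror the sample rows (4 and 3 spaces between the phrases), and the variable names are illustrative. A two-space separator leaves an empty piece behind for a 4-space gap and a leading space for a 3-space gap, while a regex quantifier such as ` {2,}` covers every gap of two or more spaces with one pattern.

```python
import re

four_gap = "I love cakes" + " " * 4 + "we should make them"
three_gap = "Joe is very late" + " " * 3 + "will there be photography?"

# Fixed two-space separator: results depend on how many spaces the gap has
fixed_four = four_gap.split("  ")     # leaves an empty string between the phrases
fixed_three = three_gap.split("  ")   # leaves a leading space on the second phrase

# Regex quantifier: "two or more spaces" in a single pattern
regex_four = re.split(r" {2,}", four_gap)
regex_three = re.split(r" {2,}", three_gap)

print(fixed_four, fixed_three)
print(regex_four, regex_three)
```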

Solution:

You can split with a regular expression and then explode the column:

# split on runs of 2+ whitespace characters, then give each piece its own row
df = df['text'].str.split(r'\s{2,}').explode().reset_index(drop=True).to_frame()

Output

                               text
0                      I love cakes
1               we should make them
2                  Joe is very late
3        will there be photography?
4  you should wright code correctly
5              it is very important
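For reference, a self-contained sketch of the same split-then-explode approach, rebuilding the sample frame from the question. It assumes a reasonably recent pandas and uses reset_index(drop=True) plus to_frame() to get a clean 0..n-1 index and a single "text" column, instead of resetting the index and then dropping the generated "index" column (passing the axis positionally, as in drop("index", 1), is no longer accepted in pandas 2.x).

```python
import pandas as pd

# Sample data from the question (gaps of 4, 12 and 2 spaces)
df = pd.DataFrame({"text": [
    "I love cakes    we should make them",
    "Joe is very late            will there be photography?",
    "you should wright code correctly  it is very important",
]})

# Patterns longer than one character are treated as regular expressions by str.split.
# Each string becomes a list of pieces; explode gives every piece its own row.
out = (
    df["text"]
    .str.split(r"\s{2,}")
    .explode()
    .reset_index(drop=True)  # renumber rows 0..n-1 instead of repeating 0, 0, 1, 1, ...
    .to_frame()              # back to a one-column DataFrame named "text"
)
print(out)
```

Note that `\s` also matches tabs and newlines; if only literal spaces should count, `r" {2,}"` is the narrower pattern. If the frame has other columns that should stay aligned with each piece, `df.assign(text=df["text"].str.split(r"\s{2,}")).explode("text", ignore_index=True)` keeps them (the `ignore_index` argument needs pandas 1.1 or later).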