How to replace a row in pandas with multiple rows after applying a function?

February 15, 2022

I have a pandas DataFrame with a single column of strings. I want to apply a function to each row that splits the string into sentences and replaces that row with one row per sentence.

Example dataframe:

import pandas as pd
df = pd.DataFrame(["A sentence. Another sentence. More sentences here.", "Another line of text"])

Output of df.head():

                                                   0
0  A sentence. Another sentence. More sentences h...
1                               Another line of text

I have tried using the apply() method as follows:

import re
def get_sentence(row):
    return pd.DataFrame(re.split(r'\.', row[0]))
df.apply(get_sentence, axis=1)

But then df.head() gives:

0                          0
0            A sentenc...
1                            0
0  Another line of text

I want the output as:

                     0
0            A sentence
1      Another sentence
2   More sentences here
3  Another line of text

What is the correct way to do this?

Solution:

You can use:

df[0].str.split(r'\.(?!$)').explode().reset_index(drop=True).str.rstrip('.')

Output:

0              A sentence
1        Another sentence
2     More sentences here
3    Another line of text

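The \.(?!$) regex matches a dot that is not at the end of the string, so the final dot does not produce an empty trailing piece. .explode() puts each piece of the split on its own row, and .reset_index(drop=True) renumbers the rows from 0. .str.rstrip('.') removes the trailing dot left on the last piece. Note that the space after each dot stays at the start of the next piece; the right-aligned output hides it.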
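To see what each step contributes, the chain can be unrolled into intermediate variables. This is only a sketch built on the example DataFrame from the question; the variable names and the final .str.strip().to_frame() step are my additions, not part of the answer. They remove the leading space left after each dot and turn the Series back into a one-column DataFrame like the one asked for.

```python
import pandas as pd

df = pd.DataFrame(["A sentence. Another sentence. More sentences here.", "Another line of text"])

# Step 1: split on every dot that is not the last character of the string.
# Each cell becomes a list of pieces.
parts = df[0].str.split(r'\.(?!$)')

# Step 2: one row per list element; the original index labels are repeated.
exploded = parts.explode()

# Step 3: renumber the rows from 0 and drop the trailing dot on the last piece.
sentences = exploded.reset_index(drop=True).str.rstrip('.')

# Extra step (an assumption about the desired result, not in the answer):
# strip the space that followed each dot and return a one-column DataFrame.
result = sentences.str.strip().to_frame()
print(result)
```

If the original row labels matter, skipping reset_index(drop=True) keeps them, so every sentence stays tagged with the row it came from.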

You can also use a Series.str.findall version:

>>> df[0].str.findall(r'[^.]+').explode().reset_index(drop=True)
0              A sentence
1        Another sentence
2     More sentences here
3    Another line of text

where [^.]+ matches one or more characters other than a dot.
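The findall approach can be wrapped in a small function if it is needed in more than one place. The sketch below is one possible way to do it: split_sentences is a hypothetical helper name, and the whitespace stripping and the removal of empty pieces are my assumptions about what a clean result should look like, not something the answer specifies.

```python
import pandas as pd

def split_sentences(frame, column=0):
    """Hypothetical helper: one row per dot-separated piece of frame[column]."""
    pieces = (
        frame[column]
        .str.findall(r'[^.]+')  # list of runs of non-dot characters
        .explode()              # one row per run
        .str.strip()            # remove the space that followed each dot
    )
    # Drop empty pieces (e.g. from a trailing ". ") and the missing values
    # that explode() produces for strings with no matches.
    pieces = pieces[pieces.notna() & (pieces != '')]
    return pieces.reset_index(drop=True).to_frame()

df = pd.DataFrame(["A sentence. Another sentence. More sentences here.", "Another line of text"])
out = split_sentences(df)
print(out)
```

Because the pattern only treats a dot as a boundary, abbreviations such as "e.g." or decimal numbers would also be split; real sentence segmentation would need a different pattern or a dedicated tokenizer.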