IOB format merge

I have a DataFrame in IOB format, as shown below:

Name Label
Alan B-PERSON
Smith I-PERSON
is O
Alice’s B-PERSON
uncle O
from O
New B-LOCATION
York I-LOCATION
city I-LOCATION

I would like to convert it into a new DataFrame like this:

Name Label
Alan Smith PERSON
Alice’s PERSON
New York city LOCATION

Any help is much appreciated!

>Solution :

You can mark the rows labelled O with a boolean mask, strip the B-/I- prefixes from the Label column, and use the cumulative sum of the mask as a helper group key, so that each run of entity rows between O rows is joined into one string:

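# True for rows that are outside any entity
m = df['Label'].eq('O')

# Drop the O rows, strip the B-/I- prefix, then join the names in each run.
# regex=True is needed in pandas >= 2.0, where str.replace matches literally by default.
df = (df[~m].assign(Label=lambda x: x['Label'].str.replace(r'^[IB]-', '', regex=True))
            .groupby([m.cumsum(), 'Label'])['Name']
            .agg(' '.join)
            .droplevel(0)
            .reset_index()
            .reindex(df.columns, axis=1))
print(df)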
            Name     Label
0     Alan Smith    PERSON
1        Alice's    PERSON
2  New York city  LOCATION
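
One caveat with grouping only on the O rows: two entities of the same type that sit next to each other with no O row in between (for example B-PERSON, I-PERSON, B-PERSON) end up in the same group and are joined into one name. If that can happen in your data, a possible variation is to start a new group at every B- tag instead. The sketch below is only one way to do it; the function name `merge_iob`, the helper column name `entity_id`, and the choice to treat a stray I- tag after O or after a different type as the start of a new entity are my own assumptions, not part of the question.

```python
import pandas as pd


def merge_iob(df, name_col="Name", label_col="Label"):
    """Merge IOB-tagged rows into one row per entity (hypothetical helper)."""
    labels = df[label_col]
    is_entity = labels.ne("O")

    # Entity type with the B-/I- prefix removed ("O" rows stay "O")
    ent_type = labels.str.replace(r"^[BI]-", "", regex=True)

    # A new entity starts at a B- tag, or at an I- tag that follows
    # an O row or a row of a different type (assumed handling of stray I- tags)
    prev_entity = is_entity.shift(fill_value=False)
    starts = is_entity & (
        labels.str.startswith("B-")
        | ~prev_entity
        | ent_type.ne(ent_type.shift())
    )

    # Running count of entity starts gives every entity its own id
    entity_id = starts.cumsum().rename("entity_id")

    return (
        df.assign(**{label_col: ent_type})[is_entity]
        .groupby(entity_id[is_entity])
        .agg({name_col: " ".join, label_col: "first"})
        .reset_index(drop=True)
    )


# Data from the question
df_demo = pd.DataFrame({
    "Name": ["Alan", "Smith", "is", "Alice’s", "uncle", "from", "New", "York", "city"],
    "Label": ["B-PERSON", "I-PERSON", "O", "B-PERSON", "O", "O",
              "B-LOCATION", "I-LOCATION", "I-LOCATION"],
})
print(merge_iob(df_demo))
```

On the data from the question this should give the same three rows as the answer above; the difference only shows up when entities are adjacent. Because the group ids increase in row order, the entities also come out in the order they appear in the text.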
