How to transform a column of string lists and split a dataframe into one dataframe per string?

March 24, 2022

I have a dataframe with a column containing lists of strings.

id sentence                                            category
0  "I love basketball and dunk to the basket"          ['basketball']
1  "I am playing football and basketball tomorrow "    ['football', 'basketball']

I would like to do 2 things:

1. Transform the category column so that each element of a list becomes its own string, with one row per string and the same id and sentence
2. Have one dataframe per category

Expected output for step 1:

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'football'
1  "I am playing football and basketball tomorrow "    'basketball'

Expected output for step 2:

DF_1

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'basketball'

DF_2

id sentence                                            category
1  "I am playing football and basketball tomorrow "    'football'

How can I do this? Looping over the rows and checking the length of each list would work, but is there a faster or more elegant way?

Solution:

You could explode "category", then group by it:

out = [g for _, g in df.explode('category').groupby('category')]

Then if you print the items in out:

for i in out:
    print(i, end='\n\n')

you’ll see:

   id                                        sentence    category
0   0        I love basketball and dunk to the basket  basketball
1   1  I am playing football and basketball tomorrow   basketball

   id                                        sentence  category
1   1  I am playing football and basketball tomorrow   football
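
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the "category" column holds real Python lists (not their string representation), rebuilds the sample data from the question, and keeps the intermediate result of step 1. The names exploded and by_category are only illustrative and do not come from the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

# Step 1: one row per list element; id and sentence are repeated
exploded = df.explode("category")

# Step 2: one dataframe per category, keyed by the category name
# (a dict instead of the list used in the answer, so lookups are by name)
by_category = {name: group for name, group in exploded.groupby("category")}

print(exploded)
for name, group in by_category.items():
    print(name)
    print(group, end="\n\n")
```

Note that explode keeps the original row index, so both rows coming from id 1 share index 1. If a fresh 0..n-1 index is preferred, df.explode("category", ignore_index=True) should do it on recent pandas versions.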