I have a dataframe with a column containing lists of strings.
id sentence category
0 "I love basketball and dunk to the basket" ['basketball']
1 "I am playing football and basketball tomorrow" ['football', 'basketball']
I would like to do 2 things:
1) Transform the category column so that each element of the list becomes its own string, giving one row per string with the same id and sentence.
2) Have one dataframe per category.
Expected output for step 1:
id sentence category
0 "I love basketball and dunk to the basket" 'basketball'
1 "I am playing football and basketball tomorrow" 'football'
1 "I am playing football and basketball tomorrow" 'basketball'
Expected output for step 2:
DF_1
id sentence category
0 "I love basketball and dunk to the basket" 'basketball'
1 "I am playing football and basketball tomorrow" 'basketball'
DF_2
id sentence category
1 "I am playing football and basketball tomorrow" 'football'
How can I do this? Looping over the rows and checking the length of each list would work, but is there a faster or more elegant way?
Solution:
You could explode "category" and then group by it:
out = [g for _, g in df.explode('category').groupby('category')]
Then, if you print each item in out:
for i in out:
    print(i, end='\n\n')
you’ll see:
id sentence category
0 0 I love basketball and dunk to the basket basketball
1 1 I am playing football and basketball tomorrow basketball

id sentence category
1 1 I am playing football and basketball tomorrow football
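Putting both steps together, a minimal sketch might look like the following. It rebuilds the sample frame from the question (assuming the columns are named id, sentence and category and that category holds Python lists), uses explode's ignore_index=True (available in pandas 1.1 and later) so step 1 gets a fresh 0..n-1 index instead of repeated labels, and collects the groups into a dict rather than a list so each category's frame can be looked up by name. The variable names exploded and frames are only illustrative.

```python
import pandas as pd

# Sample data from the question (category holds Python lists)
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

# Step 1: one row per list element; id and sentence are repeated.
# ignore_index=True renumbers the rows 0..n-1 instead of repeating
# the original index labels (0, 1, 1).
exploded = df.explode("category", ignore_index=True)

# Step 2: one DataFrame per category, keyed by the category name,
# so a frame can be fetched directly, e.g. frames["football"].
frames = {cat: g for cat, g in exploded.groupby("category")}

print(exploded)
for cat, g in frames.items():
    print(cat)
    print(g, end="\n\n")
```

Here frames["basketball"] and frames["football"] play the roles of DF_1 and DF_2 from the question. If you prefer a list like out in the answer, list(frames.values()) returns the frames in the same sorted category order.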