I have a df
review review_id word_count
['abc def', 'ghi'] [1, 2] [2, 1]
['jkl mno pqr', 'stu', 'vwx'] [3, 4, 5] [3, 1, 1]
For each row in the df, I would like to create new columns for review and review_id whereby there will only consist of the review/review_id with the maximum word count.
Thus, the output should look something like this.
review review_id
abc def 1
jkl mno pqr 3
I can only think of getting the index of the max value in word_count. Any sugeestions will be welcomed. Thanks!
>Solution :
Use DataFrame.explode with compare maximal values per groups (from rows) and select rows by boolean indexing if need all maximal values:
df = df.explode(['review','review_id','word_count'])
#for oldier pandas versions
#df = df.apply(lambda x: x.explode())
df = df[df['word_count'].eq(df.groupby(level=0)['word_count'].transform('max'))]
print (df)
review review_id word_count
0 abc def 1 2
1 jkl mno pqr 3 3