I have a dataframe:
df1 = pandas.DataFrame( {
"text" : ["Alice is in ", "Alice is in wonderland.", "Mallory has done the task.", "Mallory has", "Bob is final." , "Mallory has done"] ,
"label" : ["Seattle", "Portlang", "Gotland", "california", "california", "Portland"] ,
"title":["SA","SA","sometitle","sometitle","some different title","sometitle"],
"version":[1,2,4,1,2,3]})
df1
text label title version
0 Alice is in Seattle SA 1
1 Alice is in wonderland. Portlang SA 2
2 Mallory has done the task. Portland sometitle 4
3 Mallory has california sometitle 1
4 Bob is final. california some different title 2
5 Mallory has done Portland sometitle 3
I want to keep the title and text corresponding to the latest version number, also want to keep the labels in a list.
So the final result I expect is following:
title version label text
0 SA 2 [Seattle, Portlang] Alice is in wonderland.
1 some different title 2 [california] Bob is final.
2 sometitle 4 [Gotland, california, Portland] Mallory has done the task.
I could perform so far this, bit confuse how to handle ‘text’ here:
df2 = df1.groupby(['title']).agg({'version':'max', 'label':list})[['version','label']].reset_index()
df2
title version label
0 SA 2 [Seattle, Portlang]
1 some different title 2 [california]
2 sometitle 4 [Gotland, california, Portland]
any help is appreciated.
Thank you,
>Solution :
Use df.merge with Groupby.agg:
In [508]: x = df1.groupby(['title']).agg({'version':'max', 'label':list})
In [516]: df1[['title', 'version', 'text']].merge(x, on=['title', 'version'])
Out[516]:
title version text label
0 SA 2 Alice is in wonderland. [Seattle, Portlang]
1 sometitle 4 Mallory has done the task. [Gotland, california, Portland]
2 some different title 2 Bob is final. [california]