Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Reform pandas dataframe

I have a dataframe:

df1 = pandas.DataFrame( { 
    "text" : ["Alice is in ", "Alice is in wonderland.", "Mallory has done the task.", "Mallory has", "Bob is final." , "Mallory has done"] , 
    "label" : ["Seattle", "Portlang", "Gotland", "california", "california", "Portland"] ,
    "title":["SA","SA","sometitle","sometitle","some different title","sometitle"],
    "version":[1,2,4,1,2,3]})

df1
              text                  label           title            version
    0   Alice is in                Seattle         SA                   1 
    1   Alice is in wonderland.    Portlang        SA                   2
    2   Mallory has done the task. Portland       sometitle             4
    3   Mallory has                california     sometitle             1
    4   Bob is final.              california     some different title  2
    5   Mallory has done            Portland       sometitle            3

I want to keep the title and text corresponding to the latest version number, also want to keep the labels in a list.
So the final result I expect is following:

    title            version    label                          text
0   SA                   2  [Seattle, Portlang]              Alice is in wonderland.
1   some different title 2  [california]                     Bob is final.
2   sometitle            4  [Gotland, california, Portland]  Mallory has done the task.

I could perform so far this, bit confuse how to handle ‘text’ here:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df2 = df1.groupby(['title']).agg({'version':'max', 'label':list})[['version','label']].reset_index()
df2

title                    version    label
0   SA                      2   [Seattle, Portlang]
1   some different title    2   [california]
2   sometitle               4   [Gotland, california, Portland]

any help is appreciated.

Thank you,

>Solution :

Use df.merge with Groupby.agg:

In [508]: x = df1.groupby(['title']).agg({'version':'max', 'label':list})

In [516]: df1[['title', 'version', 'text']].merge(x, on=['title', 'version'])
Out[516]: 
                  title  version                        text                            label
0                    SA        2     Alice is in wonderland.              [Seattle, Portlang]
1             sometitle        4  Mallory has done the task.  [Gotland, california, Portland]
2  some different title        2               Bob is final.                     [california]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading