Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Groupby multiple columns in pandas dataframe

I have a dataframe looks like:

page    reference       ids                 -         subject           word
1       apple           ['aaaa', 'bbbbb', 'cccc']       name            app
1       apple           ['bndv', 'asasa', 'swdsd']      fruit           is
1       apple           ['bsnm', 'dfsd', 'dgdf']        fruit           text
1       bat             ['asas', 'ddfgd', 'ff']         thing           sport
1       cat             ['sds', 'dffd', 'gdg']          fruit           color
1       bat             ['sds', 'fsss', 'ssfd']         thing           was
1       bat             ['fsf', 'sff', 'fss']           place           that
2       dog             ['fffds', 'gd', 'sdg']          name            mud
2       egg             ['dfff', 'sdf', 'vcv']          place           gun
2       dog             ['dsfd', 'fds', 'gfdg']         thing           kit
2       egg             ['ddd', 'fg', 'dfg']            place           hut

I want to groupby reference column and subject column. The output should look like this:

output:
page    reference   ids                                                subject          word
1       apple   [['bndv', 'asasa', 'swdsd'],['bsnm', 'dfsd', 'dgdf']]   fruit           [[is], [text]]
1       apple   ['aaaa', 'bbbbb', 'cccc']                               name            [app]
1       bat     [['asas', 'ddfgd', 'ff'], [['sds', 'fsss', 'ssfd']]     thing           [[sport], [was]]
1       bat     ['fsf', 'sff', 'fss']                                   place           [that]
1       cat     ['sds', 'dffd', 'gdg']                                  fruit           [color]
2       dog     ['fffds', 'gd', 'sdg']                                  name            [mud]
2       dog     ['dsfd', 'fds', 'gfdg']                                 thing           [kit]
2       egg     [['dfff', 'sdf', 'vcv'], ['ddd', 'fg', 'dfg']]          place           [[gun], [hut]]

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

First to group and aggregate necessary fields:

res = df.groupby(["reference", "subject"]).agg({"page": min, "ids": list, "word": lambda l: [[ll] for ll in l]}).reset_index

  reference subject  page                                         ids              word
0     apple   fruit     1  [[bndv, asasa, swdsd], [bsnm, dfsd, dgdf]]    [[is], [text]]
1     apple    name     1                       [[aaaa, bbbbb, cccc]]           [[app]]
2       bat   place     1                           [[fsf, sff, fss]]          [[that]]
3       bat   thing     1      [[asas, ddfgd, ff], [sds, fsss, ssfd]]  [[sport], [was]]
4       cat   fruit     1                          [[sds, dffd, gdg]]         [[color]]
5       dog    name     2                          [[fffds, gd, sdg]]           [[mud]]
6       dog   thing     2                         [[dsfd, fds, gfdg]]           [[kit]]
7       egg   place     2          [[dfff, sdf, vcv], [ddd, fg, dfg]]    [[gun], [hut]]

Note that this also wraps each word value in a list, just like what you want in your desired output. I’m also just assuming to take the minimum page value in each group since you didn’t mention the rule for this variable. You can update the min value in agg function to whatever you think is appropriate.

Then you can cleanup the lists if length is 1:

res["word"] = res["word"].apply(lambda l: l[0] if len(l) == 1 else l)
res["ids"] = res["ids"].apply(lambda l: l[0] if len(l) == 1 else l)

  reference subject  page                                         ids              word
0     apple   fruit     1  [[bndv, asasa, swdsd], [bsnm, dfsd, dgdf]]    [[is], [text]]
1     apple    name     1                         [aaaa, bbbbb, cccc]             [app]
2       bat   place     1                             [fsf, sff, fss]            [that]
3       bat   thing     1      [[asas, ddfgd, ff], [sds, fsss, ssfd]]  [[sport], [was]]
4       cat   fruit     1                            [sds, dffd, gdg]           [color]
5       dog    name     2                            [fffds, gd, sdg]             [mud]
6       dog   thing     2                           [dsfd, fds, gfdg]             [kit]
7       egg   place     2          [[dfff, sdf, vcv], [ddd, fg, dfg]]    [[gun], [hut]]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading