Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using Pandas, want to group by multiple columns for min/max and add another column value to min/max columns

First of all, sorry if the title is hard to understand.

Target:
I’m trying to group by source, type columns, add the min,max columns for each group by result
,and then add the related target column to both min and max columns (in front of the value).

I can’t figure out how to get the Pandas result in this format:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

source type min max
Person1 bow Person 2: 0.001 Person 3: 0.05

I have a list of dictionaries as followed:

`[{'source': 'Person1', 'target': 'Person2', 'type': 'bow', 'similarity': 0.636}, {'source': 'Person1', 'target': 'Person2', 'type': 'bigram', 'similarity': 0.040}, {'source': 'Person1', 'target': 'Person2', 'type': 'tfidf', 'similarity': 0.433}, {'source': 'Person1', 'target': 'Person3', 'type': 'bow', 'similarity': 0.699}, {'source': 'Person1', 'target': 'Person3', 'type': 'bigram', 'similarity': 0.171}, {'source': 'Person1', 'target': 'Person3', 'type': 'tfidf', 'similarity': 0.522}]`

In table from this looks like:

source target type similarity
Person1 Person2 bow 0.636
Person1 Person2 bigram 0.040
Person1 Person2 tfidf 0.433
Person1 Person3 bow 0.699
Person1 Person3 bigram 0.171
Person1 Person3 tfidf 0.522

For the group by, min/max I’m using the following:

df = df.groupby(['source','type']).similarity.agg(['min','max'])

Which results into:

source type min max
Person1 bow 0.636 0.699
Person1 bigram 0.040 0.171
Person1 tfidf 0.433 0.522

All good to this point, but how do I get the output into the following structure:

[Source]:source;[Type]: type; [min]: Target:min(similarity); [max]: Target:max(similarity)

source type min max
Person1 bow Person2: 0.636 Person3: 0.699
Person1 bigram Person2: 0.040 Person3: 0.171
Person1 tfidf Person3: 0.433 Person3: 0.522

Am I supposed to use .loc to find the row for the min / max values and then somehow add those to the result?

>Solution :

Here is an approach with GroupBy and pandas.merge :

g = df.groupby(by=['source', 'type'], sort=False)
​​
out = (
            pd.merge(df.loc[g['similarity'].idxmin()]
                       .rename(columns= {'similarity': 'sim_min', 'target': 'target_min'}),
                     df.loc[g['similarity'].idxmax()]
                       .rename(columns= {'similarity': 'sim_max', 'target': 'target_max'}),
                     on=['source','type'])
              .assign(min=lambda x: x.pop('target_min') + ': ' + x.pop('sim_min').astype(str),
                      max=lambda x: x.pop('target_max') + ': ' + x.pop('sim_max').astype(str))
        )

Another variant :

g = df.groupby(by=['source', 'type'], sort=False)
​
​
out = (
            pd.merge(df.loc[g['similarity'].idxmin()]
                         .assign(min= lambda x: x[['target', 'similarity']]
                                                     .astype(str).agg(": ".join, axis=1)),
                     df.loc[g['similarity'].idxmax()]
                         .assign(max= lambda x: x[['target', 'similarity']]
                                                     .astype(str).agg(": ".join, axis=1)),
                     on=['source','type'], suffixes=('', '_'))
                .loc[:, ['source', 'type', 'min', 'max']]
      )

# Output :

print(out)

    source    type             min             max
0  Person1     bow  Person2: 0.636  Person3: 0.699
1  Person1  bigram   Person2: 0.04  Person3: 0.171
2  Person1   tfidf  Person2: 0.433  Person3: 0.522
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading