Home Using Pandas, want to group by multiple columns for min/max and add another column value to min/max columns

Questions

Using Pandas, want to group by multiple columns for min/max and add another column value to min/max columns

December 11, 2022

First of all, sorry if the title is hard to understand.

Target:
I’m trying to group by source, type columns, add the min,max columns for each group by result
,and then add the related target column to both min and max columns (in front of the value).

I can’t figure out how to get the Pandas result in this format:

source	type	min	max
Person1	bow	Person 2: 0.001	Person 3: 0.05

I have a list of dictionaries as followed:

`[{'source': 'Person1', 'target': 'Person2', 'type': 'bow', 'similarity': 0.636}, {'source': 'Person1', 'target': 'Person2', 'type': 'bigram', 'similarity': 0.040}, {'source': 'Person1', 'target': 'Person2', 'type': 'tfidf', 'similarity': 0.433}, {'source': 'Person1', 'target': 'Person3', 'type': 'bow', 'similarity': 0.699}, {'source': 'Person1', 'target': 'Person3', 'type': 'bigram', 'similarity': 0.171}, {'source': 'Person1', 'target': 'Person3', 'type': 'tfidf', 'similarity': 0.522}]`

In table from this looks like:

source	target	type	similarity
Person1	Person2	bow	0.636
Person1	Person2	bigram	0.040
Person1	Person2	tfidf	0.433
Person1	Person3	bow	0.699
Person1	Person3	bigram	0.171
Person1	Person3	tfidf	0.522

For the group by, min/max I’m using the following:

df = df.groupby(['source','type']).similarity.agg(['min','max'])

Which results into:

source	type	min	max
Person1	bow	0.636	0.699
Person1	bigram	0.040	0.171
Person1	tfidf	0.433	0.522

All good to this point, but how do I get the output into the following structure:

[Source]:source;[Type]: type; [min]: Target:min(similarity); [max]: Target:max(similarity)

source	type	min	max
Person1	bow	Person2: 0.636	Person3: 0.699
Person1	bigram	Person2: 0.040	Person3: 0.171
Person1	tfidf	Person3: 0.433	Person3: 0.522

Am I supposed to use .loc to find the row for the min / max values and then somehow add those to the result?

>Solution :

Here is an approach with GroupBy and pandas.merge :

g = df.groupby(by=['source', 'type'], sort=False)

out = (
            pd.merge(df.loc[g['similarity'].idxmin()]
                       .rename(columns= {'similarity': 'sim_min', 'target': 'target_min'}),
                     df.loc[g['similarity'].idxmax()]
                       .rename(columns= {'similarity': 'sim_max', 'target': 'target_max'}),
                     on=['source','type'])
              .assign(min=lambda x: x.pop('target_min') + ': ' + x.pop('sim_min').astype(str),
                      max=lambda x: x.pop('target_max') + ': ' + x.pop('sim_max').astype(str))
        )

Another variant :

g = df.groupby(by=['source', 'type'], sort=False)


out = (
            pd.merge(df.loc[g['similarity'].idxmin()]
                         .assign(min= lambda x: x[['target', 'similarity']]
                                                     .astype(str).agg(": ".join, axis=1)),
                     df.loc[g['similarity'].idxmax()]
                         .assign(max= lambda x: x[['target', 'similarity']]
                                                     .astype(str).agg(": ".join, axis=1)),
                     on=['source','type'], suffixes=('', '_'))
                .loc[:, ['source', 'type', 'min', 'max']]
      )

# Output :

print(out)

    source    type             min             max
0  Person1     bow  Person2: 0.636  Person3: 0.699
1  Person1  bigram   Person2: 0.04  Person3: 0.171
2  Person1   tfidf  Person2: 0.433  Person3: 0.522