First of all, sorry if the title is hard to understand.
Target:
I’m trying to group by source, type columns, add the min,max columns for each group by result
,and then add the related target column to both min and max columns (in front of the value).
I can’t figure out how to get the Pandas result in this format:
| source | type | min | max |
|---|---|---|---|
| Person1 | bow | Person 2: 0.001 | Person 3: 0.05 |
I have a list of dictionaries as followed:
`[{'source': 'Person1', 'target': 'Person2', 'type': 'bow', 'similarity': 0.636}, {'source': 'Person1', 'target': 'Person2', 'type': 'bigram', 'similarity': 0.040}, {'source': 'Person1', 'target': 'Person2', 'type': 'tfidf', 'similarity': 0.433}, {'source': 'Person1', 'target': 'Person3', 'type': 'bow', 'similarity': 0.699}, {'source': 'Person1', 'target': 'Person3', 'type': 'bigram', 'similarity': 0.171}, {'source': 'Person1', 'target': 'Person3', 'type': 'tfidf', 'similarity': 0.522}]`
In table from this looks like:
| source | target | type | similarity |
|---|---|---|---|
| Person1 | Person2 | bow | 0.636 |
| Person1 | Person2 | bigram | 0.040 |
| Person1 | Person2 | tfidf | 0.433 |
| Person1 | Person3 | bow | 0.699 |
| Person1 | Person3 | bigram | 0.171 |
| Person1 | Person3 | tfidf | 0.522 |
For the group by, min/max I’m using the following:
df = df.groupby(['source','type']).similarity.agg(['min','max'])
Which results into:
| source | type | min | max |
|---|---|---|---|
| Person1 | bow | 0.636 | 0.699 |
| Person1 | bigram | 0.040 | 0.171 |
| Person1 | tfidf | 0.433 | 0.522 |
All good to this point, but how do I get the output into the following structure:
[Source]:source;[Type]: type; [min]: Target:min(similarity); [max]: Target:max(similarity)
| source | type | min | max |
|---|---|---|---|
| Person1 | bow | Person2: 0.636 | Person3: 0.699 |
| Person1 | bigram | Person2: 0.040 | Person3: 0.171 |
| Person1 | tfidf | Person3: 0.433 | Person3: 0.522 |
Am I supposed to use .loc to find the row for the min / max values and then somehow add those to the result?
>Solution :
Here is an approach with GroupBy and pandas.merge :
g = df.groupby(by=['source', 'type'], sort=False)
out = (
pd.merge(df.loc[g['similarity'].idxmin()]
.rename(columns= {'similarity': 'sim_min', 'target': 'target_min'}),
df.loc[g['similarity'].idxmax()]
.rename(columns= {'similarity': 'sim_max', 'target': 'target_max'}),
on=['source','type'])
.assign(min=lambda x: x.pop('target_min') + ': ' + x.pop('sim_min').astype(str),
max=lambda x: x.pop('target_max') + ': ' + x.pop('sim_max').astype(str))
)
Another variant :
g = df.groupby(by=['source', 'type'], sort=False)
out = (
pd.merge(df.loc[g['similarity'].idxmin()]
.assign(min= lambda x: x[['target', 'similarity']]
.astype(str).agg(": ".join, axis=1)),
df.loc[g['similarity'].idxmax()]
.assign(max= lambda x: x[['target', 'similarity']]
.astype(str).agg(": ".join, axis=1)),
on=['source','type'], suffixes=('', '_'))
.loc[:, ['source', 'type', 'min', 'max']]
)
# Output :
print(out)
source type min max
0 Person1 bow Person2: 0.636 Person3: 0.699
1 Person1 bigram Person2: 0.04 Person3: 0.171
2 Person1 tfidf Person2: 0.433 Person3: 0.522