I have Two data frames
import pandas as pd
exam_1 = pd.DataFrame({'user': ['A', 'B', 'C'],
'marks': [10, 50, 40]})
exam_2 = pd.DataFrame({'user': ['A', 'C', 'D'],
'marks': [30, 20, 30]})
I’m trying to get a resultant data frame, with users appeared in both exams with lowest marks.
In the above example result should be:
user marks
A 10
C 30
I did write following lines to get users who appeared in both exams but, stuck in getting the lowest marks. Here is what I have tried
tmp = pd.merge(exam_1, exam_2, how='inner', on=['user'])
USERS = exam_1[ exam_1['user'].isin(tmp['user'])]
can someone please help me to get lowest exam marks in the resultant dataframe?
>Solution :
One option could be to set "user" as index, then use numpy.minimum and dropna:
import numpy as np
out = np.minimum(exam_1.set_index('user'),
exam_2.set_index('user')).dropna()
Output:
marks
user
A 10.0
C 20.0
With your merge approach:
out = (pd.merge(exam_1, exam_2, on='user')
.set_index('user')
.filter(like='marks')
.min(axis=1)
)
Output:
user
A 10
C 20
dtype: int64
Another, more verbose, option with concat:
common = set(exam_1['user']) & set(exam_2['user'])
m1 = exam_1['user'].isin(common)
m2 = exam_2['user'].isin(common)
out = (pd.concat([exam_1[m1], exam_2[m2]])
.groupby('user', as_index=False).min()
)
Output:
user marks
0 A 10
1 C 20