I’ve got two tables, for example
Table A with fields A,B (sorted by A count):
A,B
| Column A | Column B |
|---|---|
| foo1 | a |
| foo2 | b |
| foo3 | a |
| foo4 | d |
| foo5 | c |
| foo6 | a |
Table B (sorted by B count):
| Column B |
|---|
| a |
| b |
| c |
| d |
| e |
I want to get a table like this (to get top 2 most popular A for each B):
| Column A | Column B |
|---|---|
| foo1 | a |
| foo3 | a |
| foo2 | b |
| foo4 | d |
| foo5 | c |
I tried to do aggregate and groupby, but I have no ideas how to create this.
>Solution :
IIUC use ordered Categorical, then sorting by this column and get top2 values by GroupBy.head:
df['Column B'] = pd.Categorical(df['Column B'],
ordered=True,
categories=df['Column B'].unique())
If order is necessary use from another DataFrame use categories=df2['Column B']:
df['Column B'] = pd.Categorical(df['Column B'],
ordered=True,
categories=df2['Column B'])
df = df.sort_values('Column B').groupby('Column B').head(2)
print (df)
Column A Column B
0 foo1 a
2 foo3 a
1 foo2 b
3 foo4 d
4 foo5 c