Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas window aggregation two sorted tables

I’ve got two tables, for example

Table A with fields A,B (sorted by A count):
A,B

Column A Column B
foo1 a
foo2 b
foo3 a
foo4 d
foo5 c
foo6 a

Table B (sorted by B count):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Column B
a
b
c
d
e

I want to get a table like this (to get top 2 most popular A for each B):

Column A Column B
foo1 a
foo3 a
foo2 b
foo4 d
foo5 c

I tried to do aggregate and groupby, but I have no ideas how to create this.

>Solution :

IIUC use ordered Categorical, then sorting by this column and get top2 values by GroupBy.head:

df['Column B'] = pd.Categorical(df['Column B'], 
                                ordered=True, 
                                categories=df['Column B'].unique())

If order is necessary use from another DataFrame use categories=df2['Column B']:

df['Column B'] = pd.Categorical(df['Column B'], 
                                ordered=True, 
                                categories=df2['Column B'])

df = df.sort_values('Column B').groupby('Column B').head(2)
print (df)
  Column A Column B
0     foo1        a
2     foo3        a
1     foo2        b
3     foo4        d
4     foo5        c
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading