Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count how many times certain pandas row has specific column value lower than another certain pandas row across many pandas dataframes

I couldn’t find better title for the question and it may be confusing, I’ll try to explain it better with the example below.

I have the following dataframe with the possible combinations between four players:

players_combinations
 #   player_x   player_y
---  --------   --------
 0   player_1   player_2
 1   player_1   player_3
 2   player_1   player_4
 3   player_2   player_3
 4   player_2   player_4
 5   player_3   player_4

And I have the following dataframes with information about games between the players and their podium places:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

game_1
 #   player     place
---  --------   -----
 0   player_1   2
 1   player_2   3
 2   player_3   1

game_2
 #   player     place
---  --------   -----
 0   player_2   2
 1   player_3   1

game_3
 #   player     place
---  --------   -----
 0   player_1   3
 1   player_2   1
 2   player_4   2

The problem is: I need to count how many times player_x has place value lower than player_y across the games dataframes. In another words, I would like to know how many times a player beat another based on their podium places.
For this example, I expect to get something like this:

result
 #   player_x   player_y   won        lost
---  --------   --------   --------   --------
 0   player_1   player_2   1          1
 1   player_1   player_3   null       1
 2   player_1   player_4   null       1
 3   player_2   player_3   null       2
 4   player_2   player_4   1          null
 5   player_3   player_4   null       null

This result dataframe tells that:

#0 row: player_1 beat player_2 one time, player_1 lost to player_2 one time.

#1 row: player_1 never beat player_3, player_1 lost to player_3 one time.

#2 row: player_1 never beat player_4, player_1 lost to player_4 one time.

#3 row: player_2 never beat player_3, player_2 lost to player_3 two times.

#4 row: player_2 beat player_4 one time, player_2 never lost to player_4.

#5 row: player_3 and player_4 never faced each other.

>Solution :

Combine itertools.combinations and collections.Counter on the data sorted by rank, then merge it with players_combinations:

from itertools import combinations
from collections import Counter

s = pd.Series(Counter(c for df in games for c in
                      combinations(df.sort_values(by='place')['player'], 2)))

out = (players_combinations
       .merge(s.rename_axis(['player_x', 'player_y']).reset_index(name='Won'),
              how='left')
       .merge(s.rename_axis(['player_y', 'player_x']).reset_index(name='Lost'),
              how='left')
      )

NB. this uses knowledge of the implementation of combinations that for ['A', 'B', 'C'] will produce a specific order: [('A', 'B'), ('A', 'C'), ('B', 'C')], always putting first in the combination an item that is ranked first in the original iterable.

Output:

   player_x  player_y  Won  Lost
0  player_1  player_2  1.0   1.0
1  player_1  player_3  NaN   1.0
2  player_1  player_4  NaN   1.0
3  player_2  player_3  NaN   2.0
4  player_2  player_4  1.0   NaN
5  player_3  player_4  NaN   NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading