Home Count how many times certain pandas row has specific column value lower than another certain pandas row across many pandas dataframes

Questions

Count how many times certain pandas row has specific column value lower than another certain pandas row across many pandas dataframes

byMR

July 21, 2023

I couldn’t find better title for the question and it may be confusing, I’ll try to explain it better with the example below.

I have the following dataframe with the possible combinations between four players:

players_combinations
 #   player_x   player_y
---  --------   --------
 0   player_1   player_2
 1   player_1   player_3
 2   player_1   player_4
 3   player_2   player_3
 4   player_2   player_4
 5   player_3   player_4

And I have the following dataframes with information about games between the players and their podium places:

game_1
 #   player     place
---  --------   -----
 0   player_1   2
 1   player_2   3
 2   player_3   1

game_2
 #   player     place
---  --------   -----
 0   player_2   2
 1   player_3   1

game_3
 #   player     place
---  --------   -----
 0   player_1   3
 1   player_2   1
 2   player_4   2

The problem is: I need to count how many times player_x has place value lower than player_y across the games dataframes. In another words, I would like to know how many times a player beat another based on their podium places.
For this example, I expect to get something like this:

result
 #   player_x   player_y   won        lost
---  --------   --------   --------   --------
 0   player_1   player_2   1          1
 1   player_1   player_3   null       1
 2   player_1   player_4   null       1
 3   player_2   player_3   null       2
 4   player_2   player_4   1          null
 5   player_3   player_4   null       null

This result dataframe tells that:

#0 row: player_1 beat player_2 one time, player_1 lost to player_2 one time.

#1 row: player_1 never beat player_3, player_1 lost to player_3 one time.

#2 row: player_1 never beat player_4, player_1 lost to player_4 one time.

#3 row: player_2 never beat player_3, player_2 lost to player_3 two times.

#4 row: player_2 beat player_4 one time, player_2 never lost to player_4.

#5 row: player_3 and player_4 never faced each other.

>Solution :

Combine itertools.combinations and collections.Counter on the data sorted by rank, then merge it with players_combinations:

from itertools import combinations
from collections import Counter

s = pd.Series(Counter(c for df in games for c in
                      combinations(df.sort_values(by='place')['player'], 2)))

out = (players_combinations
       .merge(s.rename_axis(['player_x', 'player_y']).reset_index(name='Won'),
              how='left')
       .merge(s.rename_axis(['player_y', 'player_x']).reset_index(name='Lost'),
              how='left')
      )

NB. this uses knowledge of the implementation of combinations that for ['A', 'B', 'C'] will produce a specific order: [('A', 'B'), ('A', 'C'), ('B', 'C')], always putting first in the combination an item that is ranked first in the original iterable.

Output:

   player_x  player_y  Won  Lost
0  player_1  player_2  1.0   1.0
1  player_1  player_3  NaN   1.0
2  player_1  player_4  NaN   1.0
3  player_2  player_3  NaN   2.0
4  player_2  player_4  1.0   NaN
5  player_3  player_4  NaN   NaN