I couldn’t find better title for the question and it may be confusing, I’ll try to explain it better with the example below.
I have the following dataframe with the possible combinations between four players:
players_combinations
# player_x player_y
--- -------- --------
0 player_1 player_2
1 player_1 player_3
2 player_1 player_4
3 player_2 player_3
4 player_2 player_4
5 player_3 player_4
And I have the following dataframes with information about games between the players and their podium places:
game_1
# player place
--- -------- -----
0 player_1 2
1 player_2 3
2 player_3 1
game_2
# player place
--- -------- -----
0 player_2 2
1 player_3 1
game_3
# player place
--- -------- -----
0 player_1 3
1 player_2 1
2 player_4 2
The problem is: I need to count how many times player_x has place value lower than player_y across the games dataframes. In another words, I would like to know how many times a player beat another based on their podium places.
For this example, I expect to get something like this:
result
# player_x player_y won lost
--- -------- -------- -------- --------
0 player_1 player_2 1 1
1 player_1 player_3 null 1
2 player_1 player_4 null 1
3 player_2 player_3 null 2
4 player_2 player_4 1 null
5 player_3 player_4 null null
This result dataframe tells that:
#0 row: player_1 beat player_2 one time, player_1 lost to player_2 one time.
#1 row: player_1 never beat player_3, player_1 lost to player_3 one time.
#2 row: player_1 never beat player_4, player_1 lost to player_4 one time.
#3 row: player_2 never beat player_3, player_2 lost to player_3 two times.
#4 row: player_2 beat player_4 one time, player_2 never lost to player_4.
#5 row: player_3 and player_4 never faced each other.
>Solution :
Combine itertools.combinations and collections.Counter on the data sorted by rank, then merge it with players_combinations:
from itertools import combinations
from collections import Counter
s = pd.Series(Counter(c for df in games for c in
combinations(df.sort_values(by='place')['player'], 2)))
out = (players_combinations
.merge(s.rename_axis(['player_x', 'player_y']).reset_index(name='Won'),
how='left')
.merge(s.rename_axis(['player_y', 'player_x']).reset_index(name='Lost'),
how='left')
)
NB. this uses knowledge of the implementation of combinations that for ['A', 'B', 'C'] will produce a specific order: [('A', 'B'), ('A', 'C'), ('B', 'C')], always putting first in the combination an item that is ranked first in the original iterable.
Output:
player_x player_y Won Lost
0 player_1 player_2 1.0 1.0
1 player_1 player_3 NaN 1.0
2 player_1 player_4 NaN 1.0
3 player_2 player_3 NaN 2.0
4 player_2 player_4 1.0 NaN
5 player_3 player_4 NaN NaN