I'm having trouble combining two columns into one string before generating a list of unique values.
CSV file:
country,half,uniqueTournament
Brazil,1st half,Serie A
England,1st half,Championship
Argentina,2nd half,Primera Liga
Brazil,1st half,Serie A
My attempt:
import pandas as pd
csv_file = '@@@@@@@@@@@@@'
df = pd.read_csv(csv_file)
df.loc[(df['half'] == '1st half'), 'country' + ' - ' + 'uniqueTournament'].unique()
Expected outcome:
Brazil - Serie A
England - Championship
>Solution :
If df looked like this:
     country      half  uniqueTournament
0     Brazil  1st half           Serie A
1    England  1st half      Championship
2  Argentina  1st half      Primera Liga
3     Brazil  1st half           Serie A
4     Brazil  2nd half           Serie A
then you could create a new combined column, drop the duplicate rows within each half, and collect the values with groupby + agg(list):
df['new'] = df['country'] + ' - '+ df['uniqueTournament']
df.drop_duplicates(subset=['half','new']).groupby('half')['new'].agg(list).tolist()
or you could skip drop_duplicates and use groupby + unique (each group's values then come back as a NumPy array rather than a list):
out = df.groupby('half')['new'].unique().tolist()
Output:
[['Brazil - Serie A', 'England - Championship', 'Argentina - Primera Liga'],
['Brazil - Serie A']]
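If only the '1st half' rows are needed, as in the expected outcome of the question, a minimal sketch would be to filter first and then concatenate the two columns before calling unique. The sketch below loads the question's sample CSV from an in-memory string instead of a file; the names csv_text, first_half and result are placeholders for illustration only.

```python
import io

import pandas as pd

# Sample data copied from the question, loaded from a string instead of a file.
csv_text = """country,half,uniqueTournament
Brazil,1st half,Serie A
England,1st half,Championship
Argentina,2nd half,Primera Liga
Brazil,1st half,Serie A
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows for the first half.
first_half = df[df['half'] == '1st half']

# Concatenate the two string columns element-wise, then de-duplicate.
# unique() preserves the order of first appearance.
result = (first_half['country'] + ' - ' + first_half['uniqueTournament']).unique().tolist()

print(result)  # ['Brazil - Serie A', 'England - Championship']
```

The original attempt fails because 'country' + ' - ' + 'uniqueTournament' concatenates the column names into the single label 'country - uniqueTournament', which is not a column of df. Concatenating the column values themselves, as above, avoids that.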