I have the following DF:
| Fecha | Partido | Equipo | xG | xGA |
|------------|-------------------------|---------|------|------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 |
| 2022-05-01 | Leon - América 2:0 | América | 1.70 | 0.35 |
I want to create three new columns based on the Partido column where the first team goes to a new column named Home, the second team to a column named Visitor and the score to a column named Score.
Desired Output:
| Fecha | Partido | Equipo | xG | xGA | Home | Away | Score |
|------------|-------------------------|---------|------|------|-------- |------------|-------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 | América | Cruz Azul | 0:0 |
| 2022-05-01 | Leon - América 2:0 | América | 1.70 | 0.35 | Leon | América | 2:0 |
I have tried splitting with delimiter but since some teams have two words in their names it doesn’t work.
>Solution :
It is quite simple using str.extract and a regex:
regex = r'([^-]+)\s*-\s*([^-]+) (\d+:\d+)'
df[['Home', 'Away', 'Score']] = df['Partido'].str.extract(regex)
output:
Fecha Partido Equipo xG xGA Home Away Score
0 2022-05-01 América - Cruz Azul 0:0 América 1.53 0.45 América Cruz Azul 0:0
1 2022-05-01 Leon - América 2:0 América 1.70 0.35 Leon América 2:0
If you don’t want to modify the original DataFrame, you can also use named capturing groups to directly set the column names:
regex = r'(?P<Home>[^-]+)\s*-\s*(?P<Away>[^-]+) (?P<Score>\d+:\d+)'
df2 = df['Partido'].str.extract(regex)
# Home Away Score
# 0 América Cruz Azul 0:0
# 1 Leon América 2:0
# OR
df2 = df.join(df['Partido'].str.extract(regex))
# same a first output