Python Generate new columns based on string condition

I have the following DF:

| Fecha      | Partido                 | Equipo  |  xG  |  xGA |
|------------|-------------------------|---------|------|------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 |
| 2022-05-01 | Leon - América 2:0      | América | 1.70 | 0.35 |

I want to create three new columns from the Partido column: the first team goes into a column named Home, the second team into a column named Away, and the score into a column named Score.

Desired Output:

| Fecha      | Partido                 | Equipo  |  xG  |  xGA | Home    | Away       | Score |
|------------|-------------------------|---------|------|------|---------|------------|-------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 | América | Cruz Azul  | 0:0   |
| 2022-05-01 | Leon - América 2:0      | América | 1.70 | 0.35 | Leon    | América    | 2:0   |

I have tried splitting on a delimiter, but since some team names contain two words, it doesn’t work.

>Solution :

It is quite simple using str.extract and a regex:

# Lazy groups (+?) keep the spaces around the dash out of the captured team names
regex = r'([^-]+?)\s*-\s*([^-]+?)\s+(\d+:\d+)'
df[['Home', 'Away', 'Score']] = df['Partido'].str.extract(regex)

output:

        Fecha                  Partido   Equipo    xG   xGA      Home       Away Score
0  2022-05-01  América - Cruz Azul 0:0  América  1.53  0.45  América   Cruz Azul   0:0
1  2022-05-01       Leon - América 2:0  América  1.70  0.35     Leon     América   2:0
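
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. The DataFrame construction is an assumption based on the table in the question. The pattern uses lazy groups (`+?`) so that the space before the dash is not captured as part of Home; a greedy `[^-]+` would keep that trailing space, which can silently break later comparisons such as `df['Home'] == df['Equipo']`.

```python
import pandas as pd

# Rebuild the two sample rows from the question (assumed construction).
df = pd.DataFrame({
    "Fecha": ["2022-05-01", "2022-05-01"],
    "Partido": ["América - Cruz Azul 0:0", "Leon - América 2:0"],
    "Equipo": ["América", "América"],
    "xG": [1.53, 1.70],
    "xGA": [0.45, 0.35],
})

# Three capture groups: home team, away team, score.
# The lazy quantifiers stop each team name before the surrounding whitespace.
regex = r"([^-]+?)\s*-\s*([^-]+?)\s+(\d+:\d+)"
df[["Home", "Away", "Score"]] = df["Partido"].str.extract(regex)

# Sanity check: no captured team name carries leading or trailing whitespace.
clean = all((df[c] == df[c].str.strip()).all() for c in ["Home", "Away"])
print(df[["Home", "Away", "Score"]])
print("names are whitespace-free:", clean)
```

Rows that do not match the pattern come back as NaN in all three columns, so it may be worth checking `df['Score'].isna().sum()` on real data.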

If you don’t want to modify the original DataFrame, you can also use named capturing groups to directly set the column names:

regex = r'(?P<Home>[^-]+?)\s*-\s*(?P<Away>[^-]+?)\s+(?P<Score>\d+:\d+)'
df2 = df['Partido'].str.extract(regex)

#        Home       Away Score
# 0  América   Cruz Azul   0:0
# 1     Leon     América   2:0

# OR
df2 = df.join(df['Partido'].str.extract(regex))

# same as the first output
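
One caveat: `[^-]` rejects every hyphen, so a team name that itself contains a hyphen would not match. As a hedged sketch of one way around that, the variant below requires whitespace on both sides of the separating dash and anchors the score at the end of the string. The third row and the name `robust` are hypothetical, added only to illustrate the hyphenated case; they are not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "Partido": [
        "América - Cruz Azul 0:0",
        "Leon - América 2:0",
        "Real-Norte - Club Sur 3:1",  # hypothetical row with a hyphenated name
    ]
})

# The separator must be a dash surrounded by whitespace, so a hyphen inside
# a name is left alone. The score is anchored to the end of the string.
robust = r"^(?P<Home>.+?)\s+-\s+(?P<Away>.+?)\s+(?P<Score>\d+:\d+)$"

# Named groups become the column names; join keeps the original frame intact.
df2 = df.join(df["Partido"].str.extract(robust))
print(df2[["Home", "Away", "Score"]])
```

This assumes the data always puts spaces around the separating dash, as in the sample rows. If some rows use a bare `-` with no spaces, the original character-class pattern is the safer choice.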