Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split Column of Semicolon-Separated Values and Duplicate Row with each Value in Pandas

I have a data frame with a row of data like this:

play_by_play = pd.DataFrame([{
    "players": "Tom Brady; Mike Evans; Tristan Wirfs; Leonard Fournette; Chris Godwin", 
               "down": 1, 
               "to_go": 10,
                "play_type": 'pass',
               "yards_gained": 8,
               "pass_attempt": 1,
               "complete_pass": 1,
               "rush_attempt": 0
              }])

I want to keep track of stats while the given players are on the field by using group by and aggregate functions. If I were looking to do this player-by-player, I would use play_by_play["players"].str.contains("Tom Brady") and aggregating the data using that filter, but I am looking to automate this. The solution I’ve landed on is to duplicate rows and have the "players" value split for each row. It would look something like this:

player down to_go play_type yards_gained pass_attempt complete_pass rush_attempt
"Tom Brady" 1 10 pass 8 1 1 0
"Mike Evans" 1 10 pass 8 1 1 0
"Tristan Wirfs" 1 10 pass 8 1 1 0
"Leonard Fournette" 1 10 pass 8 1 1 0
"Chris Godwin" 1 10 pass 8 1 1 0

How could I accomplish this? As I mentioned before, this needs to be pretty scalable, as there are thousands of rows of data. If there’s an easier way to group by based on unique values contained in a semicolon-separated column, I’m more than happy to go that route.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use pandas.Series.str.split to make a list of the players then pandas.DataFrame.explode to make multiple rows :

play_by_play['players'] = play_by_play['players'].str.split(';')
play_by_play = play_by_play.explode('players').reset_index(drop=True)

# Output :

print(play_by_play)

              players  down  to_go play_type  yards_gained  pass_attempt  complete_pass  rush_attempt
0           Tom Brady     1     10      pass             8             1              1             0
1          Mike Evans     1     10      pass             8             1              1             0
2       Tristan Wirfs     1     10      pass             8             1              1             0
3   Leonard Fournette     1     10      pass             8             1              1             0
4        Chris Godwin     1     10      pass             8             1              1             0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading