Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

convert pandas dataframe multi values column into separate rows

I am working with fifa dataset and there I have 2 columns player name and playing positions. The playing position column contains multiple values as players can play in multiple positions. It is a fifa dataset from kaggle

df = pd.read_csv('dataset.csv')
df = df['name', 'pos']
df.head()
index name pos
0 messi ST, RW, LW
1 lewa ST
2 ronaldo LW, ST

I want to convert that and map those to 4 separate positions for instance to Att, mid, def, gk .
I tried using replace and give it a dictionary with mapping but seems to only work with players that have only 1 position.
For instance messi has RW, LW, ST so my mapping doesnot work but someone like Lewandowski who has only ST it works.

Then I tried splitting the column and exploding it into different rows, and then replace , which made three 0 indexed rows with messi but that replaces only on first instance of index as well.
How do i solve this ?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1 = df[['name', 'pos']]
df1['pos'] = df['pos'].str.split(",")
df1 = df1.explode('pos')
new_pos = {'RW': 'ATT', 'LW': 'ATT', 'ST': 'ATT', 'CF': 'ATT', 'CAM': 'MID', 'CDM': 'MID', 'GK': 'GK', 'CM': 'MID' }
df1.replace({'pos': new_pos}, inplace=True)
df1.head()

below is how the df looks, there is no index column i have just added here to show to more clearly.

index name pos
0 messi ATT
0 messi ST
0 messi LW

Expected output:

index name pos
0 messi ATT
1 lewa ATT
2 ronaldo ATT

>Solution :

If you only want the first position, the best might be to use a list comprehension:

df['new_pos'] = [new_pos[next(iter(x.split(',')))] for x in df['pos']]

output:

   index     name         pos new_pos
0      0    messi  ST, RW, LW     ATT
1      1     lewa          ST     ATT
2      2  ronaldo      LW, ST     ATT

The (likely much less efficient) pandas equivalent would be:

df['new_pos'] = (df['pos']
                 .str.split(',')
                 .explode()
                 .map(new_pos)
                 .groupby(level=0).first()
                )
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading