Create new column in df based on conditional with strings

I’m a beginner to Pandas, so bear with me.

Here is a simplified version of my series:

Name
James
Michael
Jim
Bob
Jim
Bob

I want to create a df that adds a column for ‘Team.’ Here is my team distribution:

team1 = [
    'Michael',
    'James',
]
team2 = [
    'Jim',
    'Bob',
]

My first instinct was to define a function with an if statement and isin, like so:

def Team(row):
    if row['Name'].isin(team1):
        return 'Team 1'
    elif row['Name'].isin(team2):
        return 'Team 2'
    else:
        return 'No Team'

df['Team'] = df.apply(Team, axis=1)
df

With the axis, I get:
"TypeError: Teams() got an unexpected keyword argument ‘axis’"
When I remove the axis, I get:
"TypeError: string indices must be integers"

Any idea if there is a better approach? Thanks!

Solution:

I'm not sure I fully understand your errors, but note that the error message refers to Teams() rather than Team().
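One possible reading of the two errors, assuming `df` in the question is really a Series rather than a DataFrame (the question does call it "my series"): `Series.apply` forwards extra keyword arguments to the function, and it hands the function one value at a time, not a row. The sketch below reproduces both messages under that assumption. The lowercase `team` function and the `names` variable are stand-ins for illustration, not the asker's actual code.

```python
import pandas as pd

# Assumption: the asker's `df` is a Series of names, not a DataFrame.
names = pd.Series(['James', 'Michael', 'Jim', 'Bob', 'Jim', 'Bob'], name='Name')

def team(row):
    # Stand-in for the question's function: it indexes the argument by 'Name'.
    return row['Name']

axis_error = None
no_axis_error = None

# Series.apply has no axis parameter, so axis=1 is forwarded to team().
try:
    names.apply(team, axis=1)
except TypeError as exc:
    axis_error = str(exc)

# Without axis, team() receives a plain string, and 'James'['Name'] fails.
try:
    names.apply(team)
except TypeError as exc:
    no_axis_error = str(exc)

print(axis_error)
print(no_axis_error)
```

If this matches the situation, the fix is either to build a real DataFrame first or to write the function so that it takes a single name.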

In any case, in your example, row is actually a pandas Series. When you index it with row['Name'], you get the actual string, which does not have an isin() method. Changing your function definition to use Python's in operator should work:

def Team(row):
    if row['Name'] in team1:
        return 'Team 1'
    elif row['Name'] in team2:
        return 'Team 2'
    else:
        return 'No Team'

df['Team'] = df.apply(Team, axis=1)
df
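To see where `isin` does and does not exist, a small sketch may help. It assumes the names from the question sit in a DataFrame column called `Name`; the DataFrame is reconstructed here for illustration.

```python
import pandas as pd

team1 = ['Michael', 'James']

# Reconstructed from the question's sample data (an assumption).
df = pd.DataFrame({'Name': ['James', 'Michael', 'Jim', 'Bob', 'Jim', 'Bob']})

# A single cell is a plain Python string, which has no isin() method.
cell = df.loc[0, 'Name']
cell_has_isin = hasattr(cell, 'isin')

# The whole column is a pandas Series, where isin() is defined and
# returns one boolean per row.
in_team1 = df['Name'].isin(team1)

print(cell_has_isin)
print(in_team1.tolist())
```

So `isin` is a column-level check, while `in` is the right tool once you are holding a single name inside the applied function.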

Let me also suggest applying the function directly to the pandas Series (the Name column) instead of the whole DataFrame. That should be faster as well. Series.apply() works much like DataFrame.apply(), but it passes each value to the function, so you don't need the axis=1 argument.

def Team(name):
    if name in team1:
        return 'Team 1'
    elif name in team2:
        return 'Team 2'
    else:
        return 'No Team'

df['Team'] = df.Name.apply(Team)
df
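As a further option that is not part of the answer above, the same result can be had without writing a function at all, by turning the two lists into a lookup dictionary and using `Series.map`. This is only a sketch: the `lookup` name and the extra name 'Alice' are hypothetical, added to show the 'No Team' fallback.

```python
import pandas as pd

team1 = ['Michael', 'James']
team2 = ['Jim', 'Bob']

# Sample data from the question, plus 'Alice' (hypothetical) who is on no team.
df = pd.DataFrame({'Name': ['James', 'Michael', 'Jim', 'Bob', 'Jim', 'Bob', 'Alice']})

# Build a name -> team lookup from the two lists.
lookup = {name: 'Team 1' for name in team1}
lookup.update({name: 'Team 2' for name in team2})

# map() yields a missing value for names absent from the lookup;
# fillna() replaces those with the default label.
df['Team'] = df['Name'].map(lookup).fillna('No Team')

print(df)
```

The dictionary lookup avoids scanning each list for every row, which may matter if the team lists grow long.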
