I’m a beginner to Pandas, so bear with me.
Here is a simplified version of my series:
| Name |
|---|
| James |
| Michael |
| Jim |
| Bob |
| Jim |
| Bob |
I want to create a df that adds a column for ‘Team.’ Here is my team distribution:
team1 = [
    'Michael',
    'James',
]
team2 = [
    'Jim',
    'Bob'
]
My first instinct was to define a function with an if statement and isin, like so:
def Team(row):
    if row['Name'].isin(team1):
        return 'Team 1'
    elif row['Name'].isin(team2):
        return 'Team 2'
    else:
        return 'No Team'

df['Team'] = df.apply(Team, axis=1)
df
With the axis, I get:
"TypeError: Teams() got an unexpected keyword argument ‘axis’"
When I remove the axis, I get:
"TypeError: string indices must be integers"
Any idea if there is a better approach? Thanks!
>Solution :
I'm not sure I understand your errors, but I notice the error message mentions Teams() rather than Team(), so the code you ran may differ from what you posted.
In any case, in your example row is a pandas Series. Indexing it with row['Name'] gives you a plain Python string, which has no isin() method. Using the in operator in your function should work:
def Team(row):
    if row['Name'] in team1:
        return 'Team 1'
    elif row['Name'] in team2:
        return 'Team 2'
    else:
        return 'No Team'

df['Team'] = df.apply(Team, axis=1)
df
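To see why the original version failed, here is a minimal sketch. The frame is rebuilt from the table in the question, and the variable names (first_row, value, mask) are just for illustration. It shows that a single cell is a plain string, while isin() lives on the whole column:

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({"Name": ["James", "Michael", "Jim", "Bob", "Jim", "Bob"]})
team1 = ["Michael", "James"]

# With axis=1, apply() hands the function one row at a time, as a Series
first_row = df.iloc[0]
row_is_series = isinstance(first_row, pd.Series)

# Indexing that row by column label yields a plain Python string
value = first_row["Name"]
value_is_str = isinstance(value, str)

# A string has no isin(); the column (a Series) does
str_has_isin = hasattr(value, "isin")
series_has_isin = hasattr(df["Name"], "isin")

# isin() is meant for whole columns and returns a boolean mask
mask = df["Name"].isin(team1).tolist()
```

So isin() is the right tool when you test a whole column at once, and in is the right tool when you test a single value inside a row-wise function.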
Let me also suggest applying the function to the Name column (a pandas Series) directly, instead of to the whole dataframe. That should be faster as well. Series.apply() works much like DataFrame.apply(), but it passes each value to your function rather than each row, so you don't need the axis=1 argument.
def Team(name):
    if name in team1:
        return 'Team 1'
    elif name in team2:
        return 'Team 2'
    else:
        return 'No Team'

df['Team'] = df.Name.apply(Team)
df
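If you would rather avoid apply() altogether, one possible alternative (not part of the approach above, just a sketch) is to build a name-to-team lookup and use Series.map() with fillna(). The dictionary name_to_team is a hypothetical helper introduced here, and the frame is again rebuilt from the question's table:

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({"Name": ["James", "Michael", "Jim", "Bob", "Jim", "Bob"]})
team1 = ["Michael", "James"]
team2 = ["Jim", "Bob"]

# Hypothetical helper: one dictionary mapping each name to its team label
name_to_team = {name: "Team 1" for name in team1}
name_to_team.update({name: "Team 2" for name in team2})

# map() looks up every name; names missing from the dict become NaN,
# which fillna() then replaces with the fallback label
df["Team"] = df["Name"].map(name_to_team).fillna("No Team")
```

This keeps the team membership in a single lookup table, so adding a third team only means adding entries to the dictionary rather than another elif branch.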