I have a list of tuples of different lengths, where each tuple encodes a team of people, such as:
data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]
From this, I would like to create a DataFrame with a column of team member names, and a column with the size of the team they were on:
      name  teamsize
0    Alice         1
1      Bob         2
2    Betty         2
3  Charlie         3
4    Cindy         3
5   Cramer         3
I have tried my hand at some double for loops, but I couldn't get things to work, and I have the impression that it is not a very good way to go about it. Any tips would be appreciated.
>Solution :
Use a list comprehension and the DataFrame constructor:
import pandas as pd
out = pd.DataFrame([[name, len(team)] for team in data for name in team],
                   columns=['name', 'teamsize'])
Output:
      name  teamsize
0    Alice         1
1      Bob         2
2    Betty         2
3  Charlie         3
4    Cindy         3
5   Cramer         3
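Since the question mentions double for loops: the list comprehension is just two nested loops written on one line, with the outer loop over teams and the inner loop over the names in each team. A minimal sketch of the explicit version, where rows, team and size are illustrative names that are not part of the original answer:

```python
import pandas as pd

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

rows = []                          # one (name, teamsize) pair per person
for team in data:                  # outer loop: one tuple per team
    size = len(team)               # team size is the length of the tuple
    for name in team:              # inner loop: every member of that team
        rows.append((name, size))

out = pd.DataFrame(rows, columns=['name', 'teamsize'])
print(out)
```

The order of the for clauses in the comprehension matches the order of the nested loops, which is usually where hand-written double loops go wrong.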
For fun, here is a pure pandas solution (but likely less efficient!):
out = (pd.DataFrame({'name': data})
         .assign(teamsize=lambda d: d['name'].str.len())
         .explode('name', ignore_index=True)
      )
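To see that the two approaches agree, here is a small self-contained check, assuming the same data as in the question. The names via_comprehension and via_explode are illustrative. The comparison is done on the column values rather than on the frames themselves, since the dtype of the name column may differ between the two constructions depending on the pandas version:

```python
import pandas as pd

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

# Approach 1: flatten with a list comprehension, then build the frame.
via_comprehension = pd.DataFrame(
    [[name, len(team)] for team in data for name in team],
    columns=['name', 'teamsize'])

# Approach 2: store each tuple in one cell, measure its length,
# then explode the tuples into one row per member.
via_explode = (pd.DataFrame({'name': data})
                 .assign(teamsize=lambda d: d['name'].str.len())
                 .explode('name', ignore_index=True))

# Compare values column by column.
same_names = via_comprehension['name'].tolist() == via_explode['name'].tolist()
same_sizes = via_comprehension['teamsize'].tolist() == via_explode['teamsize'].tolist()
print(same_names, same_sizes)
```

Note that teamsize has to be computed before explode, while each cell still holds the whole tuple.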