List of tuples to DataFrame with a column for elements and a column for tuple length

I have a list of tuples of different lengths, where each tuple can be thought of as encoding a team of people, such as:

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

From this, I would like to create a DataFrame with a column of team member names, and a column with the size of the team they were on:

   name     teamsize
0  Alice    1
1  Bob      2
2  Betty    2
3  Charlie  3
4  Cindy    3
5  Cramer   3

I have tried my hand at some double for loops, but I couldn't get things to work out, and I have the impression that it is not a very good way to go about it. Any tips would be appreciated.

Solution:

Use a list comprehension and the DataFrame constructor:

import pandas as pd

out = pd.DataFrame([[name, len(team)] for team in data for name in team],
                   columns=['name', 'teamsize'])

Output:

      name  teamsize
0    Alice         1
1      Bob         2
2    Betty         2
3  Charlie         3
4    Cindy         3
5   Cramer         3
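
Since the question mentions struggling with double for loops, here is a minimal sketch of what the list comprehension does when written out as explicit nested loops. The names `rows` and `out_loop` are only illustrative and do not come from the original answer:

```python
import pandas as pd

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

rows = []
for team in data:              # outer loop: one tuple per team
    size = len(team)           # the team size is the tuple length
    for name in team:          # inner loop: one output row per member
        rows.append((name, size))

out_loop = pd.DataFrame(rows, columns=['name', 'teamsize'])
print(out_loop)
```

The comprehension reads in the same order as the loops: the outer `for` comes first, then the inner one.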

For fun, here is a pure pandas solution (but likely less efficient):

out = (pd.DataFrame({'name': data})
         .assign(teamsize=lambda d: d['name'].str.len())
         .explode('name', ignore_index=True)
      )
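
To see how the chained version works, the same steps can be split into intermediate frames. This is just a sketch; `step1`, `step2`, and `step3` are hypothetical names used for illustration:

```python
import pandas as pd

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

# One row per team; each cell of 'name' holds a whole tuple.
step1 = pd.DataFrame({'name': data})

# .str.len() also works on tuples, so this gives the team size per row.
step2 = step1.assign(teamsize=step1['name'].str.len())

# explode() turns each tuple element into its own row and repeats
# the teamsize value; ignore_index=True renumbers the rows from 0.
step3 = step2.explode('name', ignore_index=True)
print(step3)
```

The size has to be computed before `explode`, because afterwards each cell holds a single name rather than the full tuple.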