Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split cells values and put them into different lists

I have a data frame and in one of the columns, some cells contain one value while others two values, and so on. The values are separated with ‘-‘. I want to take each value, depending on its place in the cell, and put it in a list.

For example:

import pandas as pd
  
df = pd.DataFrame()
  
print(df)
  
df['Name'] = ['Sam', 'Sam-Joe-Ron-Tania', 'Robert-Sam', 'Jack-Daniel-Sam-Joe-Billy-Robert','Billa']
df['IQ'] = [120, 100, 90, 80, 110]
df['Scores'] = [80, 75, 100, 77, 100]

df

I want to separate the names, so that for example, the first list would contain only the first names: ['Sam', 'Sam', 'Robert', 'Jack', 'Billa']

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

And the second list would have the second names in order : ['Joe', 'Sam', 'Daniel']

How can I do that? Thanks!

>Solution :

new columns

Use regex with str.extract:

df[['First', 'Second']] = df['Name'].str.extract('([^-]+)(?:-([^-]+))?')

or with a subset of str.split (split is interesting if you have more than 2 names to extract, else prefer extract that will be quite more efficient):

N = 2 # number of names to extract
# adapt the assignment below to the number of columns
df[['First', 'Second']] = df['Name'].str.split('-', expand=True, n=N)[range(N)]

output:

                               Name   IQ  Scores   First  Second
0                               Sam  120      80     Sam     NaN
1                 Sam-Joe-Ron-Tania  100      75     Sam     Joe
2                        Robert-Sam   90     100  Robert     Sam
3  Jack-Daniel-Sam-Joe-Billy-Robert   80      77    Jack  Daniel
4                             Billa  110     100   Billa     NaN

python lists

If really you want lists:

d = df['Name'].str.extract('([^-]+)(?:-([^-]+))?')

l1 = d[0].dropna().to_list()
# ['Sam', 'Sam', 'Robert', 'Jack', 'Billa']

l2 = d[1].dropna().to_list()
# ['Joe', 'Sam', 'Daniel']

Or in one command:

l1, l2 = (df['Name'].str.extract('([^-]+)(?:-([^-]+))?')
          .apply(lambda s: s.dropna().to_list())
         )
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading