I have a data frame and in one of the columns, some cells contain one value while others two values, and so on. The values are separated with ‘-‘. I want to take each value, depending on its place in the cell, and put it in a list.
For example:
import pandas as pd
df = pd.DataFrame()
print(df)
df['Name'] = ['Sam', 'Sam-Joe-Ron-Tania', 'Robert-Sam', 'Jack-Daniel-Sam-Joe-Billy-Robert','Billa']
df['IQ'] = [120, 100, 90, 80, 110]
df['Scores'] = [80, 75, 100, 77, 100]
df
I want to separate the names, so that for example, the first list would contain only the first names: ['Sam', 'Sam', 'Robert', 'Jack', 'Billa']
And the second list would have the second names in order : ['Joe', 'Sam', 'Daniel']
How can I do that? Thanks!
>Solution :
new columns
Use regex with str.extract:
df[['First', 'Second']] = df['Name'].str.extract('([^-]+)(?:-([^-]+))?')
or with a subset of str.split (split is interesting if you have more than 2 names to extract, else prefer extract that will be quite more efficient):
N = 2 # number of names to extract
# adapt the assignment below to the number of columns
df[['First', 'Second']] = df['Name'].str.split('-', expand=True, n=N)[range(N)]
output:
Name IQ Scores First Second
0 Sam 120 80 Sam NaN
1 Sam-Joe-Ron-Tania 100 75 Sam Joe
2 Robert-Sam 90 100 Robert Sam
3 Jack-Daniel-Sam-Joe-Billy-Robert 80 77 Jack Daniel
4 Billa 110 100 Billa NaN
python lists
If really you want lists:
d = df['Name'].str.extract('([^-]+)(?:-([^-]+))?')
l1 = d[0].dropna().to_list()
# ['Sam', 'Sam', 'Robert', 'Jack', 'Billa']
l2 = d[1].dropna().to_list()
# ['Joe', 'Sam', 'Daniel']
Or in one command:
l1, l2 = (df['Name'].str.extract('([^-]+)(?:-([^-]+))?')
.apply(lambda s: s.dropna().to_list())
)