Split dataframe string (when string can hold n values of that cell variable), into multiple columns

I am currently working on a dataset with a lot of contact data, where Emails is one of the variables.

A cell in the Emails column can have more than one email (1 to n) and they are all separated by a comma and a space.

When a contact has at most two emails, the process is quite straightforward. One can split the string and create a new column for the secondary email as follows:

email_df[['Emails', 'SecondaryEmail']] = email_df['Emails'].str.split(', ', expand=True)

However, this won't work when a cell holds more than two emails. What is the most efficient way to split the emails into columns with only one email each (and a different name each) when the number of emails can range from 1 to n? In this case n is limited to around 10, but that won't always be the case.
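To make the failure concrete, here is a minimal sketch using a made-up two-row frame (the sample addresses are assumptions, not data from the real dataset). With three emails in one cell, the split yields three columns, so assigning the result to two column names should raise a `ValueError`:

```python
import pandas as pd

# Hypothetical sample data: one cell with three emails, one with a single email.
email_df = pd.DataFrame({"Emails": ["a@x.com, b@x.com, c@x.com", "d@x.com"]})

# expand=True returns one column per split position, padded with missing values.
parts = email_df["Emails"].str.split(", ", expand=True)
n_parts = parts.shape[1]

try:
    # Two target names, but `parts` has three columns.
    email_df[["Emails", "SecondaryEmail"]] = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_parts, error_message)
```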

>Solution :

Use Series.str.split with expand=True, together with DataFrame.pop to remove the Emails column after processing:

df = email_df.join(email_df.pop('Emails').str.split(', ', expand=True).add_prefix('Email'))
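As a rough, self-contained illustration of the same idea, the sketch below wraps the split in a helper. The function name `split_emails`, its parameters, and the sample rows are all hypothetical. Note that `add_prefix('Email')` in the one-liner above names the new columns `Email0`, `Email1`, and so on, because the expanded frame is indexed from 0; the sketch renames them starting at 1 instead, which is purely a cosmetic choice.

```python
import pandas as pd

# Hypothetical sample data with 1, 2 and 3 emails per contact.
email_df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Emails": [
        "ann@a.com",
        "bob@b.com, bob@c.com",
        "cy@a.com, cy@b.com, cy@c.com",
    ],
})

def split_emails(frame, column="Emails", sep=", ", prefix="Email"):
    """Return a copy of `frame` with `column` split into one column per value."""
    frame = frame.copy()  # keep the caller's frame intact, since pop() mutates
    parts = frame.pop(column).str.split(sep, expand=True)
    # Name the columns Email1..EmailN rather than the 0-based default.
    parts.columns = [f"{prefix}{i + 1}" for i in range(parts.shape[1])]
    return frame.join(parts)

result = split_emails(email_df)
print(result)
```

The number of output columns follows the longest cell, so nothing has to be known about n in advance; shorter rows are padded with missing values.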