Home Split and store dataframe but name based on unique values in specific cols

Questions

Split and store dataframe but name based on unique values in specific cols

July 31, 2022

I have a dataframe like as below

data = pd.DataFrame({'email_id': ['abc@gmail.com;test1@gmail.com','abc@gmail.com;def@yahoo.com','abdc@gmail.com','ache@gmail.com','aqce@gmail.com','pqr@gmail.com','pqr@gmail.com'],
             'Dept_id': [21,23,25,26,28,29,31],
             'dept_name':['Science','Chemistry','Maths','Social','Physics','Botany','Zoology'],
             'location':['KOR','ASN','ANZ','IND','AUS','NZ','NZ']})

I would like to do the below

a) Split the dataframe based the unique email_id values in the email_id column

b) Store the split data in a excel file. each email_id row will have one file.

c) Name the excel file based on corresponding unique values present in dept_id, dept_name and location columns

I tried the below

for k, v in data.groupby(['email_id']):
    dept_unique_ids = v['Dept_id'].unique()
    dept_unique_names = v['dept_name'].unique()
    location_unique = v['location'].unique()
    writer = pd.ExcelWriter(f"{k}.xlsx", engine='xlsxwriter')
    v.to_excel(writer,columns=col_list,sheet_name=f'{k}',index=False, startrow = 1)

While the above code splits the file successfully but it names it based on the email_id which is used as key. Instead I want to name the file based on dept_id, dept_name and location for that specific key.

For ex: If you take email_id = pqr@gmail.com, they have two unique dept_ids which is 29 and 31, their unique dept_name is Botany and Zoology and unique location is NZ.

So, I want my file name to be 29_31_Botany_Zoology_NZ.xlsx.

Therefore, I expect my output files (for each unique email id row to have filenames like below)

update – error message

>Solution :

IIUC, you can use:

for k, v in data.groupby(['email_id']):
    dept_unique_ids = '_'.join(v['Dept_id'].astype(str).unique())
    dept_unique_names = '_'.join(v['dept_name'].unique())
    location_unique = '_'.join(v['location'].unique())
    filename = '_'.join([dept_unique_ids, dept_unique_names, location_unique]) + '.xlsx'
    print(filename)
    # The rest of your code here

Output:

23_Chemistry_ASN.xlsx
21_Science_KOR.xlsx
25_Maths_ANZ.xlsx
26_Social_IND.xlsx
28_Physics_AUS.xlsx
29_31_Botany_Zoology_NZ.xlsx

group-by

byMR

Published July 31, 2022

Add a comment

Capture Substring from list elements

byMR

July 31, 2022

Questions

Get excel data in order using python

byMR

July 31, 2022

Questions

How to add eventlistener when user leaves input

byMR

July 31, 2022

Questions

Filter groups with more than n members with `dplyr`

byMR

July 31, 2022

Questions

apply shade to flutter color variable

byMR

July 31, 2022

Questions

Change the input initial value of controlled component in React

byMR

July 31, 2022

Split and store dataframe but name based on unique values in specific cols

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Capture Substring from list elements

Get excel data in order using python

How to add eventlistener when user leaves input

Filter groups with more than n members with `dplyr`

apply shade to flutter color variable

Change the input initial value of controlled component in React

Keep Up to Date with the Most Important News

Split and store dataframe but name based on unique values in specific cols

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Capture Substring from list elements

Get excel data in order using python

How to add eventlistener when user leaves input

Filter groups with more than n members with `dplyr`

apply shade to flutter color variable

Change the input initial value of controlled component in React

Discover more from Dev solutions