Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split and store dataframe but name based on unique values in specific cols

I have a dataframe like as below

data = pd.DataFrame({'email_id': ['abc@gmail.com;test1@gmail.com','abc@gmail.com;def@yahoo.com','abdc@gmail.com','ache@gmail.com','aqce@gmail.com','pqr@gmail.com','pqr@gmail.com'],
             'Dept_id': [21,23,25,26,28,29,31],
             'dept_name':['Science','Chemistry','Maths','Social','Physics','Botany','Zoology'],
             'location':['KOR','ASN','ANZ','IND','AUS','NZ','NZ']})

I would like to do the below

a) Split the dataframe based the unique email_id values in the email_id column

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

b) Store the split data in a excel file. each email_id row will have one file.

c) Name the excel file based on corresponding unique values present in dept_id, dept_name and location columns

I tried the below

for k, v in data.groupby(['email_id']):
    dept_unique_ids = v['Dept_id'].unique()
    dept_unique_names = v['dept_name'].unique()
    location_unique = v['location'].unique()
    writer = pd.ExcelWriter(f"{k}.xlsx", engine='xlsxwriter')
    v.to_excel(writer,columns=col_list,sheet_name=f'{k}',index=False, startrow = 1)

While the above code splits the file successfully but it names it based on the email_id which is used as key. Instead I want to name the file based on dept_id, dept_name and location for that specific key.

For ex: If you take email_id = pqr@gmail.com, they have two unique dept_ids which is 29 and 31, their unique dept_name is Botany and Zoology and unique location is NZ.

So, I want my file name to be 29_31_Botany_Zoology_NZ.xlsx.

Therefore, I expect my output files (for each unique email id row to have filenames like below)

enter image description here

update – error message

enter image description here

>Solution :

IIUC, you can use:

for k, v in data.groupby(['email_id']):
    dept_unique_ids = '_'.join(v['Dept_id'].astype(str).unique())
    dept_unique_names = '_'.join(v['dept_name'].unique())
    location_unique = '_'.join(v['location'].unique())
    filename = '_'.join([dept_unique_ids, dept_unique_names, location_unique]) + '.xlsx'
    print(filename)
    # The rest of your code here

Output:

23_Chemistry_ASN.xlsx
21_Science_KOR.xlsx
25_Maths_ANZ.xlsx
26_Social_IND.xlsx
28_Physics_AUS.xlsx
29_31_Botany_Zoology_NZ.xlsx
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading