Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Groupby and Apply Functions on multiple Columns with 1-to-many relationship

I am trying to consolidate a one-to-many relationship in my dataset while creating a link for each, and sometimes multiple items. For a given Ticket, there can be 1 or many Work Orders, so I have used the following to consolidate and list Work Orders in a single column:

df = df.groupby(['Date', 'Ticket ID', 'Score', 'many other Columns...'])['Work Order'].apply(', '.join).reset_index()

Which gives me a nice output where Work Orders are either listed individually or comma separted like so:

Date                        Ticket ID           Work Order
2018-08-30 22:52:25         1444008             119846184
2021-09-29 13:33:49         1724734             122445397, 122441551

Now, I want to create links for the Work Orders as well that are also comma delimited. For the single Work Orders the following works just fine:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

woLink = r'www.google.com/woNum='
df['WO Link'] = woLink + df['Work Order'].astype(str)

However for the multiple it essentially undoes the groupby and re-adds the row that I grouped by. From the sample data above like:

# Wrong
Date                        Ticket ID           Work Order           Link
2018-08-30 22:52:25         1444008             119846184            google.com/woNum=119846184
2021-09-29 13:33:49         1724734             122445397            google.com/woNum=122445397
2021-09-29 13:33:49         1724734             122441551            google.com/woNum=122441551

What I want is:

Date                        Ticket ID           Work Order           Link (s)
2018-08-30 22:52:25         1444008             119846184            google.com/woNum=119846184
2021-09-29 13:33:49         1724734             122445397, 122441551 google.com/woNum=122445397, google.com/woNum=122441551

I have tried several different variations of Groupby/apply like regrouping without success:

df = df.groupby(['Date', 'Ticket ID', 'Score', 'many other Columns...'])['Link'].apply(', '.join).reset_index()

It always seems to duplicate the row again – I know this is because WO and WOLink both have a 1 to Many, but I cannot seem to figure how to handle both at once.

How can I groupby and consolidate this dataframe?

>Solution :

You can use a regex to add the URL part:

woLink = r'example.org/woNum='
df['Link'] = df['Work Order'].str.replace('(\d+)', rf'{woLink}\1')

output:

                  Date  Ticket ID            Work Order                                                      Link
0  2018-08-30 22:52:25    1444008             119846184                               example.org/woNum=119846184
1  2021-09-29 13:33:49    1724734  122445397, 122441551  example.org/woNum=122445397, example.org/woNum=122441551
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading