I have a dataframe like so:
Input:
Restaurant IP_Id IP_Time S_Code
McDonalds NaN 2021-03-28 03:01:05 4
McDonalds NaN 2021-03-28 03:01:05 5
McDonalds NaN 2021-03-28 03:01:05 6
McDonalds 101 2021-03-28 03:01:05 4
McDonalds 101 2021-03-28 03:01:05 5
Wendys 101 2021-03-28 03:01:05 4
Wendys 101 2021-03-28 03:01:05 4
I want to transform the dataframe to the following. For every instance where the IP_Id is the same for a specific restaurant at a specific time, I want to collapse the row and include a comma separated value for the S_Code.
Output:
Restaurant IP_Id IP_Time S_Code
McDonalds NaN 2021-03-28 03:01:05 4
McDonalds NaN 2021-03-28 03:01:05 5
McDonalds NaN 2021-03-28 03:01:05 6
McDonalds 101 2021-03-28 03:01:05 4, 5
Wendys 101 2021-03-28 03:01:05 4
I’ve tried grouping by Restaurant, IP_Id, and IP_Time, but it moves the records with Nan as the IP_Id
>Solution :
It’s actually quite simple. You’ll need a special condition to separate the NaNs into their own groups, which you can generate with a combination of shift and cumsum, and then call agg('unique'):
g = df.groupby([df['IP_Id'].ne(df['IP_Id'].shift(1)).cumsum(), df['Restaurant'], df['IP_Time']])
df = g['S_Code'].agg('unique').str.join(', ').reset_index().assign(IP_Id=g['IP_Id'].first().reset_index(drop=True))
Output:
>>> df
IP_Id Restaurant IP_Time S_Code
0 NaN McDonalds 2021-03-28 03:01:05 4
1 NaN McDonalds 2021-03-28 03:01:05 5
2 NaN McDonalds 2021-03-28 03:01:05 6
3 101.0 McDonalds 2021-03-28 03:01:05 4, 5
4 101.0 Wendys 2021-03-28 03:01:05 4