Python copying rows with same ID values

September 10, 2023

I have a big dataframe with columns including ID and multiple values and different rows can have same or different ID values. I would like to create a new dataframe so, that every row has only one ID and the specific column values are just appended next to the ID. The Dataframe also has other columns with additional values that are same for same ID rows that i would like to keep

ID	type1	type2	value1	value2	value3
1	dog	yellow	1	2	3
1	dog	yellow	5	6	7
2	cat	brown	1	1	1
3	mouse	blue	1	1	1
1	dog	yellow	1	2	3

expected output:

ID	type1	type2	value
1	dog	yellow	1 2 3 5 6 7 1 2 3
2	cat	brown	1 1 1
3	mouse	blue	1 1 1

I have been exploring the groupby option, can’t get it to have this kind of output

>Solution :

You can melt and groupby.agg:

group = ['ID', 'type1', 'type2']
out = df.melt(group).groupby(group, as_index=False)['value'].agg(list)

Output:

   ID  type1   type2                        value
0   1    dog  yellow  [1, 5, 1, 2, 6, 2, 3, 7, 3]
1   2    cat   brown                    [1, 1, 1]
2   3  mouse    blue                    [1, 1, 1]

If order matters:

out = (df.set_index(group).stack().groupby(group).agg(list)
         .reset_index(name='value')
       )

Output:

   ID  type1   type2                        value
0   1    dog  yellow  [1, 2, 3, 5, 6, 7, 1, 2, 3]
1   2    cat   brown                    [1, 1, 1]
2   3  mouse    blue                    [1, 1, 1]