I have a DataFrame as follows (`Name` is the index):
| Name | Age | year |
|---|---|---|
| Tom | 20 | 2020 |
| Tom | 20 | 2021 |
| Nick | 19 | 2019 |
| Jack | 18 | 2018 |
My goal is to remove the duplicates and convert the `year` column to a tuple or list, as shown below:
| Name | Age | year |
|---|---|---|
| Tom | 20 | (2020, 2021) |
| Nick | 19 | 2019 |
| Jack | 18 | 2018 |
How can I do this efficiently, given that my df has more than 800,000 rows?
Solution:
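Group by the index and use `np.unique` as the aggregation function. Note that `np.unique` also sorts and deduplicates the years within each group. Assuming `Name` is already the index: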
```
>>> import numpy as np
>>> df.groupby(level=0).agg(np.unique)
      Age          year
Name
Jack   18          2018
Nick   19          2019
Tom    20  [2020, 2021]
```
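If you want tuples, as in the question, and want single-year names to stay plain scalars, here is a minimal sketch that swaps `np.unique` for a `first`/`list` aggregation. It assumes `Age` is constant within each `Name` (only the first value is kept) and rebuilds the sample frame from the question. Unlike `np.unique`, `list` neither sorts nor deduplicates the years, and `sort=False` keeps the names in order of first appearance.

```python
import pandas as pd

# Sample data from the question, with Name as the index.
df = pd.DataFrame(
    {"Age": [20, 20, 19, 18], "year": [2020, 2021, 2019, 2018]},
    index=pd.Index(["Tom", "Tom", "Nick", "Jack"], name="Name"),
)

# Collapse duplicate index labels: keep the first Age per name (assumes Age
# does not vary within a name) and gather every year into a list.
out = df.groupby(level=0, sort=False).agg({"Age": "first", "year": list})

# Turn multi-year lists into tuples and unwrap single-year lists to scalars,
# matching the layout requested in the question.
out["year"] = out["year"].map(lambda ys: ys[0] if len(ys) == 1 else tuple(ys))

print(out)
```

If you are happy with lists everywhere, drop the `map` step; the `groupby(...).agg(...)` line alone does the deduplication.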