sorry, if my title sounds a bit confusing. What I’m basically trying to do is adding new rows in a data frame, where I duplicate the value of each unique value of one column, while another column’s new values are changing.
This is what my data frame looks like:
| id | year |
|---|---|
| 01 | 2022 |
| 02 | 2022 |
| 03 | 2022 |
| … | … |
| 99 | 2022 |
And I want it to look like this:
| id | year |
|---|---|
| 01 | 2022 |
| 01 | 2023 |
| 01 | 2024 |
| 02 | 2022 |
| 02 | 2023 |
| 02 | 2024 |
| 03 | 2022 |
| … | … |
| 99 | 2024 |
I.e. I want for every id to add the years 2023 and 2024 in the year column. I tried doing this with an apply function, but it always didn’t work out, could you guys help me out in solving this?
>Solution :
You can simply make a list comprehension and concat all dataframe years wirh increments of your desire. For example:
pd.concat([df.assign(year=df.year+increment) for increment in range(0,3)]).sort_values(by='id').reset_index(drop=True)
This will increment your dataframe to three years as follows. You can play around with range for the desired number of extensions:
| id | year |
|---|---|
| 1 | 2022 |
| 1 | 2023 |
| 1 | 2024 |
| 2 | 2022 |
| 2 | 2023 |
| 2 | 2024 |
| 3 | 2022 |
| 3 | 2023 |
| 3 | 2024 |