Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: Repeating list in column does not work

I want to turn a dataframe from thisenter image description here

to this:

enter image description here

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

It took me a while to figure out the melt and transpose function to get to this

enter image description here

But I did not get to manage to apply the years from 1990 to 2019 in a repeating manner into for every of the 189 countries.

I tried:

 year_list = []
for year in range(1990, 2020,1):
    year_list.append(year)
years = pd.Series(year_list)
years

and then

df['year'] = years.repeat(30)

(I need to repeat it 30 times, because the frame consists of 5670 rows = 189 countries * 29 years)

I got this error message:

ValueError: cannot reindex on an axis with duplicate labels

Googling this error does not help.

>Solution :

One approach could be as follows:

Sample data

import pandas as pd
import numpy as np

data = {'country': ['Afghanistan','Angola']}

data.update({k: np.random.rand() for k in range(1990,1993)})

df = pd.DataFrame(data)

print(df)

       country      1990      1991      1992
0  Afghanistan  0.103589  0.950523  0.323925
1       Angola  0.103589  0.950523  0.323925

Code

res = (df.set_index('country')
       .unstack()
       .sort_index(level=1)
       .reset_index(drop=False)
       .rename(columns={'country': 'geo', 
                        'level_0': 'time', 
                        0: 'hdi_human_development_index'})
       )

print(res)

   time          geo  hdi_human_development_index
0  1990  Afghanistan                     0.103589
1  1991  Afghanistan                     0.950523
2  1992  Afghanistan                     0.323925
3  1990       Angola                     0.103589
4  1991       Angola                     0.950523
5  1992       Angola                     0.323925

Explanation

  • Use df.set_index on column country and apply df.unstack to add the years from the column names to the index.
  • Now, we use df.sort_index on level=1 to get the countries in alphabetical order.
  • Finally, we use df.reset_index with drop parameter set to False to get the index back as columns, and we chain df.rename to customize the column names.
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading