Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Transforming List of dicts to pandas dataframe with dynamic header columns

Suppose I have list of dicts

 my lists = [
    
    {'rank': 2, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'
    },
    {'rank': 18, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'
    },
    {'rank': 21, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B07FPVR858'
    },
    {'rank': 126, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B01MG0ORBL'
    },
    {'rank': 128, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B0735C9RDZ'
    },
    {'rank': 136, 'keyword_name': 'mens wallet', 'volume': 456677, 'asin': 'B07FPVR858'
    },
    {'rank': 19, 'keyword_name': 'leather wallets', 'volume': 23, 'asin': 'B0735C9RDZ'
    },
    {'rank': 10, 'keyword_name': 'wallets for men', 'volume': 566, 'asin': 'B07FPVR858'
    },
    {'rank': 16, 'keyword_name': 'wallets for men', 'volume': 566, 'asin': 'B0735C9RDZ'
    },
 ]

I want to group by asin and keyword_name since they appear more than once in the list of dicts, so my goal is to have a dataframe that looks like this

    **keyword_name     volume   B01MG0ORBL    B0735C9RDZ       B07FPVR858** // column headers

     mens wallet       456677    2 126         18 128 19 16    21 10
     leather wallets   23                                 
     wallets for men   566                      16              10
     

So initially I am thinking of

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   d = [{d['asin']:d['rank'] for d in l} for l in my_lists]
   pd.dataframe(d)

   
   // save as xlsx file
   writer = pd.ExcelWriter(f"{path}/sheet.xlsx", engine="xlsxwriter")
   d.to_excel(
        writer, sheet_name="Organic", startrow=0, header=True, index=False
    )

But not possible since it will ran into errors TypeError: string indices must be integers

Any help will be appreciated

Regards

>Solution :

You can create DataFrame and then pivoting with lists:

df = pd.DataFrame(my_lists)
    
df = df.pivot_table(index=['keyword_name','volume'], 
                    columns='asin', 
                    values='rank', 
                    aggfunc=list)
print (df)
asin                   B01MG0ORBL B0735C9RDZ B07FPVR858
keyword_name    volume                                 
leather wallets 23            NaN       [19]        NaN
mens wallet     456677   [2, 126]  [18, 128]  [21, 136]
wallets for men 566           NaN       [16]       [10]

Or join converted values to strings:

df = pd.DataFrame(my_lists)

df = (df.assign(rank=df['rank'].astype(str))
        .pivot_table(index=['keyword_name','volume'], 
                     columns='asin', 
                     values='rank', 
                     aggfunc=' '.join, 
                     fill_value=''))
print (df)
asin                   B01MG0ORBL B0735C9RDZ B07FPVR858
keyword_name    volume                                 
leather wallets 23                        19           
mens wallet     456677      2 126     18 128     21 136
wallets for men 566                       16         10
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading