Create a pandas DataFrame from a list of dictionaries of dictionaries

I have a list of dictionaries, each mapping a single key to a nested dictionary of key:value pairs (see below):

d = [{'line': {'Area Boundary Must Be Covered By Boundary Of': '10', 'Must Be Inside': '55', 'Must Not Have Gaps': '2', 'Must Not Self-Intersect': '2', 'Must Not Self-Overlap': '2'}},
     {'point': {'Must Not Self-Intersect': '3'}}, 
     {'poly': {'Must Not Overlap': '2'}}]

The desired dataframe form would be:

desired dataframe form

I've been creating test dataframes for a while, and can't seem to wrangle them into the form above.

A note: the errors are dynamic. This script will run weekly, and as the source data changes, so will the errors. The only constants are the geometry types (i.e. 'line', 'point', and 'poly').

edit:

import pandas

d = [{'line': {'Area Boundary Must Be Covered By Boundary Of': '10', 'Must Be Inside': '55', 'Must Not Have Gaps': '2', 'Must Not Self-Intersect': '2', 'Must Not Self-Overlap': '2'}},
     {'point': {'Must Not Self-Intersect': '3'}},
     {'poly': {'Must Not Overlap': '2'}}]
# Build one dataframe per geometry dictionary and stack them row-wise
df = pandas.concat([pandas.DataFrame(x) for x in d])
print(df)

Produces:

                                             line point poly
Area Boundary Must Be Covered By Boundary Of   10   NaN  NaN
Must Be Inside                                 55   NaN  NaN
Must Not Have Gaps                              2   NaN  NaN
Must Not Self-Intersect                         2   NaN  NaN
Must Not Self-Overlap                           2   NaN  NaN
Must Not Self-Intersect                       NaN     3  NaN
Must Not Overlap                              NaN   NaN    2

This will suffice.

Solution:

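You can create a dataframe for each sub-dictionary, concat them, collapse the duplicate index labels with groupby(level=0) (i.e. group on index level 0) + sum, and then transpose so that each geometry type becomes a row. Here d is the list from the question. Note that the counts in d are strings, so convert them (for example with pd.to_numeric) first if you need numeric results:

import pandas as pd
df = pd.concat([pd.DataFrame(o) for o in d]).groupby(level=0).sum().T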

Output:

>>> df
      Area Boundary Must Be Covered By Boundary Of Must Be Inside Must Not Have Gaps Must Not Overlap Must Not Self-Intersect Must Not Self-Overlap
line                                            10             55                  2                0                       2                     2
point                                            0              0                  0                0                       3                     0
poly                                             0              0                  0                2                       0                     0
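If the counts need to be real integers rather than strings, a different route from the concat + groupby approach is to merge the single-key dictionaries first and build the frame in one step. This is only a minimal sketch: the name merged is made up for illustration, and it assumes each geometry type appears at most once in d (a repeated key would overwrite the earlier entry rather than being summed).

```python
import pandas as pd

d = [{'line': {'Area Boundary Must Be Covered By Boundary Of': '10',
               'Must Be Inside': '55',
               'Must Not Have Gaps': '2',
               'Must Not Self-Intersect': '2',
               'Must Not Self-Overlap': '2'}},
     {'point': {'Must Not Self-Intersect': '3'}},
     {'poly': {'Must Not Overlap': '2'}}]

# Flatten the list into one mapping: geometry type -> {error name: count}.
# Assumption: each geometry type occurs only once in d.
merged = {geom: errors for item in d for geom, errors in item.items()}

# orient="index" makes each geometry type a row and each error a column;
# errors that a geometry type does not have become NaN.
df = pd.DataFrame.from_dict(merged, orient="index")

# The counts arrive as strings: convert each column to numbers,
# treat missing errors as 0, and store everything as integers.
df = df.apply(pd.to_numeric).fillna(0).astype(int)

# Sort the error columns alphabetically so the layout is stable between runs.
df = df.sort_index(axis=1)

print(df)
```

Because the columns are derived from whatever error names are present, nothing needs to change when the weekly run produces a different set of errors.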