Create nested dictionary based on column values from Pandas dataframe?

I have a dataframe:

import pandas as pd

df1 = pd.DataFrame(data={'col1': [21, 44, 28, 32, 20, 39, 42],
                         'col2': ['<1', '>2', '>2', '>3', '<1', '>2', '>4'],
                         'col3': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'yes'],
                         'col4': [1, 1, 0, 0, 1, 0, 1],
                         'Group': [0, 2, 1, 1, 0, 1, 2]})
   col1 col2 col3  col4  Group
0    21   <1  yes     1      0
1    44   >2  yes     1      2
2    28   >2   no     0      1
3    32   >3   no     0      1
4    20   <1  yes     1      0
5    39   >2   no     0      1
6    42   >4  yes     1      2

Based on the values 0, 1, 2 in the ‘Group’ column, I need to create a nested dictionary like this:

{0: {'col1': {'mean': 20.5, 'sd': 0.5},
  'col2': ['<1'],
  'col3': ['yes'],
  'col4': {'mean': 1, 'sd': 0}},
 1: {'col1': {'mean': 33, 'sd': 4.54},
  'col2': ['>2', '>3'],
  'col3': ['no'],
  'col4': {'mean': 0, 'sd': 0}},
 2: {'col1': {'mean': 43, 'sd': 1},
  'col2': ['>2', '>4'],
  'col3': ['yes'],
  'col4': {'mean': 1, 'sd': 0}}}

col1 and col4 are numeric; col2 and col3 are strings.

For Group 0:

‘col1’ : mean and stddev of values 20,21

‘col2’ : which has only ‘<1’

‘col3’ : has only ‘yes’

‘col4’ : mean and stddev of values 1,1

For Group 1:

‘col1’ : mean and stddev of values 28,32,39

‘col2’ : which has ‘>2’ and ‘>3’

‘col3’ : has only ‘no’

‘col4’ : mean and stddev of values 0,0,0

For Group 2:

‘col1’ : mean and stddev of values 42,44

‘col2’ : which has ‘>2’ and ‘>4’

‘col3’ : has only ‘yes’

‘col4’ : mean and stddev of values 1,1
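
The 'sd' values in the expected dictionary appear to be population standard deviations (dividing by n), whereas pandas' `Series.std()` defaults to the sample version (`ddof=1`, dividing by n - 1). A small check on the Group 1 values of col1, offered as an illustration rather than as part of the original question; the variable names are hypothetical:

```python
import pandas as pd

# Group 1 values of col1, taken from the dataframe in the question
group1_col1 = pd.Series([28, 32, 39])

population_sd = group1_col1.std(ddof=0)  # divides by n, matches the expected ~4.54
sample_sd = group1_col1.std()            # pandas default ddof=1, divides by n - 1

print(round(population_sd, 3), round(sample_sd, 3))  # 4.546 5.568
```

So an answer has to pass `ddof=0` explicitly to reproduce the numbers requested above.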

Solution:

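# Per group: unique values for string (object) columns,
# mean and population std (ddof=0) for numeric columns
df1.groupby('Group').agg(lambda x: x.unique().tolist() if x.dtype == 'object'
                         else {'mean': x.mean(), 'std': x.std(ddof=0)}).T.to_dict()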

Out[396]: 
{0: {'col1': {'mean': 20.5, 'std': 0.5},
  'col2': ['<1'],
  'col3': ['yes'],
  'col4': {'mean': 1.0, 'std': 0.0}},
 1: {'col1': {'mean': 33.0, 'std': 4.546060565661952},
  'col2': ['>2', '>3'],
  'col3': ['no'],
  'col4': {'mean': 0.0, 'std': 0.0}},
 2: {'col1': {'mean': 43.0, 'std': 1.0},
  'col2': ['>2', '>4'],
  'col3': ['yes'],
  'col4': {'mean': 1.0, 'std': 0.0}}}
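
The one-liner above relies on string columns having dtype `object` and labels the deviation 'std' rather than the 'sd' key requested in the question. A more explicit variant might look like the sketch below. It assumes the same dataframe; `summarize_groups` is a hypothetical helper name, and `is_numeric_dtype` is used in place of the `object` comparison so the check does not depend on how strings are stored.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df1 = pd.DataFrame(data={'col1': [21, 44, 28, 32, 20, 39, 42],
                         'col2': ['<1', '>2', '>2', '>3', '<1', '>2', '>4'],
                         'col3': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'yes'],
                         'col4': [1, 1, 0, 0, 1, 0, 1],
                         'Group': [0, 2, 1, 1, 0, 1, 2]})


def summarize_groups(df, by='Group'):
    """Build {group: {column: summary}} (hypothetical helper, not from the answer)."""
    result = {}
    for key, group in df.groupby(by):
        summary = {}
        for col in group.columns.drop(by):
            values = group[col]
            if is_numeric_dtype(values):
                # Population standard deviation (ddof=0), stored under 'sd'
                summary[col] = {'mean': float(values.mean()),
                                'sd': float(values.std(ddof=0))}
            else:
                # Unique values in order of first appearance
                summary[col] = values.drop_duplicates().tolist()
        result[key] = summary
    return result


nested = summarize_groups(df1)
print(nested[1]['col2'])  # ['>2', '>3']
```

The explicit loop is longer than the `agg` one-liner, but each branch is easy to adjust, for example to round the statistics or to add further summary keys.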