My analysis outputs one dictionary of measurements per sample. I would like to collect these in a dataframe with one row per sample, so that each dictionary becomes a row. Every sample's dictionary has the same keys. Is there an efficient way to add each dictionary as a row to a dataframe?
sample_1 = {"area": 2, "perimeter": 3, "diameter": 5}
sample_2 = {"area": 6, "perimeter": 3, "diameter": 8}
I want to combine these in a dataframe. The columns should be area, perimeter and diameter, and the rows should be the samples. I have over 5000 samples and 20 variables stored in the dictionaries.
I have tried pd.DataFrame.from_dict, but that would mean turning each dictionary into its own dataframe and then merging them all.
I cannot change the measurement function to return a dataframe, so the dictionaries are what I have to work with.
>Solution :
Combine all your samples in a list:
import pandas as pd

sample_1 = {'area': 2, 'perimeter': 3, 'diameter': 5}
sample_2 = {'area': 6, 'perimeter': 3, 'diameter': 8}
samples = [sample_1, sample_2]
# A list of dicts becomes one row per dict, one column per key
out = pd.DataFrame(samples)
If you can, it’s even better to skip the intermediate variables and build the list directly:
samples = [
    {'area': 2, 'perimeter': 3, 'diameter': 5},
    {'area': 6, 'perimeter': 3, 'diameter': 8},
]
out = pd.DataFrame(samples)
Output:
   area  perimeter  diameter
0     2          3         5
1     6          3         8
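With thousands of samples, the same idea can be applied in a loop: collect one dictionary per sample in a list and build the dataframe once at the end, rather than growing a dataframe row by row. The sketch below assumes a hypothetical measure() function standing in for the real measurement code, which is not shown in the question; its return values are made up for illustration.

```python
import pandas as pd


def measure(sample_id):
    # Hypothetical stand-in for the real measurement function;
    # it only needs to return a dict with the same keys every time.
    return {
        "area": sample_id * 2,
        "perimeter": sample_id + 1,
        "diameter": sample_id * 3,
    }


# Collect one dict per sample, then construct the dataframe in a single call
samples = [measure(i) for i in range(5000)]
out = pd.DataFrame(samples)
```

Building the list first keeps the per-sample work to a cheap list append, and the dataframe is allocated only once.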
If your samples have meaningful names:
sample_1 = {'area': 2, 'perimeter': 3, 'diameter': 5}
sample_2 = {'area': 6, 'perimeter': 3, 'diameter': 8}
samples = {'A': sample_1, 'B': sample_2}
out = pd.DataFrame.from_dict(samples, orient='index')
Output:
   area  perimeter  diameter
A     2          3         5
B     6          3         8
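If the names and the measurements arrive as two separate sequences instead of one dictionary, a possible variant is to pass the names as the index when building the dataframe. This is a minimal sketch; the variable names (names, measurements, flat) and the index label "sample" are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Illustrative data: sample names and their measurement dicts, in the same order
names = ["A", "B"]
measurements = [
    {"area": 2, "perimeter": 3, "diameter": 5},
    {"area": 6, "perimeter": 3, "diameter": 8},
]

# Use the names as row labels
out = pd.DataFrame(measurements, index=names)
out.index.name = "sample"

# Optionally turn the labels into an ordinary column
flat = out.reset_index()
```

The two sequences must have the same length and order, since rows are matched to names by position.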