I have a DataFrame of the form:
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "b", "c", "c", "c"],
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": ["x", "y", "x", "y", "x", "y"],
})
i.e.
  key  var1 var2
0   a     1    x
1   b     2    y
2   b     3    x
3   c     4    y
4   c     5    x
5   c     6    y
I am trying to transform the data into a dict whose top-level keys are the unique values of key, each mapped to a list of records built from the other columns (var1 and var2).
Expected output:
{
    "a": [{"var1": 1, "var2": "x"}],
    "b": [{"var1": 2, "var2": "y"}, {"var1": 3, "var2": "x"}],
    "c": [{"var1": 4, "var2": "y"}, {"var1": 5, "var2": "x"}, {"var1": 6, "var2": "y"}],
}
I tried the code below. It works, but it filters the whole DataFrame once per unique key inside a for loop, which makes it slow for large DataFrames. How can I achieve the expected result in a more idiomatic way with pandas?
result = {}
for key in df["key"].unique():
    key_df = df[df["key"] == key]
    result[key] = key_df.drop("key", axis=1).to_dict(orient="records")
>Solution :
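Use groupby and to_dict('records') in a dictionary comprehension. Setting key as the index first keeps it out of the records, since to_dict('records') only includes the columns: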
out = {k: g.to_dict('records') for k, g in
       df.set_index('key').groupby(level='key')}
Output:
{'a': [{'var1': 1, 'var2': 'x'}],
 'b': [{'var1': 2, 'var2': 'y'}, {'var1': 3, 'var2': 'x'}],
 'c': [{'var1': 4, 'var2': 'y'},
       {'var1': 5, 'var2': 'x'},
       {'var1': 6, 'var2': 'y'}]}
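
As a possible variation, the same result can be obtained without moving key into the index, by grouping on the column and dropping it from each group. The sketch below is a self-contained check that assumes the example df from the question; the names out_alt and expected are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "b", "c", "c", "c"],
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": ["x", "y", "x", "y", "x", "y"],
})

# Approach from the answer: with key in the index, it is left out of the records.
out = {k: g.to_dict("records")
       for k, g in df.set_index("key").groupby(level="key")}

# Variation (illustrative): group on the column and drop it from each group.
# sort=False keeps the keys in order of first appearance instead of sorting them.
out_alt = {k: g.drop(columns="key").to_dict("records")
           for k, g in df.groupby("key", sort=False)}

# Expected output, copied from the question.
expected = {
    "a": [{"var1": 1, "var2": "x"}],
    "b": [{"var1": 2, "var2": "y"}, {"var1": 3, "var2": "x"}],
    "c": [{"var1": 4, "var2": "y"}, {"var1": 5, "var2": "x"}, {"var1": 6, "var2": "y"}],
}

assert out == expected
assert out_alt == expected
```

Both versions still iterate over the groups in Python, so any speedup over the original loop most likely comes from groupby splitting the data once rather than building a boolean mask over the full DataFrame for every key.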