pandas to_dict – list of rows by key

February 19, 2024

I have a DataFrame of the form:

import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "b", "c", "c", "c"],
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": ["x", "y", "x", "y", "x", "y"],
})

i.e.

  key  var1 var2
0   a     1    x
1   b     2    y
2   b     3    x
3   c     4    y
4   c     5    x
5   c     6    y

I am trying to transform the data into a dict that has the unique key values as top-level keys, each mapping to a list of records for the remaining columns (var1 and var2).

Expected output:

{
    "a": [{"var1": 1, "var2": "x"}],
    "b": [{"var1": 2, "var2": "y"}, {"var1": 3, "var2": "x"}],
    "c": [{"var1": 4, "var2": "y"}, {"var1": 5, "var2": "x"}, {"var1": 6, "var2": "y"}],
}

I tried the code below. It works, but the for loop filters the whole DataFrame once per unique key, which gets slow for large DataFrames. How can I get the expected result in a more idiomatic pandas way?

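result = {}
for key in df["key"].unique():
    # Boolean mask over every row, repeated for each unique key
    key_df = df[df["key"] == key]
    # Drop the grouping column so it does not appear inside each record
    result[key] = key_df.drop("key", axis=1).to_dict(orient="records")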
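Most of the cost in the loop above comes from `df["key"] == key`, which compares every row on each iteration, so the work grows roughly with (number of rows) × (number of unique keys). For comparison, here is a minimal single-pass sketch in plain Python (an alternative, not the asker's or answerer's code, assuming the same `df` as above): the frame is converted to records once and each record is appended to the list for its key.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "b", "c", "c", "c"],
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": ["x", "y", "x", "y", "x", "y"],
})

# Convert the whole frame to a list of row dicts once, then bucket by key.
grouped = defaultdict(list)
for record in df.to_dict("records"):
    key = record.pop("key")  # remove "key" so it is not repeated in the record
    grouped[key].append(record)

# Plain dict; keys keep their order of first appearance, like df["key"].unique().
result = dict(grouped)
print(result)
```

Each row is touched a constant number of times, so the cost no longer depends on how many distinct keys there are.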

Solution:

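Use groupby and to_dict('records') in a dictionary comprehension. Moving key into the index first keeps it out of the records, and groupby splits the frame once instead of re-filtering it for every key: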

out = {k: g.to_dict('records') for k, g in 
       df.set_index('key').groupby(level='key')}

Output:

{'a': [{'var1': 1, 'var2': 'x'}],
 'b': [{'var1': 2, 'var2': 'y'}, {'var1': 3, 'var2': 'x'}],
 'c': [{'var1': 4, 'var2': 'y'},
       {'var1': 5, 'var2': 'x'},
       {'var1': 6, 'var2': 'y'}]}
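For reference, a self-contained version of the answer above, assuming the same `df` as in the question. One detail worth knowing: `groupby` sorts the group keys by default, whereas the question's `df["key"].unique()` keeps order of first appearance. Passing `sort=False` (a standard `groupby` parameter) keeps the appearance order; for this sample data both orders happen to be `a, b, c`.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "b", "c", "c", "c"],
    "var1": [1, 2, 3, 4, 5, 6],
    "var2": ["x", "y", "x", "y", "x", "y"],
})

# Put "key" in the index so to_dict("records") leaves it out of each record.
# sort=False keeps keys in order of first appearance instead of sorted order.
out = {
    k: g.to_dict("records")
    for k, g in df.set_index("key").groupby(level="key", sort=False)
}

print(out)
```

Grouping by a single level name (a string, not a list) yields scalar keys such as `'a'`, so the dict keys match the expected output directly.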