python dict flatten nested dict values to create new key value pair in numpy/pandas

August 10, 2023

Hi, I have data in the structure below: a map whose keys are labels and whose values are arrays of arrays. I want to flatten the values and dynamically append an index to each key so that every row becomes its own entry, as shown below. I can iterate over each key-value pair, create a new dict, and add the values to it to get the expected result, but it's slow. I have around 50M values in the array. Is there a faster approach in numpy/pandas?

This is what I have

{'user_feature': 
array([
[ 1.33677050e-02, -1.45685431e-02], 
[-2.30765194e-02, 0.00000000e+00],
[0.00000000e+00,  0.00000000e+00],  
[1.16669689e-04,  1.33677050e-02]]), 
'sequence_service_id_list': 
array([
[215., 215., 215., ..., 554., 215., 215.],
[215., 215., 215., ..., 215., 215., 215.],
[215., 215., 554., ..., 215., 215., 215.]]), 
'target_label': 
array([1., 1., 1., ..., 1., 1., 1.])}

Expected:

{'user_feature_1': [ 1.33677050e-02, -1.45685431e-02], 
'user_feature_2': [-2.30765194e-02, 0.00000000e+00],
'user_feature_3': [0.00000000e+00,  0.00000000e+00],
'sequence_service_id_list_1': [215., 215., 215., ..., 554., 215., 215.],
'sequence_service_id_list_2': [215., 215., 215., ..., 215., 215., 215.],
'sequence_service_id_list_3': [215., 215., 554., ..., 215., 215., 215.], 
'target_label_1': 1., 
'target_label_2': 1., 
'target_label_3': 1., 
}

>Solution :

This isn’t a vectorized solution to create the dict you want, but a way to access the required rows using keys that follow the new format.

Let’s define a class to wrap this input dictionary. When you try to get a key from an object of this class, the __getitem__ method is invoked, where the key is parsed into its "original key" and "index" components, and the appropriate row of the appropriate value is returned.

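class CustomDict:
    def __init__(self, input_dict):
        # Keep a reference to the original dict; nothing is copied.
        self.__data = input_dict

    def __getitem__(self, key):
        # "user_feature_2" -> orig_key "user_feature", elem_index "2"
        orig_key, elem_index = key.rsplit("_", 1)
        # Keys are 1-based, array rows are 0-based.
        return self.__data[orig_key][int(elem_index)-1]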

Let’s test this:

import numpy as np

array = np.array

inp_dict = {'user_feature': array([[ 1.33677050e-02, -1.45685431e-02], 
                                   [-2.30765194e-02, 0.00000000e+00],
                                   [0.00000000e+00,  0.00000000e+00],  
                                   [1.16669689e-04,  1.33677050e-02]]), 
            'sequence_service_id_list': array([[215., 215., 215., 554., 215., 215.],
                                               [215., 215., 215., 215., 215., 215.],
                                               [215., 215., 554., 215., 215., 215.]]), 
            'target_label': array([1., 1., 1., 1., 1., 1.])}

cus_dict = CustomDict(inp_dict)

print(cus_dict['user_feature_1'])
# [ 0.01336771 -0.01456854]

print(cus_dict['user_feature_2'])
# [-0.02307652  0.        ]

print(cus_dict['user_feature_3'])
# [0. 0.]

Wrapping the dictionary does no work up front: nothing is iterated and nothing is copied. The only cost is splitting the key, a simple, quick operation that happens at the time of access, so this will be much faster than creating a new dictionary with one entry per row.
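If a real dict with the flattened keys is still needed (for example, to pass to code that expects one), a dict comprehension is one way to build it. This is only a minimal sketch, not a vectorized solution: the helper name `flatten_rows` and the sample values are made up for illustration, and the loop still creates one Python object per row, so with tens of millions of rows it will likely be much slower than the wrapper above. Iterating over a 2-D NumPy array yields views of its rows, so at least the row data itself is not copied.

```python
import numpy as np


def flatten_rows(input_dict):
    """Build {"<key>_<n>": row} with 1-based n (hypothetical helper)."""
    return {
        f"{key}_{i}": row
        for key, arr in input_dict.items()
        for i, row in enumerate(arr, start=1)
    }


# Small made-up input, same shape of structure as in the question.
small = {
    "user_feature": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "target_label": np.array([1.0, 0.0]),
}

flat = flatten_rows(small)
print(list(flat))
# ['user_feature_1', 'user_feature_2', 'target_label_1', 'target_label_2']

# Rows of a 2-D array come back as views, so no row data is copied.
print(np.shares_memory(flat["user_feature_1"], small["user_feature"]))
# True
```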

You can also implement a similar __setitem__ method inside the class to set elements of the original dictionary:

def __setitem__(self, key, value):
    orig_key, elem_index = key.rsplit("_", 1)
    # Convert the 1-based string index to a 0-based integer, as in __getitem__.
    self.__data[orig_key][int(elem_index)-1] = value
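Put together, a wrapper with both methods could look like the sketch below. The class name `RowView`, the `_parse` helper, and the sample values are placeholders for illustration; the key format (original key, underscore, 1-based index) follows the question. Because the wrapper stores a reference to the input dictionary rather than a copy, writes go straight through to the original arrays.

```python
import numpy as np


class RowView:
    """Read and write rows of a dict of arrays via "<key>_<n>" keys (n is 1-based)."""

    def __init__(self, input_dict):
        self._data = input_dict  # reference only, nothing is copied

    @staticmethod
    def _parse(key):
        # Split on the last underscore so keys like "user_feature" stay intact.
        orig_key, elem_index = key.rsplit("_", 1)
        return orig_key, int(elem_index) - 1

    def __getitem__(self, key):
        orig_key, idx = self._parse(key)
        return self._data[orig_key][idx]

    def __setitem__(self, key, value):
        orig_key, idx = self._parse(key)
        self._data[orig_key][idx] = value


# Small made-up input for demonstration.
data = {
    "user_feature": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "target_label": np.array([1.0, 1.0]),
}
view = RowView(data)

view["user_feature_2"] = [9.0, 9.0]
view["target_label_1"] = 0.0

print(data["user_feature"][1])  # [9. 9.]
print(data["target_label"])     # [0. 1.]
```

Note that a key ending in `_0` maps to index -1 and silently refers to the last row, so a bounds check may be worth adding if keys come from untrusted input.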

numpy-ndarray

byMR

Published August 10, 2023
