Masking data frame with multidimensional key

I have a data frame containing two columns, value_1 and value_2:

import pandas as pd

df_1 = pd.DataFrame(
    {
        "id_1": [101, 202],
        "id_2": [101, 202],
        "value_1": [5.0, 10.0],
        "value_2": [10.0, 4.0],
    }
)
df_1 = df_1.set_index(["id_1", "id_2"])

that looks like this:

           value_1  value_2
id_1 id_2
101  101       5.0     10.0
202  202      10.0      4.0

I have another data frame that contains a flag for each value, i.e. is_active_1 and is_active_2:

df_2 = pd.DataFrame(
    {
        "id_1": [101, 202],
        "id_2": [101, 202],
        "is_active_1": [True, False],
        "is_active_2": [False, False],
    }
)
df_2 = df_2.set_index(["id_1", "id_2"])

that looks like this:

           is_active_1  is_active_2
id_1 id_2
101  101          True        False
202  202         False        False

I want to multiply each value in df_1 by 3 if its corresponding flag in df_2 is True. The end result should look like this:

           value_1  value_2
id_1 id_2
101  101      15.0     10.0
202  202      10.0      4.0

That is, the flag is_active_1 = True for (id_1, id_2) = (101, 101) turns value_1 into 3 * 5.0 = 15.0, and every other value stays unchanged.

I have tried the following:

df_1.loc[df_2[["is_active_1", "is_active_2"]], ["value_1", "value_2"]] * 3

but it raised ValueError: Cannot index with multidimensional key.

Solution:

Here are three options, in decreasing order of how much label alignment they rely on.

You can rename the columns of df_2, replacing is_active with value, so that the mask has the same column names as df_1:

df_1[df_2.rename(columns=lambda x: x.replace('is_active', 'value'))] *= 3
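
As a minimal, self-contained sketch of this first option, the intermediate objects can be inspected as follows. The names mask and selected are introduced here for illustration only, and both frames are assumed to share the same index, as in the question:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(101, 101), (202, 202)], names=["id_1", "id_2"])
df_1 = pd.DataFrame({"value_1": [5.0, 10.0], "value_2": [10.0, 4.0]}, index=idx)
df_2 = pd.DataFrame(
    {"is_active_1": [True, False], "is_active_2": [False, False]}, index=idx
)

# Boolean mask that carries df_1's column labels (value_1, value_2)
mask = df_2.rename(columns=lambda c: c.replace("is_active", "value"))

# Indexing with a boolean DataFrame keeps the True cells and gives NaN elsewhere
selected = df_1[mask]
print(selected)

# The augmented assignment writes the tripled values back into the True cells only
df_1[mask] *= 3
print(df_1)
```

Because mask carries df_1's own labels, pandas aligns it on both rows and columns, so the order of the rows or columns in df_2 should not matter with this option.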

Or use set_axis to relabel df_2's columns with df_1's column names by position, which sidesteps name-based alignment on the columns:

df_1[df_2.set_axis(df_1.columns, axis=1)] *= 3
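
One caveat worth checking: set_axis relabels by position, so it only gives the intended mask if df_2's columns are in the same order as df_1's. A small sketch with a hypothetical df_2_swapped, whose flag columns are in the opposite order, shows how this differs from renaming:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(101, 101), (202, 202)], names=["id_1", "id_2"])
df_1 = pd.DataFrame({"value_1": [5.0, 10.0], "value_2": [10.0, 4.0]}, index=idx)

# Hypothetical variant of df_2 with the flag columns in the opposite order
df_2_swapped = pd.DataFrame(
    {"is_active_2": [False, False], "is_active_1": [True, False]}, index=idx
)

# Positional relabelling: the first column becomes value_1 regardless of its name
by_position = df_2_swapped.set_axis(df_1.columns, axis=1)

# Name-based relabelling: is_active_1 becomes value_1 wherever it sits
by_name = df_2_swapped.rename(columns=lambda c: c.replace("is_active", "value"))

print(by_position)  # the True flag ends up under value_2
print(by_name)      # the True flag stays with value_1
```

If the column order of the two frames is known to match, as in the question, both approaches should give the same mask.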

Or assume the two data frames have the same row and column order and ignore the labels of df_2 entirely:

df_1[df_2.to_numpy()] *= 3
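
If the rows of df_2 might not be in the same order as those of df_1, one possible safeguard is to reindex df_2 on df_1's index before dropping the labels. The sketch below uses a hypothetical df_2_shuffled with reversed rows. It assumes every index entry of df_1 also exists in df_2, since reindex would otherwise introduce NaN and the mask would no longer be boolean:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(101, 101), (202, 202)], names=["id_1", "id_2"])
df_1 = pd.DataFrame({"value_1": [5.0, 10.0], "value_2": [10.0, 4.0]}, index=idx)

# Hypothetical df_2 whose rows arrive in the opposite order to df_1
df_2_shuffled = pd.DataFrame(
    {"is_active_1": [False, True], "is_active_2": [False, False]},
    index=idx[::-1],
)

# Put the flag rows into df_1's row order, then drop the labels
mask = df_2_shuffled.reindex(df_1.index).to_numpy()

df_1[mask] *= 3
print(df_1)
```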

Updated df_1:

           value_1  value_2
id_1 id_2                  
101  101      15.0     10.0
202  202      10.0      4.0
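
As for the original error: .loc expects a one-dimensional indexer for the rows, and a two-column boolean data frame is two-dimensional, hence Cannot index with multidimensional key. A possible alternative that keeps .loc is to work one column at a time, so that each mask is a 1-D boolean Series. The pairs mapping below is a hypothetical helper, not part of the original code:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(101, 101), (202, 202)], names=["id_1", "id_2"])
df_1 = pd.DataFrame({"value_1": [5.0, 10.0], "value_2": [10.0, 4.0]}, index=idx)
df_2 = pd.DataFrame(
    {"is_active_1": [True, False], "is_active_2": [False, False]}, index=idx
)

# Hypothetical mapping from each flag column to the value column it controls
pairs = {"is_active_1": "value_1", "is_active_2": "value_2"}

for flag_col, value_col in pairs.items():
    # df_2[flag_col] is a 1-D boolean Series, which .loc accepts as a row mask
    df_1.loc[df_2[flag_col], value_col] *= 3

print(df_1)
```

This is more verbose than the one-line masks above, but it makes the pairing between flags and values explicit.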