Home How to get the closest the column value by the bin Multiindex?

Questions

How to get the closest the column value by the bin Multiindex?

December 22, 2021

I have a DataFrame (df_value_bin) with Multiindex which is the binned values, like this:

import pandas as pd
import numpy as np

np.random.seed(100)
df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'value'])

a_bins = np.arange(-3, 4, 1)
b_bins = np.arange(-2, 4, 2)

df['a_bins'] = pd.cut(df['a'], bins=a_bins)
df['b_bins'] = pd.cut(df['b'], bins=b_bins)
df_value_bin = df.groupby(['a_bins','b_bins']).agg({'value':'mean'})

Here’s the quickview of df_value_bin:

                     value
a_bins   b_bins           
(-3, -2] (-2, 0] -0.417606
         (0, 2]  -0.267035
(-2, -1] (-2, 0] -0.296727
         (0, 2]  -0.112280
(-1, 0]  (-2, 0]  0.459780
         (0, 2]   0.131588
(0, 1]   (-2, 0]  0.110268
         (0, 2]   0.287755
(1, 2]   (-2, 0]  0.254337
         (0, 2]  -0.627460
(2, 3]   (-2, 0] -0.075165
         (0, 2]  -0.589709

Then, I want to get the closest value of df_value_bin when giving some a and b.

Let’s say a=1.5 and b=-1, then we should get value=0.254337.

Attempt 1

I can generate the boolean mask for a_bins and b_bins:

a_test = 1.5
b_test = -1

boolean_a = df_value_bin.index.get_level_values('a_bins').categories.contains(a_test)
boolean_b = df_value_bin.index.get_level_values('b_bins').categories.contains(b_test)

print(boolean_a, boolean_b) # Output: [False False False False  True False] [ True False]

However, I have no idea of using the masks to select the row …

Attempt 2

I can get the index directly:

index_a = np.digitize(a_test, a_bins, right=True)
index_b = np.digitize(b_test, b_bins, right=True)

print(index_a, index_b) # Output: 5 1

Again, I don’t know how to use the index to select the row directly.

Notes

It seems the second method should be quicker as it uses np.digitize().
If you have any idea to complete it or other better methods, please feel free to answer!

>Solution :

You can just index with numbers in this case:

df_value_bin.loc[(1.5, -1)]

Output (ignore value, which is generated randomly, look at the Name):

value    0.047439
Name: ((1, 2], (-2, 0]), dtype: float64

dataframe

byMR

Published December 22, 2021

Add a comment

Open A File From Dialog Box

byMR

December 22, 2021

Questions

subprocess.call path problem , how to fix?

byMR

December 22, 2021

Questions

Swift Collection API: extend with a computed variable: type error

byMR

December 22, 2021

Questions

generate a birthday date randomly in c

byMR

December 22, 2021

Questions

Sending input to GPIOC

byMR

December 22, 2021

Questions

How can I get the date inside string that contains a text?

byMR

December 22, 2021

How to get the closest the column value by the bin Multiindex?

MEDevel.com: Open-source for Healthcare and Education

Attempt 1

Attempt 2

Notes

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Open A File From Dialog Box

subprocess.call path problem , how to fix?

Swift Collection API: extend with a computed variable: type error

generate a birthday date randomly in c

Sending input to GPIOC

How can I get the date inside string that contains a text?

Keep Up to Date with the Most Important News

How to get the closest the column value by the bin Multiindex?

MEDevel.com: Open-source for Healthcare and Education

Attempt 1

Attempt 2

Notes

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Open A File From Dialog Box

subprocess.call path problem , how to fix?

Swift Collection API: extend with a computed variable: type error

generate a birthday date randomly in c

Sending input to GPIOC

How can I get the date inside string that contains a text?

Discover more from Dev solutions