Extracting vectors from a dataframe using a Boolean matrix

I have a dataframe that looks something like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=['col1', 'col2', 'col3'])
       col1      col2      col3
0  0.692154  0.286560  0.515904
1  0.798917  0.777593  0.971300

I also have a Boolean matrix (stored as a DataFrame of 0s and 1s) that looks like this:

b_matrix = pd.DataFrame(np.array([[0, 1, 1],
                                  [1, 1, 0],
                                  [0, 0, 1],
                                  [0, 1, 0]]),
                        columns=['col1', 'col2', 'col3'],
                        index=['input1', 'input2', 'input3', 'input4'])
        col1  col2  col3
input1     0     1     1
input2     1     1     0
input3     0     0     1
input4     0     1     0

The idea is that I provide an input label, look it up in b_matrix, and get back only the columns of df that are marked 1 in that row. For example, if the input is input1, the output should be df[['col2', 'col3']]:


       col2      col3
0  0.286560  0.515904
1  0.777593  0.971300

I can think of a way to do this by keeping a static list of column names and checking it each time, but is there a more direct method?

>Solution :

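You can use input() to read the user's choice, select that row of b_matrix with .loc, keep the entries equal to 1, and use the resulting column names to index df: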

In [1076]: inp = input('User input:')
User input:input1

In [1077]: df[b_matrix.columns[b_matrix.loc[inp].eq(1)]]
Out[1077]: 
       col2      col3
0  0.179902  0.832655
1  0.444187  0.487146
2  0.879333  0.756792
3  0.870601  0.661337
4  0.082169  0.008669
5  0.190734  0.975966
6  0.839718  0.290976
7  0.862724  0.426222
8  0.581909  0.333300
9  0.949953  0.539106
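To see what the one-liner does, here is a non-interactive sketch that rebuilds the question's data, hard-codes the input instead of calling input(), and keeps each intermediate step in its own variable (the names row, mask, selected and result are illustrative only):

```python
import numpy as np
import pandas as pd

# Data shaped like the question's (df values are random)
df = pd.DataFrame(np.random.rand(10, 3), columns=['col1', 'col2', 'col3'])
b_matrix = pd.DataFrame(np.array([[0, 1, 1],
                                  [1, 1, 0],
                                  [0, 0, 1],
                                  [0, 1, 0]]),
                        columns=['col1', 'col2', 'col3'],
                        index=['input1', 'input2', 'input3', 'input4'])

inp = 'input1'  # hard-coded instead of input() so the sketch runs non-interactively

row = b_matrix.loc[inp]            # Series indexed by column name: col1=0, col2=1, col3=1
mask = row.eq(1)                   # Boolean Series: col1=False, col2=True, col3=True
selected = b_matrix.columns[mask]  # Index of the column names whose flag is 1
result = df[selected]              # df restricted to those columns

print(selected.tolist())  # ['col2', 'col3']
```

Because the mask is built from b_matrix's own columns, no separate list of column names has to be maintained; adding a column to both frames is enough.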

If you enter input2 instead:

In [1080]: inp = input('User input:')
User input:input2

In [1081]: df[b_matrix.columns[b_matrix.loc[inp].eq(1)]]
Out[1081]: 
       col1      col2
0  0.072600  0.179902
1  0.126708  0.444187
2  0.646533  0.879333
3  0.673643  0.870601
4  0.313205  0.082169
5  0.951917  0.190734
6  0.076799  0.839718
7  0.294087  0.862724
8  0.240569  0.581909
9  0.851999  0.949953
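If the lookup is needed in more than one place, it can be wrapped in a small function. The sketch below uses a hypothetical helper named select_columns (not a pandas function), rejects labels that are not in b_matrix's index, and passes the row as a Boolean Series to df.loc, which aligns it with df's columns by label. It assumes every column of df also appears in b_matrix; otherwise pandas raises an IndexingError about an unalignable Boolean Series. Note that astype(bool) treats any nonzero entry as selected, whereas .eq(1) only matches exact 1s.

```python
import numpy as np
import pandas as pd


def select_columns(df: pd.DataFrame, b_matrix: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return the columns of df flagged in row `key` of b_matrix.

    Hypothetical helper for illustration; not part of pandas.
    """
    if key not in b_matrix.index:
        raise KeyError(f"unknown input {key!r}; expected one of {list(b_matrix.index)}")
    # Boolean Series indexed by column name; .loc aligns it with df's columns by label
    mask = b_matrix.loc[key].astype(bool)
    return df.loc[:, mask]


df = pd.DataFrame(np.random.rand(10, 3), columns=['col1', 'col2', 'col3'])
b_matrix = pd.DataFrame(np.array([[0, 1, 1],
                                  [1, 1, 0],
                                  [0, 0, 1],
                                  [0, 1, 0]]),
                        columns=['col1', 'col2', 'col3'],
                        index=['input1', 'input2', 'input3', 'input4'])

out = select_columns(df, b_matrix, 'input2')
print(out.columns.tolist())  # ['col1', 'col2']
```

In an interactive script, the hard-coded label can be replaced with select_columns(df, b_matrix, input('User input:')), and a mistyped label then produces a readable KeyError instead of a bare lookup failure.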
