Removing rows and columns if all zeros in non-diagonal entries

I am generating a confusion matrix to compare my text classifier's predictions against the ground truth. The goal is to see which intents are being predicted as other intents. The problem is that I have more than 160 classes, so the matrix is sparse and most cells are zero. The diagonal elements are likely to be non-zero, since they count the correct predictions.

Given that, I want to generate a simpler version. Only the non-zero off-diagonal elements matter, so I want to remove every row and column whose elements are all zero (ignoring the diagonal entries), so that the plot becomes much smaller and easier to read. How can I do that?

Below is the code I have so far. It plots the full mapping for all intents, i.e. a (#intents, #intents) heatmap.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
sns.set(rc={'figure.figsize':(64,64)})

confusion_matrix = pd.crosstab(df['ground_truth_intent_name'], df['predicted_intent_name'])

variables = sorted(list(set(df['ground_truth_intent_name'])))
temp = pd.DataFrame(confusion_matrix, index=variables, columns=variables)

sns.heatmap(temp, annot=True)

TL;DR

Here confusion_matrix is a pandas DataFrame. I need to remove every row and column whose off-diagonal elements are all zero, regardless of what is on the diagonal.

Solution:

Build a boolean mask of the non-zero entries, set its diagonal to False so that correct predictions are ignored, and then call any along each axis:

# True where an entry is non-zero
# for float data, also consider
# a = ~np.isclose(confusion_matrix.to_numpy(), 0)
a = confusion_matrix.to_numpy() != 0

# ignore the diagonal (correct predictions)
np.fill_diagonal(a, False)

# columns with at least one non-zero off-diagonal entry
cols = a.any(axis=0)

# rows with at least one non-zero off-diagonal entry
rows = a.any(axis=1)

# boolean indexing keeps only those rows and columns
confusion_matrix.loc[rows, cols]
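
The snippet above relies on the matrix being square with rows and columns in the same order. pd.crosstab only produces that when every class shows up in both the ground-truth and the predicted column. A minimal sketch that wraps the same steps in a function and first reindexes to the union of labels is shown here; the function name off_diagonal_confusions and the toy column names are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd


def off_diagonal_confusions(cm: pd.DataFrame) -> pd.DataFrame:
    """Drop rows and columns whose off-diagonal entries are all zero."""
    # Align rows and columns on the same sorted label set so that the
    # diagonal really corresponds to "predicted == ground truth".
    labels = sorted(set(cm.index) | set(cm.columns))
    square = cm.reindex(index=labels, columns=labels, fill_value=0)

    # Boolean mask of non-zero cells; the comparison returns a new array,
    # so editing it does not modify the DataFrame.
    nonzero = square.to_numpy() != 0
    np.fill_diagonal(nonzero, False)  # ignore correct predictions

    rows = nonzero.any(axis=1)  # true classes that were mispredicted
    cols = nonzero.any(axis=0)  # classes that were wrongly predicted
    return square.loc[rows, cols]


# Made-up data: "c" is never predicted, so crosstab has no "c" column.
df = pd.DataFrame({
    "truth": ["a", "a", "b", "c", "c"],
    "pred":  ["a", "b", "b", "a", "a"],
})
cm = pd.crosstab(df["truth"], df["pred"])
reduced = off_diagonal_confusions(cm)
print(reduced)
```

Here row "b" is dropped because "b" was never mispredicted, and column "c" is dropped because nothing was ever predicted as "c".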

Let’s take an example:

# reproducible random 0/1 data
np.random.seed(1)
a = np.random.randint(0,2, (5,5))
# row 2: all zeros
a[2] = 0
# column 4: zeros everywhere except the diagonal entry
a[:-1,-1] = 0
confusion_matrix = pd.DataFrame(a)

So the data would be:

   0  1  2  3  4
0  1  1  0  0  0
1  1  1  1  1  0
2  0  0  0  0  0
3  0  0  1  0  0
4  0  1  0  0  1

and the code outputs (note that the row labeled 2 and the column labeled 4 are gone):

   0  1  2  3
0  1  1  0  0
1  1  1  1  1
3  0  0  1  0
4  0  1  0  0
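
Because rows and columns are filtered independently, the result is not guaranteed to stay aligned: in the output above, label 2 is missing from the rows but still present in the columns, so the diagonal of the reduced matrix no longer marks correct predictions. If a square plot is preferred, one option is to keep a label whenever either its row or its column contains a non-zero off-diagonal entry. The following is a sketch of that variant on made-up data, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Made-up 4-class matrix: classes 0 and 1 are confused with each other,
# classes 2 and 3 are always predicted correctly.
cm = pd.DataFrame(
    [[5, 2, 0, 0],
     [1, 7, 0, 0],
     [0, 0, 4, 0],
     [0, 0, 0, 9]]
)

nonzero = cm.to_numpy() != 0
np.fill_diagonal(nonzero, False)  # ignore correct predictions

# Keep a label if it is involved in any confusion,
# either as the true class (row) or as the predicted class (column).
keep = nonzero.any(axis=0) | nonzero.any(axis=1)
square_reduced = cm.loc[keep, keep]
print(square_reduced)
```

The trade-off is that this variant removes less. On the 5x5 example above it would keep all five labels, since label 2 appears as a wrongly predicted column and label 4 as a mispredicted row.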