Plotting class decision boundary: determine a "good fit" range directly

I am trying to figure out how to plot the decision boundary line using just a few middle values instead of the entire separator line, so that it spans roughly the same y-range as the observations.

Currently, I repeatedly select different bounds by hand and assess the result visually until a good-looking separator emerges.

MWE:


import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# sample data
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9], flip_y=0, random_state=1)

# fit svm model 
svc_model = SVC(kernel='linear', random_state=32)
svc_model.fit(X, y)

# Construct the separating line from the fitted hyperplane:
# w[0]*x + w[1]*y + b = 0  =>  y = -(w[0]/w[1])*x - b/w[1]
w = svc_model.coef_[0]
b = svc_model.intercept_[0]
x_points = np.linspace(-1, 1)
y_points = -(w[0] / w[1]) * x_points - b / w[1]
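As a sanity check (not part of the original post), one can verify that the points generated this way really lie on the fitted decision boundary, since the SVM's decision function should be approximately zero along the separator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Same data and model as the MWE above.
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9], flip_y=0, random_state=1)
svc_model = SVC(kernel='linear', random_state=32).fit(X, y)

w = svc_model.coef_[0]
b = svc_model.intercept_[0]
x_points = np.linspace(-1, 1)
y_points = -(w[0] / w[1]) * x_points - b / w[1]

# Every (x, y) pair on the line should give a decision value of ~0.
line = np.column_stack([x_points, y_points])
print(np.abs(svc_model.decision_function(line)).max())  # should be ~0
```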

Figure 1:

The decision boundary line spans a much larger range than the data, so the observations get "squished" into what visually looks almost like a line:

plt.figure(figsize=(10, 8))

# Plotting our two-features-space
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)

# Plotting a red hyperplane
plt.plot(x_points, y_points, c='r')


Figure 2:

Manually playing around with slices until the fit looks good visually (x_points[19:-29], y_points[19:-29]):

plt.figure(figsize=(10, 8))

# Plotting our two-features-space
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)
# Plotting a red hyperplane
plt.plot(x_points[19:-29], y_points[19:-29], c='r')


How can I automate finding a "good fit" range of values? For example, the slice above works fine with n_samples=100 data points but not with n_samples=1000.

Solution:

Instead of having x_points go from -1 to 1, you can invert the linear equation and set the bounds on y_points directly from the data:

y_points = np.linspace(X[:, 1].min(), X[:, 1].max())
x_points = -(w[1] * y_points + b) / w[0]
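Putting it together, here is a minimal end-to-end sketch of that approach (using the same data-generation settings as the question, but with n_samples=1000 to show it adapts automatically):

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Same setup as the question, but with more samples.
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9], flip_y=0, random_state=1)
svc_model = SVC(kernel='linear', random_state=32).fit(X, y)

w = svc_model.coef_[0]
b = svc_model.intercept_[0]

# Invert w[0]*x + w[1]*y + b = 0 to express x as a function of y,
# then span exactly the observed y-range.
y_points = np.linspace(X[:, 1].min(), X[:, 1].max())
x_points = -(w[1] * y_points + b) / w[0]

plt.figure(figsize=(10, 8))
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)
plt.plot(x_points, y_points, c='r')
plt.show()
```

Because the line's endpoints are derived from X[:, 1].min() and X[:, 1].max(), no manual slicing is needed regardless of the sample size.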
