Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

OverflowError: cannot convert float infinity to integer, after doing so

import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
import quantstats as qs

data = pd.read_csv('worldometer_data.csv')
X = data.drop(columns=['Country/Region', 'Continent', 'Population', 'WHO Region'])

# replace NaN values with 0
for i in X:
  X[i] = X[i].fillna(0)
  
# getting rid of float infinity
X = X.replace([np.inf, -np.inf, -0], 0)

wcss = []

# getting Kmeans
for i in range(0, 51):
  kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
  kmeans.fit(X)
  wcss.append(kmeans.inertia_)

# visualizing the kmeans graph
plt.plot(range(0, 51), wcss)
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

I have checked the X array after getting rid of float infinity and it does that successfully . But when it gets to kmeans.fit(X) it fails and returns the OverflowError.

Error:

OverflowError                             Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_21416\3473011824.py in <module>
     16 for i in range(0, 51):
     17   kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
---> 18   kmeans.fit(X)
     19   wcss.append(kmeans.inertia_)
     20 

c:\Users\usr\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py in fit(self, X, y, sample_weight)
   1177         for i in range(self._n_init):
   1178             # Initialize centers
-> 1179             centers_init = self._init_centroids(
   1180                 X, x_squared_norms=x_squared_norms, init=init, random_state=random_state
   1181             )

c:\Users\usr\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py in _init_centroids(self, X, x_squared_norms, init, random_state, init_size)
   1088 
   1089         if isinstance(init, str) and init == "k-means++":
-> 1090             centers, _ = _kmeans_plusplus(
   1091                 X,
   1092                 n_clusters,

c:\Users\usr\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py in _kmeans_plusplus(X, n_clusters, x_squared_norms, random_state, n_local_trials)
    189         # specific results for other than mentioning in the conclusion
...
--> 191         n_local_trials = 2 + int(np.log(n_clusters))
    192 
    193     # Pick first center randomly and track index of point

OverflowError: cannot convert float infinity to integer

How can I fix this and is there something else I did wrong?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Dataset used: https://www.kaggle.com/datasets/imdevskp/corona-virus-report (the worldometer_data.csv)

>Solution :

The problem is here:

for i in range(0, 51):
  kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)

You cannot set n_clusers to 0. It must be larger than 0.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading