Find the within-cluster sum of squares for each cluster after scaling the data and fitting K-means

Consider the dataset “USArrests.csv”. This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percentage of the population living in urban areas.

Variables Description

  • States: The state where the incident occurred
  • Murder: No. of arrests for murder (per 100,000 residents)
  • Assault: No. of arrests for assault (per 100,000 residents)
  • UrbanPop: Percentage of urban population
  • Rape: Rape arrests (per 100,000 residents)

Set the column States as the index of the data frame while reading the data. Seed the random number generator with set.seed(123). Normalize the data using the scale function and run the K-means algorithm with the given conditions:

  • number of clusters = 4
  • nstart=20

According to the built model, the within cluster sum of squares for each cluster is __
(the order of values in each option could be different)

I am stuck after importing the data and setting the seed, and I am struggling to fit the K-means model.

>Solution :

I was glad to see this dataset. If I am not mistaken, it is available on Kaggle; it also ships with R as the built-in USArrests data set.

I am using R here and hope you are familiar with it. If not, the concept is very similar elsewhere, so try to understand the code and rewrite it in the language you are most comfortable with.

After the usual setup, import the data, scale it, and fit the model:

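# Read the data, using the States column as row names
data <- read.csv("USArrests.csv", header = TRUE, row.names = "States")
# Standardize each column (mean 0, standard deviation 1)
df <- scale(data)
# Seed the random number generator so the random starts are reproducible
set.seed(123)
# Fit K-means with 4 clusters, keeping the best of 20 random starts
fit <- kmeans(df, centers = 4, nstart = 20)
# Within-cluster sum of squares, one value per cluster
print(fit$withinss)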

With this seed, the output should be 8.316061 11.952463 16.212213 19.922437 (the order of the values can differ, because cluster numbering is arbitrary).
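To see what withinss actually measures, here is a minimal sketch that recomputes it by hand: for each cluster, it sums the squared distances between the scaled rows in that cluster and the cluster centre. It assumes the CSV holds the same values as R's built-in USArrests data set, so the built-in copy is used to keep the example self-contained; manual_wss is a name made up for this illustration.

```r
# Assumption: USArrests.csv matches R's built-in USArrests data set,
# so the built-in copy is used here instead of reading the file.
df <- scale(USArrests)   # centre each column and divide by its standard deviation

set.seed(123)
fit <- kmeans(df, centers = 4, nstart = 20)

# Recompute the within-cluster sum of squares by hand (manual_wss is a
# hypothetical name): for each cluster, subtract the cluster centre from
# every member row, square the differences, and add them up.
manual_wss <- sapply(seq_len(nrow(fit$centers)), function(k) {
  members <- df[fit$cluster == k, , drop = FALSE]
  sum(sweep(members, 2, fit$centers[k, ])^2)
})

print(manual_wss)

# Compare with the values reported by kmeans()
print(all.equal(manual_wss, as.numeric(fit$withinss)))
print(all.equal(sum(manual_wss), fit$tot.withinss))
```

If the two comparisons agree, the values printed by fit$withinss are simply these per-cluster sums, and fit$tot.withinss is their total.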

Feel free to comment if you don’t understand or find a mistake.
