Mutating a list of strings to include K-means cluster number in R

I am following this tutorial and this is all super new to me, so apologies if it’s an obvious question.

Following the tutorial, I have converted a list of strings, sentences, into a K-Means model with 4 clusters using the following:

corpus <- tm::Corpus(tm::VectorSource(sentences))
# corpus.cleaned is not defined in this snippet; presumably it is the corpus
# after the tutorial's cleaning steps
tdm <- tm::DocumentTermMatrix(corpus.cleaned)
tdm.tfidf <- tm::weightTfIdf(tdm)
tdm.tfidf <- tm::removeSparseTerms(tdm.tfidf, 0.999)
tfidf.matrix <- as.matrix(tdm.tfidf)
dist.matrix <- proxy::dist(tfidf.matrix, method = "cosine")
model <- kmeans(dist.matrix, centers = 4)

Now, I would like to go back to the original list of sentences and show next to each one which cluster it forms part of. For example:

sentences                      cluster
Lorem ipsum dolor sit amet     1
Consectetur adipiscing elit    2

I’ve tried the following (using the dplyr package):

clustered <- mutate(sentences, cluster = model$cluster)

and

clustered <- mutate(df$sentences, cluster = model$cluster)

But this doesn’t work because, as R says: no applicable method for ‘mutate’ applied to an object of class "character".

Any ideas?

>Solution :

I can’t test this without data, but if I understand correctly, sentences is a list of strings, which you can use as a column in a new data frame, and model$cluster is a vector in which each position corresponds to the same position in the input. So, as long as the order of the sentences was preserved through the pipeline, the two line up element by element. If that holds (I can’t confirm it, because I have never used the tm package), you can simply create a new data frame from the list and the cluster vector.

kmeans_results <- data.frame(
  sentence = sentences,       # original strings, in input order
  clusterID = model$cluster   # cluster assigned to each string (no trailing comma)
)
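
To make the idea concrete, here is a minimal base R sketch of the same approach. It assumes toy data: the four sentences and the small `features` matrix are made up for illustration and merely stand in for the real input and for the TF-IDF/distance matrix from the question. The `unlist()` call is a guard for the case where `sentences` is a true list rather than a character vector.

```r
# Toy stand-in for the original input (hypothetical data, not from the question).
sentences <- c(
  "Lorem ipsum dolor sit amet",
  "Consectetur adipiscing elit",
  "Sed do eiusmod tempor",
  "Ut labore et dolore magna"
)

# Toy numeric features, one row per sentence
# (stand-in for the TF-IDF / distance matrix).
features <- matrix(
  c(0.0, 0.1,
    0.1, 0.0,
    5.0, 5.1,
    5.1, 5.0),
  ncol = 2, byrow = TRUE
)

set.seed(42)  # kmeans starts from randomly chosen centers
model <- kmeans(features, centers = 2)

# kmeans returns one label per input row, in input order,
# so the labels must be as numerous as the sentences.
stopifnot(length(model$cluster) == length(sentences))

# unlist() leaves a character vector unchanged and flattens a list of strings.
clustered <- data.frame(
  sentences = unlist(sentences),
  cluster = unname(model$cluster),
  stringsAsFactors = FALSE
)

print(clustered)
```

The length check is worth keeping in real code: if any step in the pipeline dropped documents, the labels would no longer line up with the sentences. If you prefer dplyr, wrapping the vector first should also work, for example `mutate(data.frame(sentences = sentences), cluster = model$cluster)`, since `mutate()` expects a data frame rather than a character vector.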