Getting the distance matrix back from already clustered data

I have used hclust together with the TSclust package to do agglomerative hierarchical clustering. My question is: can I get the dissimilarity (distance) matrix back from hclust? I want the distance values so that I can rank which variables are closest to a single variable in the group.

Example: if (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10) are the variables used to form the distance matrix, then what I want is the distance between x3 and each of the other variables (x3-x1, x3-x2, x3-x4, x3-x5, and so on). Can we do that? Here are the code and reproducible data.

Data:

cdata <- structure(list(x1 = c(186.41, 100.18, 12.3, 14.38, 25.97, 0.06, 
0, 6.17, 244.06, 19.26, 256.18, 255.69, 121.88, 75, 121.45, 11.34, 
34.68, 3.09, 34.3, 26.13, 111.31), x2 = c(327.2, 8.05, 4.23, 
6.7, 3.12, 1.91, 37.03, 39.17, 140.06, 83.72, 263.29, 261.22, 
202.48, 23.27, 2.87, 7.17, 14.48, 3.41, 5.95, 70.56, 91.58), 
    x3 = c(220.18, 126.14, 98.59, 8.56, 0.5, 0.9, 17.45, 191.1, 
    164.64, 224.36, 262.86, 237.75, 254.88, 42.05, 9.12, 0.04, 
    12.22, 0.61, 61.86, 114.08, 78.94), x4 = c(90.74, 26.11, 
    47.86, 10.86, 3.74, 23.69, 61.79, 68.12, 87.92, 171.76, 260.98, 
    266.62, 96.27, 57.15, 78.89, 16.73, 6.59, 49.44, 57.21, 202.2, 
    67.17), x5 = c(134.09, 27.06, 7.44, 4.53, 17, 47.66, 95.96, 
    129.53, 40.23, 157.37, 172.61, 248.56, 160.84, 421.94, 109.93, 
    22.77, 2.11, 49.18, 64.13, 52.61, 180.87), x6 = c(173.17, 
    46.68, 6.54, 3.05, 0.35, 0.12, 5.09, 72.46, 58.19, 112.31, 
    233.77, 215.82, 100.63, 65.84, 2.69, 0.01, 3.63, 12.93, 66.55, 
    28, 61.74), x7 = c(157.22, 141.81, 19.98, 116.18, 16.55, 
    122.3, 62.67, 141.84, 78.3, 227.27, 340.22, 351.38, 147.73, 
    0.3, 56.12, 33.2, 5.51, 54.4, 82.98, 152.66, 218.26), x8 = c(274.08, 
    51.92, 54.86, 15.37, 0.31, 0.05, 36.3, 162.04, 171.78, 181.39, 
    310.73, 261.55, 237.99, 123.99, 1.92, 0.74, 0.23, 18.51, 
    7.68, 65.55, 171.33), x9 = c(262.71, 192.34, 2.75, 21.68, 
    1.69, 3.92, 0.09, 9.33, 120.36, 282.92, 236.7, 161.59, 255.44, 
    126.44, 7.63, 2.04, 1.02, 0.12, 5.87, 146.25, 134.11), x10 = c(82.71, 
    44.09, 1.52, 2.63, 4.38, 28.64, 168.43, 80.62, 20.36, 39.29, 
    302.31, 247.52, 165.73, 18.27, 2.67, 1.77, 23.13, 53.47, 
    53.14, 46.61, 86.29)), class = "data.frame", row.names = c(NA, 
-21L))

Code:

as.ts(cdata)
library(dplyr) # data wrangling
library(ggplot2) # grammar of graphics
library(ggdendro) # dendrograms
library(TSclust) # cluster time series

# cluster analysis

dist_ts <- TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER") # note the data frame must be transposed
hc <- stats::hclust(dist_ts, method="complete") # method can also be "average"; for divisive clustering (DIANA) use cluster::diana() instead
hcdata <- ggdendro::dendro_data(hc)
names_order <- hcdata$labels$label
# Use the following to remove labels from the dendrogram so they are not doubled up (the labels are still useful for checking):
# hcdata$labels$label <- ""
hcdata%>%ggdendro::ggdendrogram(., rotate=FALSE, leaf_labels=FALSE)

Solution:

I believe the object you are looking for is stored in the variable dist_ts:

dist_ts <- TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER")
print(dist_ts)
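
To get from there to the ranking asked about in the question, one option is to convert the "dist" object to a full matrix and pull out the x3 row. Below is a minimal sketch of that idea. It assumes dist_ts is an ordinary "dist" object, and so that it runs without TSclust it uses a made-up toy data frame and stats::dist as stand-ins for cdata and TSclust::diss. The names toy, d, m, x3_dist and coph are only for illustration.

```r
# Toy stand-in for cdata: 10 series (columns x1..x10), 21 observations each.
set.seed(1)
toy <- as.data.frame(matrix(runif(210), nrow = 21,
                            dimnames = list(NULL, paste0("x", 1:10))))

# Stand-in for dist_ts: Euclidean distances between the (transposed) series.
# With TSclust you would use the dist_ts object from diss() here instead.
d <- stats::dist(t(toy))

# A "dist" object stores only the lower triangle; as.matrix() expands it
# to a full symmetric matrix with the series names as row/column names.
m <- as.matrix(d)

# Distances from x3 to every other series: drop the zero self-distance
# and sort so the closest series comes first.
x3_dist <- sort(m["x3", colnames(m) != "x3"])
print(x3_dist)

# hclust() itself does not keep the original distances. What can be
# recovered from the tree are the cophenetic distances (the merge heights),
# which are generally not the same as the original dissimilarities.
hc <- stats::hclust(d, method = "complete")
coph <- as.matrix(stats::cophenetic(hc))
```

With the real data, replacing d with dist_ts should give the ranking for x3 directly, as long as you keep dist_ts around rather than trying to recover it from hc.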