# Only My Notes

### Hierarchical Clustering

Calculate the distance matrix (Euclidean distance by default):

```
dist(tab)
```

Use first 3 columns:

```
dist(tab[, 1:3])
```

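Hierarchical clustering with Euclidean distance and average linkage (`hclust` defaults to complete linkage, so the method is set explicitly):

```
distance_matrix <- dist(tab)
hc <- hclust(distance_matrix, method = "average")
```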
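
To see the effect of the linkage choice, here is a minimal sketch on made-up toy data (the matrix below is hypothetical, not the `tab` used in these notes). It fits the same distance matrix with the default method and with average linkage, then checks which method each fit recorded:

```r
# Toy data (hypothetical): two well-separated groups of 2-D points
toy <- rbind(
  c(0.0, 0.0), c(0.1, 0.2), c(0.2, 0.1),
  c(5.0, 5.0), c(5.1, 5.2), c(5.2, 5.1)
)

toy_dist <- dist(toy)                                  # Euclidean by default

hc_complete <- hclust(toy_dist)                        # default: complete linkage
hc_average  <- hclust(toy_dist, method = "average")    # average linkage

# Each fit stores the linkage method that was used
hc_complete$method
hc_average$method
```

The `height` component of each fit holds the merge heights, which is usually where the two linkages start to differ on less tidy data.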

Plot the dendrogram:

```
plot(as.dendrogram(hc))
```

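Dendrogram drawn with `ggdendro` (ggplot2-based), which offers different graphic options:

```
library(ggdendro)
ggdendrogram(hc, theme_dendro = FALSE)  # FALSE keeps the regular ggplot2 theme instead of the blank dendrogram theme
```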

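Another dendrogram option:

```
library(rafalib)  # myplclust() is provided by rafalib, not devtools
myplclust(hc, lab.col = unclass(tab$col))  # lab.col colors the labels by column value (assumes tab$col is a factor)
abline(h = 1.5, col = "red")  # horizontal line marking where the dendrogram is cut
```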
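
The red line only marks the cut visually. To get the actual group membership at that height, `cutree` from base R can be used. A minimal sketch, assuming made-up toy data (hypothetical, not the notes' `tab`) and the same cut height of 1.5:

```r
# Toy data (hypothetical): two well-separated groups of 2-D points
toy <- rbind(
  c(0.0, 0.0), c(0.1, 0.2), c(0.2, 0.1),
  c(5.0, 5.0), c(5.1, 5.2), c(5.2, 5.1)
)
hc_toy <- hclust(dist(toy), method = "average")

groups_by_height <- cutree(hc_toy, h = 1.5)  # cut at the height where the line is drawn
groups_by_count  <- cutree(hc_toy, k = 2)    # or ask directly for a number of groups

table(groups_by_height)                      # how many observations fall in each group
```

Cutting by `h` matches what the plot shows, while cutting by `k` is handier when the number of groups is decided in advance.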

### K-Means Clustering

K-means (non-hierarchical) clustering with two groups:

```
kmeans(tab, centers = 2)
```

Exclude columns 11 and 12 and divide into 5 groups:

```
Clust <- kmeans(tab[, -c(11:12)], centers = 5)
```

Build a contingency table with the clusters as rows and the distinct values of `col` as columns, to see how the `col` values are distributed across the clusters:

```
table(Clust$cluster, tab$col)
```
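
A self-contained version of the same idea, as a sketch only: it assumes the built-in `iris` dataset as a stand-in for `tab`, with `Species` playing the role of `col`. The seed and `nstart` value are arbitrary choices for reproducibility.

```r
set.seed(42)                       # k-means starts from random centroids

iris_tab <- iris                   # built-in dataset, stand-in for `tab`
iris_clust <- kmeans(iris_tab[, 1:4], centers = 3, nstart = 20)  # numeric columns only

# Rows: cluster labels, columns: species
counts <- table(iris_clust$cluster, iris_tab$Species)
counts
```

A cluster that matches one class well shows up as a row dominated by a single column. Note that the cluster numbers themselves are arbitrary labels.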

Other useful parameters:

• iter.max: maximum number of iterations allowed before the algorithm stops.
• nstart: number of random sets of starting centroids to try. nstart = 100 runs the algorithm from 100 different random starts and keeps the best result (lowest total within-cluster sum of squares).
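
A minimal sketch of both parameters on simulated data (the data, seed, and parameter values are assumptions chosen for illustration). It compares a single random start with 100 starts through `tot.withinss`, the quantity k-means tries to minimize:

```r
set.seed(1)
# Simulated data (hypothetical): two Gaussian blobs, 50 points each, 2 columns
sim <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2))

set.seed(2)
fit_single <- kmeans(sim, centers = 3, nstart = 1,   iter.max = 50)
set.seed(2)
fit_multi  <- kmeans(sim, centers = 3, nstart = 100, iter.max = 50)

# Lower is better; more starts usually give an equal or lower value
c(single = fit_single$tot.withinss, multi = fit_multi$tot.withinss)

fit_multi$iter   # iterations actually used by the best run
```

If `kmeans` warns that it did not converge, raising `iter.max` is the usual first thing to try.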