Hierarchical clustering in R
Another approach to clustering is to build a hierarchy from the distances between all objects in the dataset. The resulting plot is called a cluster dendrogram. One advantage of these hierarchical trees (dendrograms) is that they are easy to analyse visually, without mathematical calculations, to see how different classes emerge in the data.
Calculating hierarchical tree
Using all iris data
We will use all available data for clustering with hierarchical trees (hclust()).
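A minimal sketch of this step could look as follows. The distance measure and linkage method are not stated in the text, so the R defaults (Euclidean distance in dist() and complete linkage in hclust()) are assumed here, and the variable names are only illustrative.

```r
# Assumption: default Euclidean distance and default "complete" linkage.
iris_features <- iris[, 1:4]        # the four numeric measurements, no Species
dist_all <- dist(iris_features)     # pairwise distances between all 150 flowers
hc_all <- hclust(dist_all)          # build the hierarchical tree

# Draw the dendrogram; leaf labels are hidden because 150 of them overlap.
plot(hc_all, labels = FALSE, hang = -1, main = "Cluster Dendrogram")
```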
As a result we will have the following dendrogram:
This hierarchical tree reveals two clearly separated groups, not three. Let's check it in more detail. We will select 3 clusters from this tree and check how they relate to the species. cutree() will split our hierarchical tree into 3 groups and then table() will show the relation with the real species.
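The cutree() and table() step can be sketched like this, again assuming the default distance and linkage settings. The exact counts depend on those settings, so no output is shown.

```r
# Assumption: same default settings as for the dendrogram above.
hc_all <- hclust(dist(iris[, 1:4]))

# Cut the tree into 3 groups; the result is one cluster number per flower.
clusters_all <- cutree(hc_all, k = 3)

# Cross-tabulate cluster numbers (rows) against the real species (columns).
comparison_all <- table(clusters_all, iris$Species)
print(comparison_all)
```

A good clustering would put each species mostly into a single row of this table.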
setosa was separated clearly. versicolor is clearly separated as well, but almost all virginica was classified as versicolor. Clearly this is very bad clustering. Let’s check it on the plot.
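One way to draw such a plot is sketched below. The axes of the original figure are not known, so Petal.Length against Petal.Width is an assumption, and the colours are simply the cluster numbers mapped to the current base R palette.

```r
# Assumption: default distance and linkage, petal measurements on the axes.
clusters_all <- cutree(hclust(dist(iris[, 1:4])), k = 3)

# Colour each flower by its cluster number (1, 2 or 3).
plot(iris$Petal.Length, iris$Petal.Width,
     col = clusters_all, pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width",
     main = "Clusters from all iris measurements")
legend("topleft", legend = paste("Cluster", 1:3), col = 1:3, pch = 19)
```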
You can see that the green dots cover almost the whole large area, which is wrong. So our clustering really fails on this example.
Using Petal.Length and Petal.Width iris data
Now we will do hierarchical clustering using only the Petal.Length and Petal.Width columns of the iris dataset.
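A sketch of the petal-only tree is shown here. The linkage method used for this step is not stated; average linkage is an assumption in this example, and other methods (for instance the default "complete") may split the groups differently.

```r
# Keep only the two petal measurements.
iris_petal <- iris[, c("Petal.Length", "Petal.Width")]

# Assumption: Euclidean distance with average linkage (not stated in the text).
hc_petal <- hclust(dist(iris_petal), method = "average")

plot(hc_petal, labels = FALSE, hang = -1,
     main = "Cluster Dendrogram (petal measurements)")
```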
This hierarchical tree shows the split into 3 groups more clearly, which is very promising. Now we will check it with cutree() by comparing our clustering with the original species.
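The comparison for the petal-only tree can be sketched in the same way. Average linkage is again an assumption, so the resulting table may not match the original numbers exactly.

```r
# Assumption: average linkage on the two petal columns.
iris_petal <- iris[, c("Petal.Length", "Petal.Width")]
hc_petal <- hclust(dist(iris_petal), method = "average")

# Cut into 3 groups and compare with the real species.
clusters_petal <- cutree(hc_petal, k = 3)
comparison_petal <- table(clusters_petal, iris$Species)
print(comparison_petal)
```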
As you can see, the separation between the three groups is now almost ideal. There is some confusion between the versicolor and virginica species, but this is much better than our previous solution. Let’s check it visually by plotting the new clustering and the original dataset.
And for reference we can compare with the original species distribution:
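Both plots can be drawn side by side with a sketch like the one below. The layout, the axes and the average linkage are assumptions made for illustration. Note that cluster numbers and species are coloured independently, so the same group may get a different colour in each panel.

```r
# Assumption: average linkage on the two petal columns.
iris_petal <- iris[, c("Petal.Length", "Petal.Width")]
clusters_petal <- cutree(hclust(dist(iris_petal), method = "average"), k = 3)

old_par <- par(mfrow = c(1, 2))     # two panels next to each other

# Left panel: flowers coloured by cluster number.
plot(iris_petal, col = clusters_petal, pch = 19, main = "Clusters")

# Right panel: flowers coloured by their real species.
plot(iris_petal, col = as.integer(iris$Species), pch = 19, main = "Species")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

par(old_par)                        # restore the previous plot layout
```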
Hierarchy clustering conclusion
This example shows that it is very important to choose the data for analysis carefully. Sometimes additional variables that take similar values across different groups can make the final clustering much worse.
© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written with pure Perl and front-end output was performed with TemplateToolkit.