
K-means clustering in R

K-means clustering is a process of automatically classifying N observations into K groups, or clusters. We will try to apply this process to the built-in iris dataset. We assume that the dataset has already been checked and cleaned, and that it has been made anonymous, i.e. stripped of the species names.

Calculating K-means cluster

To make working with datasets easier, we use the dplyr library. As the next step, we create a dataset without labels for clustering.


> library(dplyr)
> unlabeled_iris <- iris %>% select(-Species)
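
If dplyr is not available, the same unlabeled dataset can be built in base R. This is only a sketch of an alternative, not the approach used in the rest of the article; the two checks at the end are our own addition to confirm that the label column is really gone.

```r
# Base R equivalent of select(-Species): keep every column except "Species"
unlabeled_iris <- iris[, names(iris) != "Species"]

# Sanity checks (our addition): four columns remain and all are numeric
str(unlabeled_iris)
all(sapply(unlabeled_iris, is.numeric))
```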

Now we can cluster it. We know in advance that we need to split the data into 3 groups, i.e. 3 centres, which we pass to kmeans(). After splitting, we compare the clusters with the labels from the original dataset using table().


> iris_cluster <- kmeans(unlabeled_iris, centers=3)
> table(iris_cluster$cluster, iris$Species)

    setosa versicolor virginica
  1      0         48        14
  2     50          0         0
  3      0          2        36

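As we can see, setosa is cleanly separated from the rest of the data, but versicolor and virginica overlap: 14 virginica flowers fall into the cluster dominated by versicolor, and 2 versicolor flowers fall into the cluster dominated by virginica. The next section checks this graphically.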
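
kmeans() starts from randomly chosen centres, so the cluster numbers (and occasionally the counts) can differ from run to run. A minimal sketch of a repeatable run follows, together with a single number that summarises the table: the share of flowers that belong to the majority species of their cluster. The seed value 42 and nstart = 25 are our assumptions and do not come from the original run.

```r
# Assumptions (not from the original run): seed 42 and 25 random restarts
set.seed(42)

unlabeled_iris <- iris[, names(iris) != "Species"]

# nstart = 25 repeats the clustering from 25 random starts and keeps the best fit
iris_cluster <- kmeans(unlabeled_iris, centers = 3, nstart = 25)

# Cross-tabulate clusters (rows) against the true species (columns)
confusion <- table(cluster = iris_cluster$cluster, species = iris$Species)
print(confusion)

# Share of flowers that fall into the majority species of their cluster
agreement <- sum(apply(confusion, 1, max)) / sum(confusion)
print(agreement)
```

For the counts in the table above, the same formula gives (48 + 50 + 36) / 150, i.e. about 0.89.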

Graphical analysis of K-means clustering

To understand the problem better, it helps to inspect the results visually.

This code plots petal length against petal width, with the points coloured by the 3 clusters:


> plot(unlabeled_iris$Petal.Length, unlabeled_iris$Petal.Width,
       pch=c(22,21,23)[iris_cluster$cluster],
       bg=c("red", "green", "blue")[iris_cluster$cluster])

and this code plots the same data, split into 3 groups by species:


> plot(iris$Petal.Length, iris$Petal.Width,
       pch=c(22,21,23)[iris$Species],
       bg=c("red", "green", "blue")[iris$Species])

Now we can compare the results:

[Figure: K-means clustering of the iris dataset]
[Figure: Species in the iris dataset]

The clusters and the species overlap where Petal.Length is around 5.0. In this region versicolor and virginica are mixed together, so k-means, which sees only the measurements and not the labels, has no clear criterion for separating them.
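
To see exactly which flowers end up in the "wrong" cluster, we can label each cluster with its majority species and highlight the points that disagree with the true species. This is only a sketch: the seed, nstart = 25, the colours and the helper names (majority, predicted, mismatch) are our assumptions, not part of the original analysis.

```r
# Assumptions (ours): seed 42 and nstart = 25 for a repeatable clustering
set.seed(42)
unlabeled_iris <- iris[, names(iris) != "Species"]
iris_cluster <- kmeans(unlabeled_iris, centers = 3, nstart = 25)

# Name each cluster after the species that is most frequent in it
confusion <- table(iris_cluster$cluster, iris$Species)
majority <- colnames(confusion)[apply(confusion, 1, which.max)]

# Species "predicted" for every flower, and where it disagrees with the truth
predicted <- majority[iris_cluster$cluster]
mismatch <- predicted != as.character(iris$Species)

# Highlight the disagreeing flowers in red on the petal plot
plot(iris$Petal.Length, iris$Petal.Width,
     pch = 21, bg = ifelse(mismatch, "red", "grey80"),
     xlab = "Petal.Length", ylab = "Petal.Width")
legend("topleft", legend = c("agrees with species", "disagrees"),
       pch = 21, pt.bg = c("grey80", "red"))

# Range of petal lengths among the disagreeing flowers
summary(iris$Petal.Length[mismatch])
```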


Published: 2021-11-18 02:42:27
Updated: 2021-11-18 02:43:40


© 2020 MyCoding.uk - My blog about coding and further learning. This blog was written in pure Perl, and the front-end output is produced with TemplateToolkit.