12 – Introduction to Cluster Analysis

Let’s now turn to unsupervised learning: a class of problems where we are not given target labels with the data. Essentially, we only have the $X$ as our training examples, so the only thing we can really do is find patterns in the data.

Cluster analysis, or clustering, is one such class of methods. In clustering, we try to find reasonable “clusters” of data, that is, data points grouped together in some meaningful way. What counts as “meaningful” determines which clustering algorithm we use. There are many algorithms that cluster in different ways, and the scikit-learn website demonstrates several of them run on different datasets. The image below is taken from their documentation.

The comparison above shows how differently each algorithm behaves on the same data. We will not discuss all of them, only a few. We will also discuss how to check the goodness of a clustering.
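
As a small illustration of how the choice of algorithm changes the clusters found, here is a minimal sketch that runs two scikit-learn algorithms, `KMeans` and `DBSCAN`, on a synthetic "two moons" dataset. The dataset, the two algorithms, and every parameter value (`n_samples`, `noise`, `eps`, `min_samples`, `random_state`) are assumptions picked for illustration. The adjusted Rand index is used here only because the toy data comes with known labels; in a real unsupervised problem those labels would not be available.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Synthetic data: two interleaving half-circles. true_labels exist only
# because the data is generated; clustering itself never sees them.
X, true_labels = make_moons(n_samples=200, noise=0.05, random_state=0)

# KMeans groups points around two centroids (distance to a center).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points that are connected through dense regions.
# eps and min_samples are illustrative values, not tuned recommendations.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1, so exclude it when counting.
n_dbscan_clusters = len(set(dbscan_labels) - {-1})

# Agreement with the generating labels (1.0 means a perfect match).
kmeans_ari = adjusted_rand_score(true_labels, kmeans_labels)
dbscan_ari = adjusted_rand_score(true_labels, dbscan_labels)

print("DBSCAN clusters found:", n_dbscan_clusters)
print("KMeans ARI:", round(kmeans_ari, 3))
print("DBSCAN ARI:", round(dbscan_ari, 3))
```

On this kind of data, the two notions of "meaningful" disagree: a centroid-based method tends to cut the moons with a straight boundary, while a density-based method can follow their curved shape.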