12 – Introduction to Cluster Analysis

Let’s now talk about unsupervised learning for a while: a class of problems where we’re not given the target labels with the data. Essentially, we only have the X as our training examples, so the only thing we can really do is find patterns in the data.

Cluster analysis, or clustering, is one such class of methods. In clustering, we try to find reasonable “clusters” of data, that is, data points grouped together in some meaningful way. What counts as “meaningful” determines which clustering algorithm we use. There are many algorithms that cluster in different ways, and the scikit-learn website demonstrates several of them run on different datasets. The image below is taken from their documentation.

[Figure: scikit-learn's comparison of clustering algorithms on several toy datasets (plot_cluster_comparison)]

The comparison above shows how differently each algorithm behaves on the same data. We will not discuss all of them, only a few. We will also discuss how to check the goodness of a clustering.
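To make the point about different algorithms concrete, here is a minimal sketch that runs two scikit-learn clusterers on the same toy data. The dataset (two interleaved half-moons), the choice of KMeans and DBSCAN, and every parameter value are assumptions picked for illustration, not recommendations. The generating labels `y_true` are used only to score the result on this toy problem; in a real unsupervised setting we would not have them.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Toy data: two interleaved half-moons. Only X is shown to the algorithms.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# KMeans assigns each point to the nearest of two centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN grows clusters through dense regions; eps is an illustrative guess.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the generating labels.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)

print(f"KMeans ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI: {ari_dbscan:.2f}")
```

On data shaped like this, a density-based method tends to follow the curved moons while a centroid-based method tends to cut across them, so the two scores usually differ noticeably. Neither algorithm is better in general; which one is “right” depends on what we consider a meaningful group.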
