3.2 – Multi-class Classification

In the logistic regression post, we talked about only a binary classifier, one that works when your data has exactly two classes. In reality, your data may have more classes than that, and you might like to classify all of them. This post details how to use logistic regression to do exactly that; if you understood the logistic regression post, this should be easy. Another algorithm, softmax regression, is a generalization of logistic regression and can also be used for multi-class classification. This post is very light on math: the bulk of the math is in understanding the core logistic regression algorithm.

As in the logistic regression post, I'll generate a dataset, this time with 3 classes, and show with code how the classification is done. Multi-class classification using logistic regression is called a one-vs-all or one-vs-rest implementation, and we'll see why shortly.

Let’s start by importing packages:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs

sns.set(style='white')

and then creating the data:


data = make_blobs(n_samples=150, centers=3)

Let’s get the X and Y components separately:


X = data[0]
Y = data[1]

And visualize the 3-class data:


plt.scatter(X.T[0], X.T[1], c=Y, cmap='plasma');

I used the last two arguments (c and cmap) to get reasonable colors for each class. Here's the output in my system:

[Figure: scatter plot of the three generated blobs, colored by class]

Three nice, separated blobs (it did take a few tries to get them this cleanly separated, but oh, well). Now to actually perform the classification. Since we have three classes here, we create three logistic regression models. Each model is a binary classifier, the same kind we built in the logistic regression post, and each recognizes only one class. For example, the first model learns to recognize Y=0 and treats the other two classes as a single "other" class. This is why we call it one-vs-rest: each classifier separates one class from all the remaining classes.

Rather than re-implement the logistic regression model like last time, we'll use scikit-learn's built-in model to demonstrate. Although that model is capable of multi-class classification on its own, we'll be careful to use it only as a two-class classifier here. Like I said, we need three such models. Let's first import the class.


from sklearn.linear_model import LogisticRegression

Because we're trying to demonstrate a one-vs-rest implementation, we need three Y arrays, one for each of the classifiers, so that each classifier learns to identify one class. Note the encoding: in each array, the class of interest is mapped to 0 and everything else to 1.


y1 = [0 if x == 0 else 1 for x in Y]   # class 0 vs rest
y2 = [0 if x == 1 else 1 for x in Y]   # class 1 vs rest
y3 = [0 if x == 2 else 1 for x in Y]   # class 2 vs rest

We used three simple list comprehensions to do this. Now let’s create the model objects.


model1 = LogisticRegression().fit(X, y1)
model2 = LogisticRegression().fit(X, y2)
model3 = LogisticRegression().fit(X, y3)
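
As a quick sanity check (not part of the original walkthrough, and your numbers will differ from run to run since the blobs are random), you can look at each binary model's training accuracy:


# Training accuracy of each binary classifier; values vary from run to run
# because make_blobs generates random data.
for name, model, labels in [("model1", model1, y1),
                            ("model2", model2, y2),
                            ("model3", model3, y3)]:
    print(name, model.score(X, labels))

With blobs as well separated as these, all three scores should be at or very close to 1.0.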

To plot the decision boundaries, we need to create a grid, just like last time:


xx, yy = np.mgrid[-10:10:.01, -10:10:.01]
grid = np.c_[xx.ravel(), yy.ravel()]
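
As an aside (this wasn't part of the original walkthrough), if you'd rather not hard-code the bounds, you can derive them from the data itself, with a small margin around the extremes of each feature:


# Build the grid from the data's own range instead of fixed bounds.
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.mgrid[x_min:x_max:.01, y_min:y_max:.01]
grid = np.c_[xx.ravel(), yy.ravel()]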

Remember: the -10 and 10 values in the hard-coded version come from the fact that all my data lies within those bounds; you may have to change them (or derive them from the data, as above) for your own dataset. Next, we simply tell each classifier to predict probabilities on the grid that we created.


# Column 1 of predict_proba is the probability of label 1 ("rest"); since the two
# columns sum to 1, the 0.5 contour marks the same boundary either way.
probs1 = model1.predict_proba(grid)[:, 1].reshape(xx.shape)
probs2 = model2.predict_proba(grid)[:, 1].reshape(xx.shape)
probs3 = model3.predict_proba(grid)[:, 1].reshape(xx.shape)

And we’re done! All that’s left to do is to make sure the three models have correctly done what we expected. Easy enough, we just plot the decision boundaries.


f, ax = plt.subplots(figsize=(8, 6))
ax.contour(xx, yy, probs1, levels=[.5], cmap="Greys", vmin=0, vmax=.6);

ax.scatter(X.T[0], X.T[1], c=Y, s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1);

ax.set(aspect="equal",
       xlim=(-10, 10), ylim=(-10, 10),
       xlabel="$X_1$", ylabel="$X_2$");
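
As an aside (a sketch, not part of the original walkthrough), if you'd like all three boundaries on a single set of axes, a small loop over the three probability grids does the job:


f, ax = plt.subplots(figsize=(8, 6))
# Draw the 0.5 contour of each one-vs-rest classifier on the same axes.
for probs in (probs1, probs2, probs3):
    ax.contour(xx, yy, probs, levels=[.5], cmap="Greys", vmin=0, vmax=.6)

ax.scatter(X.T[0], X.T[1], c=Y, s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1);

ax.set(aspect="equal",
       xlim=(-10, 10), ylim=(-10, 10),
       xlabel="$X_1$", ylabel="$X_2$");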

To get the other two individual plots, change probs1 in the first ax.contour call to probs2 and probs3. Here are the 3 decision boundaries in my system.

[Figure: the three one-vs-rest decision boundaries, one per plot]

Absolutely perfect. The decision boundaries cleanly separate each class from the rest of the classes. This is exactly what we wanted. We can also use the built-in library to perform a one-vs-rest classification, as below. First, create the model, and specifically ask it to use the one-vs-rest scheme (that's the multi_class='ovr' argument).


model4 = LogisticRegression(multi_class='ovr').fit(X, Y)
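
As a quick check (not in the original post), you can peek at the fitted coefficients; with multi_class='ovr' and three classes, there is one row of coefficients per underlying binary classifier:


# One row of coefficients and one intercept per class: three binary
# classifiers under the hood.
print(model4.coef_.shape)       # expected: (3, 2) -- 3 classes x 2 features
print(model4.intercept_.shape)  # expected: (3,)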

Next, get the predicted class for each point on the grid (this uses predict, which returns class labels rather than probabilities):


preds4 = model4.predict(grid).reshape(xx.shape)

And finally, plot the decision boundary. This code is from the documentation.


f, ax = plt.subplots(figsize=(8, 6))
ax.contourf(xx, yy, preds4, cmap=plt.cm.Paired);

ax.scatter(X.T[0], X.T[1], c=Y, s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1);

ax.set(aspect="equal",
       xlim=(-10, 10), ylim=(-10, 10),
       xlabel="$X_1$", ylabel="$X_2$");

Here’s the output in my system:

[Figure: decision regions from the built-in one-vs-rest model]

Let's look at how we use the three models to find which class a new point belongs to, using the one-vs-rest scheme. Recall from the logistic regression post that each model outputs a probability. In that post, we considered the class as 1 if the probability was greater than or equal to 0.5. Here, we don't threshold; we leave the probabilities as-is. All we need to do is output the class corresponding to the model that assigns its own class the highest probability. For example, if classifier 1 (for class 1, say) outputs 0.4, classifier 2 (for class 2) outputs 0.53, and classifier 3 (for class 3) outputs 0.48, we predict that the data point is in class 2. It's really that simple. (One wrinkle with our particular label arrays: each model's own class was encoded as 0, so the probability we compare across models is the probability of label 0.)

How do we do this in Python? Easy: rather than finding the max value and then finding which class it belongs to, we use the built-in argmax function in the numpy library, which does the job. Here’s a sample output (you’ll probably get a different output).


# Predict for the test example, x1 = 0, x2 = -5
# Prints 0: we defined model3 to output 0 if the class is 2.
print(model3.predict([[0, -5]]))

# Shows the actual probabilities for each class. Remember this
# is a one-vs-rest implementation, so class 0 here is class 2
# and class 1 here is "not class 2"
print(model3.predict_proba([[0, -5]]))

# Prints the same output as the first
print(np.argmax(model3.predict_proba([[0, -5]])))
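
Finally, here's a minimal sketch (not from the original post) of how you might tie the three models together into a single predictor. Because of how we encoded y1, y2 and y3, each model outputs 0 for its own class, so the probability to compare across models is column 0 of predict_proba, the probability of the target class; we then take the argmax across the three models:


def predict_ovr(points):
    # Column 0 of predict_proba is the probability of label 0, which is the
    # probability of each model's own class given how y1, y2 and y3 were encoded.
    scores = np.column_stack([m.predict_proba(points)[:, 0]
                              for m in (model1, model2, model3)])
    # The index of the largest score is the predicted class (0, 1 or 2).
    return np.argmax(scores, axis=1)

print(predict_ovr([[0, -5]]))     # manual one-vs-rest prediction
print(model4.predict([[0, -5]]))  # the built-in ovr model should usually agree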
