What we’ve discussed so far is really only the basics of neural networks (okay, maybe not just the basics, but there’s still a long way to go). At this point, you’re ready to learn different network architectures that are commonly used, after which you might go on and try designing your own! Again, we won’t go in depth here, but we will link to resources where you can learn more.
Convolutional neural networks
Convolutional neural networks (CNNs) are used for image processing tasks like classification (“what is this an image of?”), segmentation (“which pixels in this image belong to which object?”), and even captioning (“give a relevant caption to this image”). So far, we’ve seen dense layers (a regular layer with some nodes, fully connected to the previous and next layers), dropout, and batch normalization layers. CNNs use two additional kinds of layers: convolutional and pooling layers. Convolutional layers use a number of filters (small matrices, usually 3 x 3 or 5 x 5) at each layer. To perform a convolution, you start by overlaying the filter on the top-left corner of the image, multiplying each filter entry by the pixel underneath it, and summing the products. This gives you one number, the first element of the first row of the result. Then, you slide the filter across the image and repeat, giving you the rest of the result. This blog shows a nice animated visualization of the process. Pooling layers are used to reduce the size of the image. In such a layer, you take the top-left 2 x 2 square (for example) of your image and pool it into a single number: you can take the max (called max pooling) or the average (called average pooling). You then move on to the next 2 x 2 square and repeat. Very frequently, you’ll see conv layers followed by pooling layers, and then batch norm and dropout layers.
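To make the sliding-filter idea concrete, here is a minimal sketch in plain Python. It assumes no padding, a stride of 1 for the convolution, and non-overlapping squares for pooling. The function names (convolve2d, max_pool) and the tiny 4 x 4 "image" are made up for illustration; in practice a framework like Keras or PyTorch does this for you, and learns the filter values during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    result = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply each filter entry by the pixel under it, then sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            row.append(total)
        result.append(row)
    return result


def max_pool(image, size=2):
    """Replace each non-overlapping size x size square with its maximum."""
    return [
        [
            max(image[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(image[0]) - size + 1, size)
        ]
        for i in range(0, len(image) - size + 1, size)
    ]


# A hypothetical 4 x 4 grayscale image and a 2 x 2 filter.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = convolve2d(image, kernel)  # 3 x 3 result
pooled = max_pool(image)                 # 2 x 2 result

print(feature_map)  # [[0, -1, -1], [-1, 1, 3], [2, 0, -2]]
print(pooled)       # [[2, 3], [2, 2]]
```

Notice that the 2 x 2 filter turns a 4 x 4 image into a 3 x 3 result, while 2 x 2 pooling halves each dimension.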
Recurrent neural networks
The basic idea in recurrent neural networks is to use loops in the architecture (making it recurrent): the output of a cell at one step is fed back in as an input at the next step, which lets the network work through a sequence one element at a time. There are several ways to design such cells, giving you regular recurrent neural networks, long short-term memory networks (LSTMs), or gated recurrent units (GRUs). LSTMs are the most popular among these, though they are a little complex.
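The loop can be sketched in a few lines of plain Python. This is a toy version of a regular recurrent cell with a single scalar hidden state, assuming the common tanh update; the function name rnn_forward and the weight values are made up for illustration, and real networks use weight matrices learned during training. LSTMs and GRUs replace the single update line with a more elaborate gated one.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent cell over a sequence of numbers."""
    h = h0
    states = []
    for x in inputs:
        # The loop: the previous state h feeds into the new state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Only the first input is nonzero, yet later states are nonzero too,
# because each state carries information forward from the previous one.
states = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
print(states)
```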
Generative adversarial networks
Generative adversarial nets (GANs) are used to generate realistic images, say of faces, scenery, or pretty much anything. The idea is to use two different neural networks: a generator that generates images, and a discriminator that is essentially a classifier and reports either “true image” or “fake image”. These two are trained together as adversaries: the generator wants to “fool” the discriminator into thinking a fake generated image is a real one, and the discriminator wants to get better at spotting the fake ones. GANs typically take a long time to train. One reason is that the generator starts off random, so at first the discriminator’s job is pretty easy and it doesn’t have to learn much. Both networks kind of suck early on, and since each one learns from the other, you have a case of the blind leading the blind. This blog does a good job of detailing the different kinds of GANs that are in use. Dr. Saha’s notes detail the math behind GANs.
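One way to see the tug-of-war is to look at the losses the two networks try to minimize. The sketch below assumes the commonly used binary cross-entropy formulation (with the "non-saturating" generator loss), which is one choice among several. The function names and the discriminator scores are made up for illustration; each score stands for the discriminator’s estimated probability that an image is real.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated ones near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator scores generated images near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator scores on two real and two fake images.
d_real = [0.9, 0.95]
d_fake_early = [0.1, 0.05]  # early on: fakes are easy to spot
d_fake_later = [0.6, 0.7]   # later: the generator fools the discriminator

# As the fakes improve, the generator's loss falls
# and the discriminator's loss rises.
print(generator_loss(d_fake_early), generator_loss(d_fake_later))
print(discriminator_loss(d_real, d_fake_early),
      discriminator_loss(d_real, d_fake_later))
```

Whatever makes one network’s loss go down tends to push the other’s up, which is what “adversaries” means here.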
I’d recommend you start with the Deep Learning Specialization on Coursera, a series of five courses to get you up to speed on the subject. Don’t worry if it seems like a lot of work–we’ve covered a lot of material from the first two, so you can probably watch those two courses at 1.5x or maybe even 2x. The third course gives some practical advice about using neural networks. It’s not required, but it’s handy to learn the information that’s presented there. Course 4 discusses convolutional neural networks in detail, along with some common architectures, and course 5 discusses recurrent neural networks, GRUs, and LSTMs. Along the way you’ll also learn to use two popular frameworks, TensorFlow and Keras.
Next, you can watch the two courses by fast.ai, where the emphasis is on getting you started using neural networks, while also showing you the inner workings and other neat tricks. Here, you’ll learn to use another framework called PyTorch (which, in my personal opinion, is the best, along with Keras, but I don’t want to bias you).