Table of Contents
- Convolutional Neural Network
- What does a CNN do
1. Convolutional Neural Network
This post introduces the core concepts of the convolutional neural network (CNN).
- Convolution is a mathematical operation that combines two functions. It resembles multiplying them, but it is not quite the same thing, and it is a separate concept from the neural network itself.
- Put simply, convolution superimposes one function on another. CNNs are best known for their use in computer vision. As human beings, we can see things, detect and count objects, and identify the objects in an image. How a computer can be given the same ability is the subject of computer vision.
- CNNs themselves are older than 2012, but after a CNN (AlexNet) won the ImageNet competition in 2012, CNNs brought a large jump in accuracy to computer vision.
- Let's look at the images of cats in figure 1 and see whether we, as human beings, can identify them.
- Each cat differs from the others in breed, color, and background, and the last image in figure 1 shows only half of the cat's face. As human beings, we can still tell that each one is a cat.
- For a computer this is much harder, because the colors, backgrounds, shapes, sizes, and camera angles all differ from one photograph to the next.
- To decide whether an image shows a cat, the computer first needs to learn what a cat looks like. The solution is a convolutional neural network.
How to build a CNN?
- The overall recipe for building a CNN that identifies an image is simple.
- First we take an input image. In figure 2, it is an image of a bird.
- Then we build a convolutional layer followed by a pooling layer, and we repeat this pair of layers many times.
- At the end, we usually have a fully connected layer, as you can see in figure 2. This is a multi-layer perceptron (MLP).
- The last stage of the multi-layer perceptron is a classification function. For a binary problem we can use the sigmoid (logistic) function. For more than two classes we use the softmax function instead of the simple sigmoid. This forms the last layer of the CNN.
To summarize, a CNN is built from four kinds of layers:
- Convolutional layer
- Pooling layer
- Fully connected layer
- N-ary classification layer: binary for 2 classes, ternary for 3 classes, and so on.
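The classification layer described above can be sketched in a few lines. This is a minimal illustration in plain Python, not a definitive implementation: the raw scores and the class names are hypothetical values chosen for the example, and a real CNN would compute the scores in its fully connected layer.

```python
import math

def sigmoid(z):
    """Map one raw score to a probability in (0, 1), for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() from overflowing;
    # it does not change the resulting probabilities.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes (assumed values, not from the article).
class_names = ["bird", "dog", "cat"]
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the one with the highest probability.
predicted = class_names[probs.index(max(probs))]
```

With two classes, a single sigmoid output is enough, because the probability of the second class is one minus the first. With three or more classes, softmax gives one probability per class.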
2. What does a CNN do?
There are three things to learn from an image:
- First, we have to learn the edges.
- Then we have to learn the colors. If the image is a color image, we have to understand its RGB channels.
- We also have to learn the curves of the objects in the image, so that we can identify those objects anywhere and in different positions.
Taken together, this is called feature learning.
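To make the idea of RGB channels concrete, here is a small sketch, assuming an image is stored as a grid of (R, G, B) triples with values from 0 to 255. The 2 x 2 image and the helper name `split_channels` are made up for illustration.

```python
# A hypothetical 2 x 2 color image: each pixel is an (R, G, B) triple, 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (200, 200, 200)],
]

def split_channels(img):
    """Return three 2-D grids, one for each color channel (R, G, B)."""
    return [[[pixel[c] for pixel in row] for row in img] for c in range(3)]

red, green, blue = split_channels(image)
```

Each channel is then an ordinary grid of intensities, so a filter can be applied to it in the same way as to a grayscale image.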
Now let's consider the image of a cat:
- As you can see in figure 3, the image has been broken down into a grid of small boxes.
- We can treat each box as one pixel. There are 10 boxes horizontally and 8 boxes vertically, so it is an 8 x 10 image.
- Figure 3 also shows a 3 x 3 filter in the top left corner. From that filter, we will build the convolutional layer.
- We scan the image with the filter. As you can see in figure 4, we start with the first 9 boxes and then keep moving the filter over the image.
- Once the scanning is done, we have learned the features, as shown in figure 5.
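The scanning step above can be sketched as a list of filter placements. This sketch assumes the filter moves one pixel at a time (a stride of 1) and never hangs over the edge of the image (no padding). Neither detail is stated in the text, and `window_positions` is a hypothetical helper name.

```python
def window_positions(height, width, k, stride=1):
    """List the (row, col) of the top-left corner for every filter placement."""
    return [
        (r, c)
        for r in range(0, height - k + 1, stride)
        for c in range(0, width - k + 1, stride)
    ]

# The 8 x 10 image and 3 x 3 filter from the text.
positions = window_positions(8, 10, 3)

# Under these assumptions the output has (8 - 3 + 1) rows and (10 - 3 + 1) columns.
out_rows = 8 - 3 + 1
out_cols = 10 - 3 + 1
```

Under these assumptions the filter fits in 6 x 8 = 48 positions, so the learned feature map is slightly smaller than the input image.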
To understand how a CNN learns the features of an image, we need some basic mathematics.
- As you can see in figure 5, the first 9 pixels have intensities of 200, 200, 100, etc. In the same way, every box has its own intensity.
- The filter has its own set of values, which you can also see in the figure.
Then we superimpose the filter on top of the image, as shown in figure 6. As discussed above, convolution is a superimposition of two functions: one is the grid of pixel intensities in the image and the other is the filter.
When we superimpose the two, we multiply each pixel intensity by the filter value that sits on top of it, as you can see in figure 7, and add up the products. This gives the result shown in the figure below.
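The multiply-and-add step can be sketched as follows. The intensities 200, 200, 100 echo the values mentioned in the text, but the rest of the image and the filter values are hypothetical, since the figures are not reproduced here. Like most deep learning code, this sketch slides the filter without flipping it, which is strictly speaking cross-correlation. Stride 1 and no padding are assumed.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; multiply and sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Multiply each pixel by the filter value on top of it, then add up.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4 x 4 grid of pixel intensities (assumed values).
image = [
    [200, 200, 100, 100],
    [200, 200, 100, 100],
    [200, 200, 100, 100],
    [200, 200, 100, 100],
]

# Hypothetical 3 x 3 filter that responds to vertical edges (assumed values).
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
```

Each number in `feature_map` is the result for one position of the filter, so a 4 x 4 image and a 3 x 3 filter give a 2 x 2 output.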
After the multiplication step, we build the pooling layer. We look at two types of pooling here. The first is max pooling, shown in figure 8, which takes the maximum value of the whole matrix.
- In the figure, the maximum value is 250.
- The other kind is mean pooling, which takes the mean of all the elements in the matrix, as shown in figure 9.
- There are other kinds of pooling as well, but we will stick to these simple methods to explain how a CNN works.
- We repeat these steps many times, using as many filters as we want. This is how we build the convolutional and pooling layers of a CNN.
- After this, we feed everything into an MLP, whose output passes through a sigmoid or softmax activation layer that gives the probability of each class the image might belong to. As you can see in figure 2, there is a probability for bird, dog, etc.
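The max and mean pooling steps described above can be sketched like this. The matrix is hypothetical apart from its maximum of 250, which matches the text. The `size` parameter is an assumption added for illustration: setting it to the full matrix size pools the whole matrix as in the figures, while a smaller value such as 2 pools each non-overlapping 2 x 2 window separately, which is a common choice in practice.

```python
def pool2d(matrix, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window of the matrix."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        pooled.append(out_row)
    return pooled

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Hypothetical 4 x 4 feature map (assumed values; the maximum is 250 as in the text).
feature_map = [
    [100, 250, 50, 0],
    [150, 200, 25, 75],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
]

whole_max = pool2d(feature_map, 4, max)    # pool over the whole matrix
whole_mean = pool2d(feature_map, 4, mean)
window_max = pool2d(feature_map, 2, max)   # pool over 2 x 2 windows
window_mean = pool2d(feature_map, 2, mean)
```

Pooling shrinks the feature map while keeping a summary of each region, which reduces the amount of data passed on to the later layers.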