Recurrent neural network (RNN) example using an image dataset

Table of Contents

  1. Recurrent neural network example
  2. Video Explanation

1. Recurrent neural network example

This section implements a recurrent neural network on an image dataset.

  1. Here we use the MNIST dataset, which we have used in earlier examples, to show that an RNN can also learn features from image datasets. The format of this data is a bit different from a typical sequence input, so let's begin with the coding part:
  2. To implement the recurrent neural network in PyTorch, import torch, torch.nn, torchvision.transforms, and torchvision.datasets (the last one is used to download the dataset to our system).
  3. Then we call datasets.MNIST to download the training and test datasets.
  4. As you know from the previous content, the training dataset contains 60,000 images and the test dataset contains 10,000 images, and each image has a size of 28 x 28. We use this information later when building the recurrent neural network.
  5. The batch size is 100 and the number of iterations is 3000. The idea is to iterate over the whole dataset several times.
  6. From these 3000 iterations we derive the number of epochs, as shown in figure 1.
  7. Dividing the 60,000 training images by the batch size of 100 gives 600 batches per epoch.
  8. With 3000 iterations in total, the number of epochs is 3000 / 600 = 5.
  9. So we cover the whole training dataset 5 times over the course of 3000 iterations. Here we also create the train and test loaders.
Figure-1

Let's see how to build the basic RNN module:

  1. To build the RNN, we define a class for the RNN model that inherits from nn.Module, so it gets all the functionality of a PyTorch module (parameter registration, moving to a device, and so on).
  2. The __init__ function takes the input dimension, hidden dimension, layer dimension (the number of stacked RNN layers), and output dimension.
  3. Inside it we first call the parent constructor with super().__init__(), which runs nn.Module's own initialization.
  4. Then we store the layer dimension and hidden dimension and create the RNN itself with PyTorch's nn.RNN module, which is built from the dimensions passed to __init__.
Figure-2
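To see what the nn.RNN layer described above expects and returns, here is a small standalone sketch using the dimensions from this example (input dimension 28, hidden dimension 100, one layer). The random input tensor is only a placeholder for a batch of 100 images, each read as 28 rows of 28 pixels.

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, layer_dim = 28, 100, 1

# batch_first=True: input and output tensors are (batch, sequence length, features)
rnn = nn.RNN(input_size=input_dim, hidden_size=hidden_dim, num_layers=layer_dim,
             batch_first=True, nonlinearity="tanh")

x = torch.randn(100, 28, 28)  # placeholder batch: 100 images, 28 time steps, 28 pixels per step
out, hn = rnn(x)              # when no initial state is passed, h0 defaults to zeros

print(out.shape)  # torch.Size([100, 28, 100]): one hidden vector per time step
print(hn.shape)   # torch.Size([1, 100, 100]): (layers, batch, hidden), not affected by batch_first
```

Note that `batch_first` changes only the layout of the input and `out`; the hidden-state tensors always keep the layer dimension first.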
  1. We set batch_first=True because our input tensors have the batch dimension first, that is, (batch, sequence length, input dimension).
  2. Then comes the nonlinearity, which is the activation function used inside the RNN.
  3. In figure 2 you can see it set to tanh. Instead of tanh we could also use ReLU (rectified linear unit); nn.RNN supports exactly these two options.
  4. The last layer is a linear layer that maps the hidden dimension to the output dimension. The output dimension is 10 because there are 10 classes.
  5. In the forward pass, as mentioned before, the hidden state from each time step is fed into the RNN at the next time step.
  6. In figure 3 you can see h0, the initial hidden state, which is set to zeros with torch.zeros; its shape is (layer dimension, batch size, hidden dimension).
  7. After initializing the hidden state to zeros, the RNN returns out and hn: out contains the output at every time step and hn contains the final hidden state. Gradients are backpropagated through every one of these time steps, that is, through every RNN module in the unrolled network.
Figure-3
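Because figures 2 and 3 are not reproduced here, the following is a minimal sketch of the model class described in the lists above, assuming standard PyTorch. The class name `RNNModel` and the choice to classify from the output at the last time step (`out[:, -1, :]`) are assumptions for illustration; the original figures may differ in detail.

```python
import torch
import torch.nn as nn


class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        # Run nn.Module's constructor so submodules and parameters are registered
        super().__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        # Inputs are (batch, seq_len, input_dim) because batch_first=True
        self.rnn = nn.RNN(input_size=input_dim, hidden_size=hidden_dim,
                          num_layers=layer_dim, batch_first=True,
                          nonlinearity="tanh")
        # Linear layer: hidden representation -> one score per class
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initial hidden state of zeros, shape (layer_dim, batch, hidden_dim), on x's device
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        # detach() mirrors the text; this h0 is a fresh tensor with no gradient history anyway
        out, hn = self.rnn(x, h0.detach())
        # out: (batch, seq_len, hidden_dim); use the last time step as input to the linear layer
        return self.fc(out[:, -1, :])


# Dimensions quoted in the text: input 28, hidden 100, 1 layer, 10 classes
model = RNNModel(input_dim=28, hidden_dim=100, layer_dim=1, output_dim=10)
logits = model(torch.randn(100, 28, 28))  # placeholder batch of 100 "images"
print(logits.shape)  # torch.Size([100, 10])
```

Using only the last time step is a common design for sequence classification: by then the hidden state has seen all 28 rows of the image.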
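  1. Each RNN module in the unrolled network stands for one time step, so backpropagation runs through time; this is called backpropagation through time (BPTT).
  2. During backpropagation through time, gradients tend to either explode or vanish. The code passes h0.detach() to the RNN; detaching cuts the gradient path into the initial hidden state, which limits how far back gradients flow (the idea behind truncated BPTT), although it does not stop gradients from vanishing or exploding within the 28 steps of one image. The RNN then produces the output, which becomes the input to the next layer, the linear layer.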
  3. In figure 5, the input dimension is 28, the hidden dimension is 100, the layer dimension is 1, and the output dimension is 10.
  4. Then we define the CUDA device so the model can run on a GPU.
  5. The loss function is cross-entropy loss.
  6. The learning rate is 0.1.
  7. The optimizer is stochastic gradient descent (SGD), which is sufficient in this case.
  8. The sequence dimension is 28, the same as the input dimension: at each time step we feed one 28-pixel row of a 28 x 28 image, so after 28 time steps the RNN has seen all 28 x 28 = 784 pixels. At each time step the RNN receives one row from every image in the batch of 100 images.
  9. After this, we train the model, and the final accuracy is 85 percent.
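The training loop itself is not spelled out in the text, so here is a minimal sketch under stated assumptions: it uses the settings listed above (CUDA device if available, cross-entropy loss, SGD with learning rate 0.1, sequence dimension 28, input dimension 28) and reports test accuracy. The function name `train_and_evaluate` is hypothetical, and the smoke test at the bottom runs on random tensors with a hypothetical linear stand-in model only so the snippet runs on its own; it does not reproduce the 85 percent result.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

seq_dim, input_dim = 28, 28  # 28 time steps x 28 pixels = 784 pixels per image


def train_and_evaluate(model, train_loader, test_loader, num_epochs, lr=0.1):
    """Hypothetical helper: train with SGD + cross-entropy, return test accuracy in percent."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(num_epochs):
        model.train()
        for images, labels in train_loader:
            # (batch, 1, 28, 28) -> (batch, seq_dim, input_dim): each image row is one time step
            images = images.view(-1, seq_dim, input_dim).to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()  # for an RNN this backpropagates through all 28 time steps
            optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(-1, seq_dim, input_dim).to(device)
            labels = labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total


# Smoke test on random tensors shaped like MNIST batches (not real data)
torch.manual_seed(0)
fake = TensorDataset(torch.rand(200, 1, 28, 28), torch.randint(0, 10, (200,)))
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # hypothetical stand-in model
acc = train_and_evaluate(stand_in, DataLoader(fake, batch_size=100),
                         DataLoader(fake, batch_size=100), num_epochs=1)
print(f"accuracy on random data: {acc:.1f}%")
```

With the real MNIST loaders you would pass the RNN model (input dimension 28, hidden dimension 100, 1 layer, 10 outputs) and the 5 epochs computed earlier instead of the stand-in and the random data.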

2. Video Explanation
