Table of Contents
- Recurrent Neural Networks
- Code Implementation
- Video Tutorial
1. Recurrent Neural Networks
- We have already discussed two major variants of neural networks in the previous blogs. In this blog, we are going to discuss the final type: the Recurrent Neural Network.
- Recurrent neural networks (RNNs) are a very important variant of neural networks, heavily used in natural language processing. In a general neural network, an input is processed through a number of layers and an output is produced, under the assumption that two successive inputs are independent of each other.
- This assumption, however, does not hold in many real-life scenarios. For example, to predict the price of a stock at a given time, we have to consider its dependence on previous observations.
- We can imagine an RNN as a multilayer neural network, with each layer representing the observations at a certain time step. To achieve this, an RNN adds a loop over its weights. Here, we will use a special variant of the RNN, the LSTM (Long Short-Term Memory), which is able to look back longer than a standard RNN.
- RNN has numerous applications including speech recognition, language modeling, translation, and image captioning.
In this section, we are going to discuss the architecture of an RNN. We already know that an RNN contains a loop over its weights.
If we unroll this loop, the RNN can be represented as a sequence of networks. Here, Xt is the input at time step t, and St is the hidden state at time step t. It is calculated from the previous hidden state and the input at the current step: St = f(U*Xt + W*St-1).
The function f is usually a non-linearity such as tanh or ReLU. Finally, Ot is the output at step t. We apply a softmax function to the output of the hidden layer, so Ot = softmax(V*St). The same weights U, V, and W are reused at every time step. In our demonstration, we use the IMDB dataset.
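The unrolled update rule above can be sketched directly in NumPy. This is a toy illustration, not the Keras implementation we use later; the layer sizes are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 3, 2  # arbitrary sizes for illustration

# weight matrices shared across all time steps
U = rng.normal(size=(hidden_dim, input_dim))   # input -> hidden
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(size=(output_dim, hidden_dim))  # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs):
    """Run the unrolled RNN over a sequence of input vectors."""
    s = np.zeros(hidden_dim)            # initial hidden state S0
    outputs = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)    # St = f(U*Xt + W*St-1)
        outputs.append(softmax(V @ s))  # Ot = softmax(V*St)
    return outputs

xs = rng.normal(size=(5, input_dim))    # a toy sequence of length 5
outs = rnn_forward(xs)
```

Each output is a probability vector over the output classes, and the same U, W, and V are applied at every step.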
2. Code Implementation
- Now we run this in a Jupyter Notebook and work through a complete sentiment analysis using an LSTM model.
- First, we import the Sequential model API from Keras. In this demonstration, we are going to use the Dense, LSTM, and Embedding layers, so we import those from Keras as well.
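Assuming a standard Keras installation, these imports look like:

```python
# the model container and the layers used in this demonstration
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding
```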
- The data we use here is a dataset created by researchers at Stanford University and published in a paper in 2011. It contains 50,000 movie reviews from IMDB users, each labeled as either positive or negative. The original dataset contains the full text of the reviews.
- The reviews are preprocessed, and each one is encoded as a sequence of word indices (integers). The words within the reviews are indexed by their overall frequency within the dataset.
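A toy sketch of this frequency-rank encoding (the real mapping ships with the Keras IMDB dataset; the tiny corpus here is hypothetical and only illustrates the idea):

```python
from collections import Counter

corpus = ["the movie was great", "the plot was weak", "great acting"]

# rank words by overall frequency: index 1 = most frequent word, and so on
# (ties broken alphabetically here, purely for determinism)
counts = Counter(w for line in corpus for w in line.split())
index = {w: i for i, (w, _) in enumerate(
    sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])), start=1)}

# each review becomes a sequence of integer word indices
encoded = [[index[w] for w in line.split()] for line in corpus]
```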
- In today’s blog, we will build an LSTM model, train it using the training dataset, and finally predict outputs from the test inputs.
- We import the sequence preprocessing module, which will help us pre-process the training and test data.
The load_data function splits the dataset into training and test sets. The print statements display the shapes of the inputs and outputs of our training and test sets. We also combine the training and test inputs as the data, and the outputs as the targets.
We have two output classes and we are working with 9998 unique words. The average review length is 234.758 with a standard deviation of 173.
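These summary statistics can be reproduced along the following lines. The short lists below are hypothetical stand-ins for the encoded reviews; in the notebook, the data comes from concatenating the training and test inputs.

```python
import numpy as np

# hypothetical stand-ins for the encoded IMDB reviews
data = [[1, 14, 22, 16], [1, 194, 1153, 194, 8255, 78], [1, 14, 47]]

lengths = [len(review) for review in data]
print("mean review length:", np.mean(lengths))
print("standard deviation:", np.std(lengths))
```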
This code snippet displays the inputs and their associated outcomes.
In this snippet of code, we have printed the complete first movie review using the word-index dictionary, in which each word is stored with its index value.
- Next, we need to truncate and pad the input sequences so that they are all the same length for modeling. The sequences are not of the same length in terms of content, but same-length vectors are required to perform the computation in Keras.
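Padding and truncation can be done with Keras's pad_sequences helper. The import path differs between Keras versions, hence the fallback below; the length of 500 matches the padded review length used in this demonstration, and the toy sequences are hypothetical.

```python
try:
    from keras.utils import pad_sequences                    # newer Keras
except ImportError:
    from keras.preprocessing.sequence import pad_sequences   # older Keras

max_review_length = 500  # target length used in this demonstration

# toy sequences standing in for the encoded reviews: by default,
# short ones are zero-padded at the front and long ones are
# truncated from the front ('pre' padding and truncating)
toy = [[11, 2, 9], [5] * 600]
padded = pad_sequences(toy, maxlen=max_review_length)
```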
- Now we construct our model. The first layer is an Embedding layer that uses 32-length vectors to represent each word. As this is the first layer, we have to specify the input length, which is 500 (the padded review length). The next layer is an LSTM layer with 100 memory units.
As this is a binary classification problem, we use a final Dense layer with a sigmoid activation function so that the output will be in the range of 0 to 1. The summary() function generates the model summary.
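A minimal sketch of this architecture follows. The vocabulary size of 10,000 is an assumption based on the roughly 9,998 unique words mentioned above; the 32-length vectors, 500-step input, and 100 LSTM units come from the text. The input length of 500 is enforced by the padding step, so recent Keras versions can infer it from the data.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 10000       # assumption: matches the ~9998 unique words above
embedding_length = 32    # 32-length vector per word
max_review_length = 500  # padded review length

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_length),
    LSTM(100),                       # 100 memory units
    Dense(1, activation="sigmoid"),  # output in the range 0 to 1
])

# a dummy batch of two padded "reviews", just to build and exercise the model
dummy = np.random.randint(0, vocab_size, size=(2, max_review_length))
probs = model.predict(dummy, verbose=0)
```

Each prediction is a single sigmoid probability, interpreted as positive sentiment when it is above 0.5.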
- In this section, we will discuss the training of this model. We set the number of epochs to 10, so there are 10 full passes over the training data.
- We have already compiled our model with ‘binary_crossentropy’ as the loss function, ‘adam’ as the optimizer, and ‘accuracy’ as the metric.
- We set the batch size to 64; as our training dataset contains 25,000 samples, there are 391 batches of 64 samples each (the last batch is smaller). We train the model with X_train and y_train as the input and output, and with X_test and y_test as the validation set.
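The compile-and-fit step can be sketched as follows. Tiny synthetic data and a scaled-down model stand in for the real X_train/X_test so the snippet is self-contained and fast; the blog's actual run uses the full IMDB data with epochs=10 and batch_size=64.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# scaled-down model; the real one uses a 10000-word vocabulary,
# 32-length embeddings, and 100 LSTM units
model = Sequential([
    Embedding(input_dim=50, output_dim=8),
    LSTM(4),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# synthetic stand-ins for the padded reviews and their labels
X_train = np.random.randint(0, 50, size=(16, 10))
y_train = np.random.randint(0, 2, size=(16,))
X_test = np.random.randint(0, 50, size=(8, 10))
y_test = np.random.randint(0, 2, size=(8,))

history = model.fit(X_train, y_train, epochs=1, batch_size=4,
                    validation_data=(X_test, y_test), verbose=0)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
```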
The model has a loss of 0.65 and an accuracy of 85.15%.