Table of Contents
- Auto Encoders
- Code Example
- Video Tutorial
- Auto Encoders
- In this blog on Deep Learning, we are going to construct an autoencoder using a convolutional neural network (CNN).
- An autoencoder is a neural network that learns to encode data automatically: it learns how to compress data into a fairly small representation and then uses that representation to reconstruct the original data as closely as it can.
- An autoencoder has two major components:
- ENCODER: learns how to compress the original input into a small encoding.
- DECODER: learns how to restore the original data from that encoding.
This diagram represents an autoencoder in a simple way.
The original input passes through the encoder, which compresses it into a small encoding. The decoder then reconstructs the original image from the compressed, encoded image.
- These compression and decompression functions are
- Data specific
- Lossy and
- Learned automatically.
- Data specific means that an autoencoder can only compress data similar to what it was trained on.
- Lossy means that the decompressed output is degraded compared to the original input.
- Learned automatically means that it is easy to train specialized instances of the algorithm that perform well on a specific type of input. This requires no new engineering, only appropriate training data.
In today’s demonstration, we are going to use the MNIST data set, which is built into Keras. MNIST is one of the most common data sets used for image classification and is accessible from many different sources. It contains 60,000 training images and 10,000 test images.
In the autoencoder, we work only with the input data, that is, only the images and not the labels. First, the encoder encodes our images, and then the decoder reconstructs the original image. In the next section, we explain the autoencoder model for the MNIST data set. Every autoencoder has two components: an encoder and a decoder.
- We perform encoding in the left section and reconstruct the input using the decoder on the right side. Each input in this data set is 28*28 with one channel (the images are grayscale), so the input layer has dimension 28*28*1. The first Conv2D layer converts this to 28*28*16, as it contains sixteen filters. Then a MaxPooling2D layer makes this 14*14*16. After that, another Conv2D and MaxPooling2D pair converts the dimension to 7*7*8. The final Conv2D and MaxPooling2D layers change the dimensions to 4*4*8. So in the encoder, 28*28 inputs are compressed to 4*4.
- The decoder section starts with a Conv2D layer whose output is 4*4. An UpSampling2D layer then decompresses the image to 8*8. After that, another Conv2D and UpSampling2D pair converts the dimension to 16*16, and the final Conv2D and UpSampling2D pair changes it to 28*28. A last Conv2D layer reconstructs the image with the same dimension as the input, 28*28.
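The dimension walk-through above can be checked with a few lines of plain Python. This is only a sketch of the shape arithmetic, not the Keras model itself. The helper names are hypothetical, and two details are assumptions inferred from the stated sizes: pooling rounds odd sizes up (7 becomes 4), and the third decoder convolution uses no padding (16 becomes 14 before the last upsampling doubles it to 28).

```python
import math


def conv2d(shape, filters, kernel=3, padding="same"):
    """Output shape of a stride-1 convolution on an (height, width, channels) input."""
    h, w, _ = shape
    if padding == "valid":
        # Without padding, a 3*3 kernel trims one pixel from every border.
        h, w = h - kernel + 1, w - kernel + 1
    return (h, w, filters)


def max_pool(shape, pool=2):
    """Output shape of pooling with 'same' padding: odd sizes round up."""
    h, w, c = shape
    return (math.ceil(h / pool), math.ceil(w / pool), c)


def upsample(shape, factor=2):
    """Output shape after repeating every row and column `factor` times."""
    h, w, c = shape
    return (h * factor, w * factor, c)


def encoder_shapes(shape=(28, 28, 1)):
    """Shapes after each Conv2D + MaxPooling2D pair of the encoder."""
    trace = [shape]
    for filters in (16, 8, 8):
        shape = max_pool(conv2d(shape, filters))
        trace.append(shape)
    return trace


def decoder_shapes(shape=(4, 4, 8)):
    """Shapes after each Conv2D + UpSampling2D pair, then the final Conv2D."""
    trace = [shape]
    for filters, padding in ((8, "same"), (8, "same"), (16, "valid")):
        shape = upsample(conv2d(shape, filters, padding=padding))
        trace.append(shape)
    trace.append(conv2d(shape, 1))
    return trace


print(encoder_shapes())
print(decoder_shapes())
```

The encoder trace ends at 4*4*8 and the decoder trace ends at 28*28*1, matching the description above.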
Now we will go to Jupyter and construct an autoencoder.
- First, we import the functional Model API from Keras. In this demonstration, we use the Input, Conv2D, MaxPooling2D, and UpSampling2D layers, so we import them from Keras as well. The Keras datasets include MNIST, so we import that too. Finally, NumPy and Matplotlib help us manage our data and visualize it.
We load the data set using the load_data() function; we work only with the input training and test data. The astype function converts the data type to float32, and we normalize the pixel values by dividing them by 255. Then we reshape the data using the reshape function, which converts each input from 28*28 to 28*28*1.
This ‘1’ is the channel dimension (the images are grayscale). Finally, the print function displays the training and test data shapes; each image in both sets has the shape 28*28*1. Now we construct our model. We start with the input layer with the shape 28*28*1.
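The preprocessing steps just described can be sketched as follows. To keep the snippet runnable without downloading anything, it uses small random arrays as a stand-in for the MNIST images; in the notebook the arrays would come from Keras, as shown in the comment. The function name `preprocess` and the stand-in array sizes are assumptions made for illustration.

```python
import numpy as np

# In the notebook the images would be loaded like this (labels are ignored):
#   from keras.datasets import mnist
#   (x_train, _), (x_test, _) = mnist.load_data()


def preprocess(images):
    """Scale uint8 pixels to [0, 1] and add a trailing channel axis."""
    images = images.astype("float32") / 255.0
    return np.reshape(images, (len(images), 28, 28, 1))


# Random stand-in with the same layout as MNIST: (samples, 28, 28), uint8.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
x_test = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

x_train = preprocess(x_train)
x_test = preprocess(x_test)

print(x_train.shape)
print(x_test.shape)
```

With the real data set, the same two prints would show the full sample counts in place of 6 and 2, followed by 28, 28, 1.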
The encoding section starts with a Conv2D layer with 16 filters of size 3*3. We set ReLU as the activation function and padding as ‘same’. The input layer is the input of this Conv2D layer.
The output of this layer is 28*28*16. A MaxPooling2D layer with window size 2*2 converts the dimension to 14*14*16. Then we have another pair of Conv2D and MaxPooling2D layers, which change the image dimension to 7*7*8. The final pair converts the image dimension to 4*4*8. So the encoder part compresses the image from 28*28 to 4*4.
Now the decoding section.
It also starts with a Conv2D layer, which takes the encoded output as its input; the output of this layer is 4*4*8. The UpSampling2D layer decompresses its input, so after this layer the image dimension is 8*8*8. The next Conv2D and UpSampling2D pair converts the dimension to 16*16*8. The final pair converts it to 28*28*16. A last Conv2D layer then reconstructs the image at the original dimension, 28*28*1; in this final layer, we use a sigmoid activation function. To create the final autoencoder, we use a functional model with the input and the final decoder output.
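Putting the encoder and decoder descriptions together, the model might look like the sketch below. It is one way to get the stated dimensions, not necessarily the exact notebook code. The variable names are illustrative, and several settings are assumptions inferred from the sizes in the text: the filter counts of the middle layers, ‘same’ padding on the pooling layers (so 7 becomes 4), and no padding on the third decoder convolution (so 16 becomes 14 before the last upsampling).

```python
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

input_img = Input(shape=(28, 28, 1))

# Encoder: 28*28*1 -> 4*4*8
x = Conv2D(16, (3, 3), activation="relu", padding="same")(input_img)  # 28*28*16
x = MaxPooling2D((2, 2), padding="same")(x)                           # 14*14*16
x = Conv2D(8, (3, 3), activation="relu", padding="same")(x)           # 14*14*8
x = MaxPooling2D((2, 2), padding="same")(x)                           # 7*7*8
x = Conv2D(8, (3, 3), activation="relu", padding="same")(x)           # 7*7*8
encoded = MaxPooling2D((2, 2), padding="same")(x)                     # 4*4*8

# Decoder: 4*4*8 -> 28*28*1
x = Conv2D(8, (3, 3), activation="relu", padding="same")(encoded)     # 4*4*8
x = UpSampling2D((2, 2))(x)                                           # 8*8*8
x = Conv2D(8, (3, 3), activation="relu", padding="same")(x)           # 8*8*8
x = UpSampling2D((2, 2))(x)                                           # 16*16*8
x = Conv2D(16, (3, 3), activation="relu")(x)                          # 14*14*16 (no padding)
x = UpSampling2D((2, 2))(x)                                           # 28*28*16
decoded = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)  # 28*28*1

# Functional model from the input to the final decoder output.
autoencoder = Model(input_img, decoded)
autoencoder.summary()
```

The summary lists the output shape of every layer, which makes it easy to compare against the dimensions given in the text.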
Now we compile our model with binary cross-entropy as the loss function, Adam as the optimizer, and accuracy as the metric.
Let’s execute this cell. It is time to train our model.
We set epochs to 100, so there will be 100 passes over the training data. In general, more epochs give higher accuracy and lower loss. We set the batch size to 64, and our training set has 60,000 samples, so each epoch has 938 batches (937 batches of 64 samples and a final batch of 32). We use the training set as input and the test set for validation. The loss started at about 0.1879 and gradually decreased as the epochs advanced.
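The compile and training steps can be sketched as below. To keep the snippet short and quick to run on its own, it makes two loud assumptions: a deliberately tiny one-layer model stands in for the full autoencoder, and random arrays stand in for the preprocessed MNIST images. It also trains for a single epoch, whereas the tutorial uses 100. The compile settings and batch size follow the text.

```python
import math

import numpy as np
from keras.layers import Input, Conv2D
from keras.models import Model

# Stand-in model (assumption): swap in the full autoencoder in practice.
inputs = Input(shape=(28, 28, 1))
outputs = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(inputs)
autoencoder = Model(inputs, outputs)

autoencoder.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Stand-in data (assumption): random values in [0, 1) shaped like MNIST.
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28, 1), dtype=np.float32)
x_test = rng.random((64, 28, 28, 1), dtype=np.float32)

batch_size = 64
# Batches per epoch; the last batch is smaller when the division is not exact.
batches_per_epoch = math.ceil(len(x_train) / batch_size)
mnist_batches_per_epoch = math.ceil(60000 / batch_size)

# The images are both the input and the target, since the goal is reconstruction.
history = autoencoder.fit(
    x_train, x_train,
    epochs=1,  # the tutorial uses 100
    batch_size=batch_size,
    validation_data=(x_test, x_test),
    verbose=0,
)

print(batches_per_epoch, mnist_batches_per_epoch)
print(sorted(history.history.keys()))
```

The `history.history` dictionary holds the per-epoch loss and accuracy for both the training and validation data, which is where the loss values quoted in the text would come from.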
Finally, after 100 epochs, we got a loss of around 0.09 with an accuracy of more than 80 percent.