Table of Contents
- Datasets in PyTorch
- Simple Classification Task Datasets
- Datasets for other Applications
- Object Classification / Detection / Localization
- Caption Generation
- Video Explanation
1. Datasets in PyTorch
- The built-in datasets covered here are mostly image datasets. Between them, they cover many of the common deep-learning problems you will work on in PyTorch.
- There are many interesting datasets, and you can find the full list in the torchvision.datasets documentation.
2. Simple Classification Task Datasets
- MNIST dataset: It has 28 x 28 grayscale images of handwritten digits from 0 to 9, so it has 10 labels: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. There are 70,000 images in total: 60,000 training images and 10,000 test images.
- Fashion-MNIST dataset: Fashion-MNIST has the same format as MNIST: 28 x 28 grayscale images with 10 labels, numbered 0 to 9. The images show clothing items rather than digits. There are 70,000 images in total in this dataset.
- KMNIST dataset: KMNIST (Kuzushiji-MNIST) is also similar to the other MNIST datasets: 28 x 28 grayscale images with 10 labels, numbered 0 to 9. The images show handwritten Japanese characters rather than digits. There are 70,000 images in total in this dataset.
- EMNIST dataset: It is similar to the other MNIST datasets and extends MNIST to handwritten letters as well as digits.
3. Datasets for other Applications
Object Classification / Detection / Localization
- ImageNet dataset: This dataset contains images from 1,000 object categories.
- CIFAR-10 dataset: This dataset has 60,000 color images of size 32 x 32 in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
- STL-10 dataset: This dataset has 96 x 96 images in 10 classes, with 500 training images and 800 test images per class.
Caption Generation
- These include the COCO Captions dataset and SBU Captioned Photos. Each image in these datasets comes with a written caption, so we can use them to train neural networks, such as LSTM-based models, that generate captions for images.
Let's look at some of the simple datasets with examples:
- First we import torch, and then matplotlib so that we can view the images.
Then we import MNIST from torchvision.datasets. Figure 1 shows how to import all of these.
- Let's load the dataset:
- One thing to notice in figure 2 is that we are putting the MNIST dataset in mnist_folder. You can name the folder whatever you want.
- If you keep train=True, you get the training part; with train=False you get the testing part. The testing part has 10,000 images and the training part has 60,000 images, as you can see from figure 2.
In figure 2, download=True means it first checks whether the dataset has already been downloaded; if it has not, it downloads the dataset.
- Here in figure 3, you can see that in mnist_folder I have the dataset under the name MNIST.
- Let's see what the dataset looks like:
- In figure 4, you can see the MNIST dataset, which is under mnist_folder. The printed summary includes the number of data points, the root location, and the split.
- Look at the first component in figure 5: the first item in the training dataset, at index zero. As you can see, it contains a PIL (Python Imaging Library) image. PIL is an image-manipulation library for Python, and it is very handy to use. As the figure shows, the image size is 28 x 28. The data type is a tuple, as it is enclosed in parentheses, and it is a tuple of two elements. The second element is the digit 5, so the first image shows a 5.
Since it has two components, we can break it down as shown in figure 6.
After breaking down the components, we get separate image and label components. You can see in figure 7 that there is a small image of a 5, and the label is the digit 5.
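The unpacking step can be sketched without downloading anything by using a stand-in sample. The blank 28 x 28 image and the label 5 below are made up to mimic the structure of the first MNIST sample; with the real dataset you would index the dataset object instead.

```python
from PIL import Image

# Stand-in for the first MNIST sample: a (PIL image, label) tuple.
# With the real dataset this would be something like `sample = train_set[0]`.
sample = (Image.new("L", (28, 28)), 5)

# A two-element tuple unpacks directly into two names.
image, label = sample

print(type(sample).__name__)  # tuple
print(image.size)             # (28, 28)
print(label)                  # 5
```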
If you want to make the image bigger, import the Image module from PIL and resize the image from 28 x 28 to 200 x 200. The enlarged image looks blurrier than the original because the original is only 28 x 28.
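A minimal sketch of the resize step, assuming Pillow is installed. The uniform gray image is a hypothetical stand-in for an MNIST digit, so the snippet runs without the dataset.

```python
from PIL import Image

# Hypothetical stand-in for a 28 x 28 grayscale MNIST image.
small = Image.new("L", (28, 28), color=128)

# resize() takes the target (width, height) and returns a new image.
large = small.resize((200, 200))

print(small.size)  # (28, 28)
print(large.size)  # (200, 200)
```

The blur comes from interpolation: the original holds only 784 pixels, and the 40,000 pixels of the enlarged image are estimated from them, so no new detail is added.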
In a similar way, you can perform the same steps for Fashion-MNIST and KMNIST.
- Once we are able to see the datasets, we want to apply machine learning to them. But PyTorch models only work with tensors. So we have to convert all the images in the dataset into tensors so that we can use deep learning methods to train our models on them. To do that, we import the packages shown in figure 9.
- Then we download the MNIST dataset again, but this time we add the argument transform=transforms.ToTensor(), as you can see in figure 9. The main purpose of this argument is as follows: initially each sample was a PIL image with a label, which is 5 for the first sample. Now the image is converted to a tensor, while the label stays as it is, 5, and is paired with the tensor.
- Then we use DataLoader, as you can see in figure 10. The main reason for using DataLoader is that when we train a deep neural network, we don't want to load all the data into main memory at once, which puts a lot of pressure on your machine. Here in figure 10, you can see we load only 256 images at a time and train our model on those 256 images. The number of images loaded at a time is set by batch_size.
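A short piece of arithmetic shows what a batch size of 256 means for the 60,000 training images. This assumes the last, smaller batch is kept, which is the default DataLoader behaviour (drop_last=False).

```python
import math

num_train_images = 60_000  # size of the MNIST training split
batch_size = 256           # batch size used in the text

# Number of batches needed to cover every image once.
num_batches = math.ceil(num_train_images / batch_size)

# The final batch holds whatever is left after the full batches.
last_batch_size = num_train_images - (num_batches - 1) * batch_size

print(num_batches)      # 235
print(last_batch_size)  # 96
```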
Later on, we iterate over the train_loader, which loads the dataset batch by batch and transforms it into tensors. After taking the images and labels from the train_loader, we can print them, and you can see that the images are in the form of a tensor. This tensor holds 256 images, as you can see from figure 11. The loader then loads the next 256 images in the same way; each batch is held in memory while the model is trained on it. After the training is done, we test the model using the test loader.
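The batching behaviour can be tried out without downloading MNIST. In the sketch below, a TensorDataset of 1,000 random tensors stands in for the transformed training set; that stand-in and the variable names are assumptions, while batch_size=256 follows the text.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the transformed MNIST training set:
# 1,000 random "images" of shape 1 x 28 x 28 with labels 0-9.
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# Load 256 samples at a time, as in the text.
train_loader = DataLoader(dataset, batch_size=256)

# Each iteration yields one batch of images and the matching labels.
for batch_images, batch_labels in train_loader:
    print(batch_images.shape)  # torch.Size([256, 1, 28, 28])
    print(batch_labels.shape)  # torch.Size([256])
    break  # inspect only the first batch
```

With the real dataset, the MNIST object created with transform=transforms.ToTensor() would be passed to DataLoader in place of the stand-in, and a second loader built from the test split would be used for evaluation.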