How to Implement Logistic Regression Using PyTorch

To implement logistic regression we first import torch, torch.nn, torchvision.transforms.functional as TF, Variable from torch.autograd (to wrap our tensors), numpy, and pandas as pd, as mentioned in figure 1.

Figure 1: Importing the libraries
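As a reference, a minimal import block matching the libraries listed above might look like this (the aliases follow the description; the torchvision import is listed in the figure even though it is not strictly needed for this project):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable        # used later to wrap tensors
import torchvision.transforms.functional as TF  # imported per the tutorial, not used below
import numpy as np
import pandas as pd
```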

Here we are going to use the Pima Indians Diabetes dataset; you can use the link below, mentioned in figure 2, to download it.

Figure 2: Link for downloading the dataset

In this project I am working in Google Colab and using the CSV file.

First we read the CSV file; as you can see in figure 3, it will look like this.

Figure 3: reading the CSV file
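A short sketch of the read step, assuming the CSV has no header row (which is why the columns show up as 0 to 8); the file name is an assumption, so substitute the path of your downloaded copy:

```python
# File name is an assumption; use the path of your downloaded copy.
df = pd.read_csv("pima-indians-diabetes.csv", header=None)
print(df.head())   # columns appear as 0..8, as in figure 3
print(df.shape)    # (768, 9)
```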

As you can see from figure 3, the column names have been replaced with 0 to 8. If you want to know the column names and their descriptions, check figure 4.

Figure 4: Column names and description

Logistic regression is a supervised machine learning algorithm for classification. Let's work on the dataset.

  1. First we check how many data points we have: the shape of the data frame is (768, 9), and of course the 9 is the number of columns.
  2. Then we create x_train and y_train, where columns 0-7 are the predictive variables and column 8 is the classification label.
  3. Then, similarly, we create x_test and y_test, where x has columns 0-7 and y has the 8th column, as mentioned in figure 5; a sketch of the split is shown after the figure caption below.
Figure 5: Column classification
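The exact split used in the figures is not reproduced here; the sketch below uses scikit-learn's train_test_split with a 70/30 split, which is an assumption but is consistent with the 231 test points quoted later:

```python
from sklearn.model_selection import train_test_split  # split method and ratio are assumptions

X = df.iloc[:, 0:8].values   # columns 0-7: predictive variables
y = df.iloc[:, 8].values     # column 8: class label (0 or 1)

# A 70/30 split yields 231 test points, matching the count quoted later.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```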
  1. The next step is converting our data points to tensors. Here we convert x_train and x_test to float tensors; the labels are integer numbers, so we keep them as long tensors.
  2. Make sure you use the right dataset when converting the integers. A sample is shown in figure 6, and a sketch of the conversion follows below.
Figure 6: Conversion of datasets
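A minimal sketch of the conversion, assuming x_train, x_test, y_train and y_test are still NumPy arrays at this point:

```python
# Features as float tensors, labels as long (integer) tensors.
x_train = torch.from_numpy(x_train).float()
x_test  = torch.from_numpy(x_test).float()
y_train = torch.from_numpy(y_train).long()
y_test  = torch.from_numpy(y_test).long()
```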

Figure 7 shows how x_test looks after the conversion.

Figure 7: x_test conversion

Remember to reshape y_train and y_test so their shapes look like the ones shown in figure 8.
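Figure 8's exact shapes are not reproduced here; a plausible reshape flattens the labels into 1-D vectors, which is what CrossEntropyLoss expects later:

```python
# Flatten the labels into 1-D vectors for CrossEntropyLoss.
y_train = y_train.view(-1)
y_test  = y_test.view(-1)
print(y_train.shape, y_test.shape)  # e.g. torch.Size([537]) and torch.Size([231])
```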

Let's define the logistic regression model class.

  1. First give the class a name; we have called it LogisticRegressionModel(), and make sure it inherits from nn.Module. Then define the __init__ function, and inside it call the super function. As you have noticed, in this project we have 8 predictive variables and two possible outcomes, so we keep the input dimension as 8 and the output dimension as 2, as you can see in figure 9. So self.linear = nn.Linear(input_dim, output_dim). Logistic regression here doesn't need a separate activation stage, so we are not using any activation function inside the model; only the linear function is applied to the x values (the softmax is handled later by the loss function). A sketch of the class is given after the figure caption below.
Figure 9: defining logistic regression
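A sketch of the class described above, under the assumption that the class name in the figure is spelled this way:

```python
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        # A single linear layer; no activation here, since
        # CrossEntropyLoss applies the softmax during training.
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)
```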
  1. The next thing we have to do is instantiate the logistic regression model class with the proper variables passed in as arguments.
  2. Then you have to mention the input and output dimensions. Then we pass our model to CUDA so that we can use the GPU.
  3. Then we use the CrossEntropyLoss() function; this loss function applies a softmax to assign probabilities to each of the possible classes, and we take the class with the maximum probability. Then we choose 1000 epochs, set the learning rate to 0.001, and use SGD as the optimizer.
  4. After this process, we are ready to train our model.
  5. Inside the training loop, we move the training inputs and training labels to CUDA as train_input and true_train. optimizer.zero_grad() resets the gradients at each pass.
  6. The training output is then the predicted labels on the training dataset. After that we calculate our loss; loss.backward() performs the back-propagation, and optimizer.step() updates the weights and biases.
  7. After every 50 epochs, we check how our model is doing on the test set. Inside the test step we do test_input = Variable(x_test.cuda()), which converts our test inputs into Variables. The outputs are predicted on the test input, and the predicted values are taken from torch.max, which picks the label with the maximum probability.
  8. Then we count the total number of data points in the test set, which is 231 in our case, check how many of them are predicted correctly, and calculate our accuracy as the percentage of correct predictions over the total.
  9. Then we print our total accuracies and total losses; when we plot them, it looks like figure 10, with accuracy around 65%, which is not bad. A sketch of the full training and evaluation loop follows the figure caption below.
Figure 10: accuracies and losses of logistic regression model
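Since the figures are not reproduced here, the following is a sketch reconstructed from the steps above; variable names such as train_input, true_train and test_input come from the description, while the CUDA checks, the printing format and the lists used for plotting are assumptions:

```python
input_dim, output_dim = 8, 2
model = LogisticRegressionModel(input_dim, output_dim)
if torch.cuda.is_available():
    model = model.cuda()

criterion = nn.CrossEntropyLoss()          # softmax + negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
num_epochs = 1000

losses, accuracies = [], []

for epoch in range(num_epochs):
    # Wrap the training data in Variables (and move to the GPU if available).
    if torch.cuda.is_available():
        train_input = Variable(x_train.cuda())
        true_train  = Variable(y_train.cuda())
    else:
        train_input = Variable(x_train)
        true_train  = Variable(y_train)

    optimizer.zero_grad()                  # reset gradients at each pass
    train_output = model(train_input)      # predicted logits on the training set
    loss = criterion(train_output, true_train)
    loss.backward()                        # back-propagation
    optimizer.step()                       # update weights and biases

    # Every 50 epochs, check how the model does on the test set.
    if (epoch + 1) % 50 == 0:
        if torch.cuda.is_available():
            test_input = Variable(x_test.cuda())
            test_label = y_test.cuda()
        else:
            test_input = Variable(x_test)
            test_label = y_test

        outputs = model(test_input)
        _, predicted = torch.max(outputs.data, 1)   # label with maximum probability
        total = test_label.size(0)                  # 231 test points in our case
        correct = (predicted == test_label).sum().item()
        accuracy = 100 * correct / total

        losses.append(loss.item())
        accuracies.append(accuracy)
        print(f"epoch {epoch + 1}: loss {loss.item():.4f}, accuracy {accuracy:.2f}%")
```

Plotting the collected losses and accuracies against the epoch index should give curves like the ones in figure 10.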
