# What is Autograd? Making Backpropagation Easy: a PyTorch Tutorial

## Back-propagation

Autograd is what makes back-propagation possible, and in PyTorch autograd makes implementing back-propagation much easier. A neural network involves two stages: forward propagation and back-propagation.

Let's say we have a simple neural network with only one neuron: a single input x, a weight W, and a bias b. The neuron computes z = W*x + b, a straight linear equation, as you can see in figure 1. We then pass z through an activation function: plugging z into the sigmoid gives ŷ, the predicted value of y. All of this happens in the forward pass, and figure 1 draws the computation as a graph; this is the model's computation graph.
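The forward pass described above can be sketched in a few lines of PyTorch (the concrete values of x, W, and b here are illustrative assumptions, not from the figure):

```python
import torch

# illustrative values; any scalars would do
x = torch.tensor(3.0)                      # input
W = torch.tensor(0.5, requires_grad=True)  # weight
b = torch.tensor(0.1, requires_grad=True)  # bias

z = W * x + b             # linear step: z = W*x + b
y_hat = torch.sigmoid(z)  # activation: y_hat = sigmoid(z)
print(z.item(), y_hat.item())
```

Because W and b were created with requires_grad=True, PyTorch records these operations into a computation graph as they run.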

After we compute ŷ we have to calculate the loss. The loss function can take many forms; here we use mean squared error, which is the mean of the squared difference between the actual y and ŷ. In figure 2 we haven't shown the mean: it is just the square of the difference, since we only have one sample. In a real scenario you would take the mean over all samples. Once we have the loss, we are ready to run back-propagation through the network.
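For one sample, the loss reduces to a squared difference, as the text notes; with a batch you take the mean. A minimal sketch (the y and ŷ values are assumed for illustration):

```python
import torch

y = torch.tensor(1.0)        # actual target
y_hat = torch.tensor(0.832)  # predicted value from the forward pass

# with a single sample, MSE reduces to the squared difference
loss = (y - y_hat) ** 2
print(loss.item())

# with a batch of samples you take the mean over all of them
y_batch = torch.tensor([1.0, 0.0, 1.0])
y_hat_batch = torch.tensor([0.8, 0.3, 0.9])
mse = torch.mean((y_batch - y_hat_batch) ** 2)
```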

Now we will look at what happens in the back-propagation stage: we calculate the gradient at each step, as shown in figure 3.

1. First we calculate the partial derivative of L with respect to ŷ, that is ∂L/∂ŷ.
2. Then we calculate the partial derivative of ŷ with respect to z, that is ∂ŷ/∂z.
3. Then we calculate the partial derivative of z with respect to W, that is ∂z/∂W.

The reason for these three calculations is that we need the partial derivative of the loss with respect to W, so that we can update W in the next stage using the learning rate. By the chain rule of partial differentiation, (∂L/∂W) = (∂L/∂ŷ) * (∂ŷ/∂z) * (∂z/∂W), as shown in figure 3. In the same way we calculate the partial derivative of the loss with respect to b: we take ∂L/∂ŷ, ∂ŷ/∂z, and ∂z/∂b, and their product gives (∂L/∂b) = (∂L/∂ŷ) * (∂ŷ/∂z) * (∂z/∂b).
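The chain rule above can be checked against autograd directly: assemble each factor by hand and compare the product with the gradient that `loss.backward()` produces. The specific input values are assumptions for illustration:

```python
import torch

x = torch.tensor(3.0)
W = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
y = torch.tensor(1.0)

z = W * x + b
y_hat = torch.sigmoid(z)
loss = (y - y_hat) ** 2
loss.backward()  # autograd applies the chain rule for us

# the same gradients, assembled factor by factor
dL_dyhat = -2 * (y - y_hat)     # dL/dy_hat for squared error
dyhat_dz = y_hat * (1 - y_hat)  # derivative of the sigmoid
dz_dW = x                       # dz/dW
dz_db = 1.0                     # dz/db

manual_dW = dL_dyhat * dyhat_dz * dz_dW
manual_db = dL_dyhat * dyhat_dz * dz_db
print(W.grad.item(), manual_dW.item())  # the two should agree
```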

Using ∂L/∂W and ∂L/∂b, as you can see in figure 4, we reach the heart of the back-propagation stage: we update W and b from these derivatives using the learning rate alpha (α), with the formulas W = W - α*(∂L/∂W) and b = b - α*(∂L/∂b), which you can see in the figure below.
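This update rule is one step of gradient descent. A minimal sketch in PyTorch, assuming α = 0.1 and the same illustrative values as before:

```python
import torch

W = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(1.0)

loss = (y - torch.sigmoid(W * x + b)) ** 2
loss.backward()  # fills W.grad and b.grad

alpha = 0.1  # learning rate
with torch.no_grad():  # the update itself must not be tracked by autograd
    W -= alpha * W.grad
    b -= alpha * b.grad

W.grad.zero_()  # clear gradients before the next forward/backward pass
b.grad.zero_()
```

In practice `torch.optim.SGD` performs exactly this update for you, but writing it out once makes the role of α and of the gradients explicit.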

Let's focus on the coding part.

PyTorch makes this calculation easy: you don't have to write out the gradient equations yourself, autograd does it for you. This is one reason PyTorch is so popular: it builds the computational graph dynamically as your code runs, whereas various other machine learning platforms create a static computation graph, which makes PyTorch much more flexible.

1. First we import torch and then we create tensors, as you can see in figure 6: a float value 30 assigned to x, 40 to W, and 50 to b.
2. One thing you might have noticed is that we don't need gradients with respect to x. So x has requires_grad=False, which is the default, and that is why we didn't mention it. For W and b we set requires_grad=True in both cases, because we need gradients with respect to W and b. To check whether a tensor is a leaf node, as shown in figure 6, we can use b.is_leaf. Since x, W, and b are all created directly by the user, they are all leaf nodes, so x.is_leaf, W.is_leaf, and b.is_leaf each return True; a tensor produced by an operation on them, such as z = W*x + b, is not a leaf, so its is_leaf is False.
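Putting the two steps together, a sketch of what figure 6 describes, using the 30/40/50 values from the text:

```python
import torch

x = torch.tensor(30.0)                      # requires_grad=False is the default
W = torch.tensor(40.0, requires_grad=True)
b = torch.tensor(50.0, requires_grad=True)

# all three were created directly by the user, so all are leaf tensors
print(x.is_leaf, W.is_leaf, b.is_leaf)  # True True True

# a tensor produced by an operation on tensors that require grad is not a leaf
z = W * x + b
print(z.is_leaf)  # False

z.backward()           # populates .grad on the leaves that require gradients
print(W.grad, b.grad)  # dz/dW = x = 30, dz/db = 1
```

Note that x.grad stays None: autograd only stores gradients for leaves with requires_grad=True.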