Table of Contents
- What is Random Forest
- Using Random forest as Regressor
- Video Tutorial
1 What is a Random Forest?
- The idea behind the random forest algorithm is that it does not rely on a single computation for its final result. It builds many decision trees and combines their outputs into one prediction.
- The classic example used to explain random forests goes like this: a bunch of balls sit inside a very big balloon. We cannot see inside it, yet we are asked to say how many balls that balloon contains.
- Let’s say 20 people each give their own guess; for example, one person says there are 10 balls and another says 100. Of all these guesses, the best estimate is the mean of all the values given. That is the principle.
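The balloon example can be sketched in a few lines of Python. This is only an illustration of the averaging principle: the 20 guesses below are made-up numbers, and the names `guesses` and `best_guess` are hypothetical.

```python
from statistics import mean

# Hypothetical guesses from 20 people (illustrative numbers only).
guesses = [10, 100, 35, 60, 42, 55, 80, 25, 48, 70,
           15, 90, 50, 38, 65, 45, 58, 30, 75, 52]

# No single guess is trusted on its own; the combined estimate
# is the mean of all the individual guesses.
best_guess = mean(guesses)
print(f"{len(guesses)} guesses, combined estimate: {best_guess}")
```

A random forest regressor applies the same idea, with each tree playing the role of one person and the forest reporting the average of the trees' predictions.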
2 Random forest regressor
- A random forest regressor produces a stepwise prediction: the predicted value stays constant over a range of inputs and then jumps to the next level. Let’s get started with the example.
- We import the necessary libraries (numpy, matplotlib, and pandas) and then read the dataset with pandas.read_csv.
- From the data frame, the x and y values are selected using the iloc indexer in pandas. We have to predict the salary for specific positions.
- Now we have to find the salary at a level of 8.5. Let’s see how we can do that with the sklearn random forest regressor.
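The loading steps above might look roughly like the sketch below. The article does not show the file, so the CSV content, the column names (`Position`, `Level`, `Salary`), and every number here are placeholder assumptions; the text is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Placeholder data standing in for the article's CSV file.
# Column names and values are hypothetical, for illustration only.
csv_text = """Position,Level,Salary
Analyst,1,40000
Junior Consultant,2,45000
Senior Consultant,3,55000
Manager,4,70000
Country Manager,5,95000
Region Manager,6,130000
Partner,7,180000
Senior Partner,8,260000
C-level,9,400000
CEO,10,700000
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# iloc selects by integer position: the slice 1:2 keeps X two-dimensional,
# which is the shape sklearn estimators expect for features.
X = dataset.iloc[:, 1:2].values  # position level
y = dataset.iloc[:, 2].values    # salary

print(X.shape, y.shape)
```

With a real file, `pd.read_csv` would take the file path instead of the `io.StringIO` object.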
3 Example in SKLEARN
The random forest regressor is available in the sklearn library. We import RandomForestRegressor from sklearn.ensemble.
- n_estimators, the number of trees, is set to 50. You can choose any value from 1 up to whatever your computer can handle, but 50 is good enough for a very small dataset like this one.
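- random_state fixes the seed for the randomness used while building the trees (the bootstrap sampling of the data and the choice of features at each split). With the same seed, fitting the regressor again on the same data gives the same model, so the results are reproducible.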
- We fit the regressor on the x and y values. The call returns the regressor object itself, which uses default values for every parameter and setting other than the ones we gave it.
- Now we want to predict the salary at a level of 8.5, a value that is not present in the data, so let’s see how well random forest regression works.
- You can see that the result is a step function that rises as the level increases. The red dots represent the actual values, and the blue line is the prediction curve produced by the algorithm. The line passes close to all of the points, which indicates a very good fit.
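The fit-and-predict steps above can be sketched as follows. The data is a hypothetical stand-in for the article's dataset, and `random_state=0` is an assumed seed, since the article does not state which value it used. Only `n_estimators=50` and the query point 8.5 come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: position levels 1..10 and made-up salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([40, 45, 55, 70, 95, 130, 180, 260, 400, 700]) * 1000.0

# 50 trees as in the text; the seed value 0 is an assumption.
regressor = RandomForestRegressor(n_estimators=50, random_state=0)
regressor.fit(X, y)

# predict expects a 2-D array: one row per sample, one column per feature.
prediction = regressor.predict([[8.5]])
print("Predicted salary at level 8.5:", prediction[0])

# Evaluate the model on a dense grid to expose the stepwise shape.
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
y_grid = regressor.predict(X_grid)
print("Grid points:", len(X_grid))
print("Distinct predicted values:", len(np.unique(y_grid)))
```

The grid has hundreds of points but far fewer distinct predicted values, which is the step function described above. Drawing `plt.scatter(X, y, color="red")` and `plt.plot(X_grid, y_grid, color="blue")` with matplotlib would produce a chart of the kind the text describes.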
Let’s experiment with different n_estimators:
- Let’s predict y with the number of trees equal to 10.
- Looking at the graph this time, we can clearly see that the curve differs from the previous one: the step function rises in a different way.
- Next, increase the number of estimators to 100. I expect this to give more precise results, as you can see from the values.
- We now have a model that fits each of the data points given to us. As the distribution shows, the red points do not lie on a straight line, which demonstrates that the relationship is not linear, so a linear model would not be suitable here.
- The CEO salary, at 70k, is the one value that the line does not fit accurately.
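The experiment with different numbers of trees can be sketched as a simple loop. As before, the data is a hypothetical stand-in and `random_state=0` is an assumed seed; the tree counts 10, 50, and 100 are the ones used in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: position levels 1..10 and made-up salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([40, 45, 55, 70, 95, 130, 180, 260, 400, 700]) * 1000.0

# Fit one forest per tree count and record its prediction at level 8.5.
predictions = {}
for n_trees in (10, 50, 100):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X, y)
    predictions[n_trees] = float(model.predict([[8.5]])[0])

for n_trees, value in predictions.items():
    print(f"n_estimators={n_trees}: predicted salary at 8.5 = {value:.0f}")
```

Averaging over more trees generally makes the prediction less sensitive to the random sampling behind any single tree, at the cost of more computation. On a dataset this small, the differences between the three runs may be modest.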