Random Forest Classifier: scikit-learn Tutorial

Table of Contents

  1. Introduction
  2. Random forest classifier
  3. Example
  4. Video Tutorial

1 Introduction

In this blog, we will discuss one of the best-known and most widely used classification algorithms: the random forest classifier.

2 Random Forest Classifier

A decision tree classifies an instance by asking a sequence of questions about its feature values. Each question is a binary split, so the data branches like a tree until it reaches a leaf that gives a specific result. A random forest classifier takes a specified number of decision trees, each trained on the same data set (in practice on a random resample of it), collects the results of all the trees, and gives you the combined result.
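
To make the "combined result" idea concrete, here is a minimal sketch that fits a small forest on iris and rebuilds the forest's class probabilities by averaging the probabilities of its individual trees. The choice of 5 trees and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# A small forest; 5 trees is an arbitrary choice for this illustration.
forest = RandomForestClassifier(n_estimators=5, random_state=0)
forest.fit(X, y)

# Every fitted tree is stored in forest.estimators_.
tree_probas = np.array([tree.predict_proba(X) for tree in forest.estimators_])

# Averaging the per-tree class probabilities gives the forest's probabilities.
mean_proba = tree_probas.mean(axis=0)
print(np.allclose(mean_proba, forest.predict_proba(X)))
```

The forest then predicts, for each sample, the class with the highest averaged probability.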

3 Example

First, we import the necessary libraries.
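
A minimal sketch of the imports that the steps below rely on, assuming a recent scikit-learn install. The exact import list is an assumption based on the functions named later in this post.

```python
# Data loading, splitting, scaling, the model, and the evaluation metrics.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
```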


We are using the classic iris dataset here. We have used the same dataset with a couple of other classification algorithms in this series, so we can compare the results the different algorithms give on the same data. So, let’s load the iris dataset from the datasets module in scikit-learn (sklearn.datasets).
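
Loading the data could look like the sketch below. The variable name iris is an assumption.

```python
from sklearn.datasets import load_iris

# load_iris returns a Bunch object with the data, the targets, and metadata.
iris = load_iris()

print(iris.data.shape)     # (150, 4)
print(iris.feature_names)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```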


The iris dataset contains three different types of iris flowers, described by four features: sepal length, sepal width, petal length, and petal width. We take the features as X and the iris target as y.
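
A minimal sketch of this step, assuming the names X and y for the features and the target:

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data    # 150 rows, 4 feature columns
y = iris.target  # 150 class labels encoded as 0, 1, 2

print(X.shape, y.shape)
```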


Now we perform a train/test split. We call the train_test_split function and pass the X and y values, taking 20% of the complete dataset as the test set and the remaining 80% as the training set. We use a random state of 42. This number can be anything, but using the same number each time makes the split reproducible, so results can be compared fairly from run to run.
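
The split described above can be sketched as follows. The four output variable names are the conventional ones and are assumed here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 20% of the 150 samples go to the test set; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```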


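Now we standardize the features so that each one has zero mean and unit variance. This step helps many models reach better accuracy. Tree-based models such as random forests are largely insensitive to feature scale, so expect little or no change in the predictions here.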
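
One common way to do this is scikit-learn's StandardScaler, sketched below. Using StandardScaler specifically is an assumption, since the post only says the data is standardized.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only ...
X_train = scaler.fit_transform(X_train)
# ... and apply the same transformation to the test data.
X_test = scaler.transform(X_test)
```

Fitting the scaler on the training data alone keeps information from the test set out of the preprocessing step.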


Now that the data is ready, it’s time to create and train the model. We import RandomForestClassifier from sklearn.ensemble.


The number of trees is set by n_estimators. It can be 1, 10, or 100 (recent scikit-learn versions default to 100). For now, the criterion we pass is entropy and the random state is zero. We fit the model on X_train and y_train.

Now we obtain the predicted values of the y label for the test set.
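
A minimal sketch of creating and training the model. The value n_estimators=10 and the name classifier are assumptions, since the post does not fix the number of trees; criterion and random_state follow the text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# n_estimators=10 is an assumed value; try 1 or 100 to see the effect.
classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)
```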
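
Prediction is a single call to predict on the test features. The sketch below repeats the earlier steps so it runs on its own; n_estimators=10 and the name y_pred are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)

# One predicted class label per test sample.
y_pred = classifier.predict(X_test)
print(y_pred[:10])
print(y_test[:10])
```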


Now it is time to test the predicted values, so let’s compare the actual and predicted values. For classification, this is done with a confusion matrix.

  1. We import confusion_matrix from sklearn.metrics. With three classes, the result is a 3x3 confusion matrix. The diagonal elements count correctly classified points and the off-diagonal elements count incorrectly classified points. Here there are zero incorrectly classified points.
  2. We can obtain the accuracy simply by passing the y_test and y_predicted values into the accuracy_score function. We obtained an accuracy of 1.
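
Both evaluation steps can be sketched as below. The block repeats the earlier steps so it runs on its own, and n_estimators=10 is an assumed value, so the exact numbers you see may depend on that choice and on your scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_train, y_train)
y_predicted = classifier.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_predicted)
print(cm)

# Fraction of test samples whose predicted label matches the actual label.
accuracy = accuracy_score(y_test, y_predicted)
print(accuracy)
```

The accuracy equals the sum of the diagonal of the confusion matrix divided by the total number of test samples.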

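This means that, compared to the other classification algorithms we have implemented in this series, random forest has given us the best result on this data. Keep in mind that the test set has only 30 samples, so a perfect score here does not guarantee the same on harder data. Results like this are one reason why random forest is among the most preferred classification algorithms on Kaggle.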

4 Video Tutorial
