Table of Contents
- Decision tree Classifier
- Video Tutorial
In this blog, we discuss another important classification algorithm: the decision tree classifier. It is one of the fundamental approaches to classification. It arranges a series of decisions in a tree, and the path a sample takes through that tree gives its classification.
1 Decision tree Classifier
A decision tree classifier makes a series of decisions about the input features and classifies each sample according to the outcome of those decisions.
For example, suppose I want to decide whether a person is fit or unfit, and I have specific features to base that decision on. I pass the person through several decisions, and the outcome of those decisions determines the class. Two things give the decision tree classifier its name: the decisions, each with a binary outcome (positive or negative), and the arrangement of those decisions in a tree. It is one of the most popular algorithms used for classification.
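To make the fit/unfit example concrete, here is a hand-written sketch of what such a tree looks like once it exists. The features (`exercise_hours_per_week`, `resting_heart_rate`) and all thresholds are hypothetical, chosen only for illustration; a real decision tree algorithm learns the features and thresholds from data instead of having them typed in.

```python
def classify_fitness(exercise_hours_per_week, resting_heart_rate):
    """Toy decision tree: each if-statement is one binary decision node.

    The features and thresholds are made up for illustration only.
    """
    # Root node: the first decision
    if exercise_hours_per_week >= 3:
        # Second decision on the "yes" branch
        if resting_heart_rate <= 80:
            return "fit"
        return "unfit"
    # Second decision on the "no" branch
    if resting_heart_rate <= 60:
        return "fit"
    return "unfit"


print(classify_fitness(5, 70))  # fit
print(classify_fitness(1, 85))  # unfit
```

Each call follows exactly one root-to-leaf path, and the leaf it reaches is the predicted class.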
Let's see how to implement a decision tree using scikit-learn.
I'll import the necessary libraries. We have data on social advertisements, and we take the values of X and y from the data frame. The features X, on which the purchase depends, are the user ID, gender, and age; the target variable y is whether the person purchased.
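A minimal sketch of pulling X and y out of a data frame. The article does not name its data file, so a tiny hypothetical DataFrame with made-up rows stands in for it; the column names and the 0/1 encoding of Gender are assumptions, not the author's exact code.

```python
import pandas as pd

# Tiny hypothetical stand-in for the social advertisement data;
# the real data would normally be loaded with pd.read_csv(...).
df = pd.DataFrame({
    "User ID": [101, 102, 103, 104, 105, 106],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 35, 47, 26, 52, 41],
    "Purchased": [0, 0, 1, 0, 1, 1],
})

# scikit-learn trees need numeric inputs, so encode Gender as 0/1 (assumed encoding).
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Features: every column except the target. Target: Purchased.
X = df.drop(columns="Purchased").values
y = df["Purchased"].values

print(X.shape, y.shape)  # (6, 3) (6,)
```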
So, we need to predict whether a person will buy a specific item or not. The data has 400 rows and 5 columns. We perform a train/test split, putting 20% of the data in the test set and 80% in the training set, with a random state of 0. Then we scale the data. Next, we set up the decision tree: the DecisionTreeClassifier class lives in the sklearn.tree module of scikit-learn.
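The split and scaling steps can be sketched as follows. Since the article's data isn't available here, the sketch uses synthetic random features with the same number of rows (400); the feature values and the labeling rule are made up. The scaler is fitted on the training split only and then applied to the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 400 rows, like the article's data; values are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 80% training / 20% test, with random_state=0 as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Learn mean and standard deviation from the training split only,
# then apply the same transformation to both splits.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

print(X_train.shape, X_test.shape)  # (320, 3) (80, 3)
```

Decision trees split on feature thresholds, so they do not strictly require scaling; it does no harm here and keeps the workflow the same as for scale-sensitive classifiers.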
We set criterion to 'entropy' and random_state to 0. In this model, we use entropy (information gain) instead of the default Gini impurity as the splitting criterion because it worked better with this data.
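Both criteria score how mixed the classes are at a node; the tree prefers splits that reduce this impurity. The helper functions below are a small illustrative sketch (not scikit-learn internals) using the standard formulas: Gini = 1 - sum(p_k^2) and entropy = -sum(p_k log2 p_k), where p_k is the fraction of samples in class k. The last lines show the classifier configured as described in the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def gini(counts):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)


def entropy(counts):
    """Entropy in bits: -sum(p_k * log2(p_k)), skipping empty classes."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


# A 50/50 node is maximally mixed for two classes.
print(gini([5, 5]), entropy([5, 5]))  # 0.5 1.0

# The classifier as configured in the text: entropy criterion, random_state=0.
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
```

A pure node (all samples in one class) scores 0 under both measures, and the two criteria usually rank candidate splits similarly.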
Now we fit the model with the fit() method, passing X_train and y_train. Once the model is fitted, we call predict() on the test data to obtain y_pred, the predicted values.
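Putting the steps together, a minimal end-to-end sketch of fitting and predicting. It assumes synthetic data from scikit-learn's `make_classification` (400 samples, 4 features) in place of the article's advertisement data, so its numbers will not match the article's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the real data set.
X, y = make_classification(n_samples=400, n_features=4, random_state=0)

# 80/20 split and scaling, as described earlier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Entropy-based decision tree, fitted on the training data.
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(X_train, y_train)

# Predicted class labels for the test set.
y_pred = classifier.predict(X_test)
print(y_pred[:10])
```

Without limits such as max_depth, the tree keeps splitting until its leaves are pure, so it fits the training data almost perfectly; the test set is what tells you how well it generalizes.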
To see how many values are predicted correctly, we use a confusion matrix, computed with scikit-learn's confusion_matrix function. The confusion matrix shows not only how many predictions are correct but also which kinds of errors the model makes, which reveals whether the model is biased toward one class.
In a confusion matrix, the diagonal values count the correct predictions and the off-diagonal values count the incorrect ones.
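A small sketch of reading a confusion matrix. The labels below are hand-made for illustration and are not the article's results. In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hand-made labels for illustration only.
y_test = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]

# Rows: true class 0, 1. Columns: predicted class 0, 1.
cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[4 1]
#  [1 4]]

correct = np.trace(cm)          # diagonal: correct predictions
incorrect = cm.sum() - correct  # off-diagonal: incorrect predictions
print(correct, incorrect)       # 8 2
```

Here one class-0 sample was predicted as 1 and one class-1 sample as 0. If one off-diagonal cell were much larger than the other, that would suggest the model leans toward one class.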
Here, few values are predicted incorrectly, so we can expect high accuracy from this model. We can also use the accuracy_score function to measure the fraction of correct predictions. We import accuracy_score from sklearn.metrics, pass it y_test and y_pred, and print the resulting accuracy score.
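A short sketch showing that accuracy is the diagonal of the confusion matrix divided by the total number of predictions. It reuses hand-made illustrative labels rather than the article's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels for illustration only.
y_test = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]

# Fraction of predictions that match the true labels.
acc = accuracy_score(y_test, y_pred)
print(acc)  # 0.8

# Same value from the confusion matrix: correct (diagonal) / total.
cm = confusion_matrix(y_test, y_pred)
print(np.trace(cm) / cm.sum())  # 0.8
```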
As you can see, we got an accuracy score of 0.9, meaning 90% of the test samples (72 of the 80) were classified correctly.
So, this is how we can implement a decision tree classifier in scikit-learn.