RANDOM FOREST
Is it anything like the forests we visit in our free time?
Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of decision trees, usually trained with the bagging method. The basic idea behind bagging is that combining many learning models improves the overall result.
The random forest algorithm extends bagging: it uses feature randomness in addition to bagging to produce an uncorrelated forest of decision trees. Feature randomness, also known as "the random subspace method" or "feature bagging", selects a random subset of features, which keeps the correlation between decision trees low. This is the key difference between decision trees and random forests: a decision tree considers every possible feature split, whereas a random forest considers only a subset of the features.
How Exactly Does Random Forest Work?
Random forest algorithms have three main hyperparameters that must be set before training: the node size, the number of trees, and the number of features sampled. The trained random forest can then be used to solve regression or classification problems.
The random forest is a collection of decision trees, and each tree in the ensemble is built from a bootstrap sample, which is a sample drawn from the training set with replacement. About one-third of the training data is left out of each bootstrap sample and set aside as test data; this is known as the out-of-bag (oob) sample, and we will come back to it later.
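To make the bootstrap idea concrete, here is a minimal sketch that uses only Python's standard library. The function name `bootstrap_sample` and the choice of 1000 rows are made up for illustration; they do not come from any library. When n rows are drawn with replacement n times, each row is missed with probability (1 - 1/n)^n, which is close to 0.37 for large n, so a bit more than one-third of the rows end up out-of-bag.

```python
import random

def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    # Rows that were never drawn form the out-of-bag (oob) sample.
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag

rng = random.Random(42)  # fixed seed so the run is repeatable
in_bag, out_of_bag = bootstrap_sample(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(f"in-bag draws: {len(in_bag)}, out-of-bag rows: {len(out_of_bag)}")
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

In a real forest, every tree would get its own bootstrap sample, and the rows a tree never saw could be used to check that tree's predictions.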
Feature bagging then injects another layer of randomness, which adds diversity to the dataset and reduces the correlation among decision trees. How the prediction is made depends on the type of problem. For a regression task, the predictions of the individual decision trees are averaged; for a classification task, the predicted class is decided by majority vote, that is, the most frequent class.
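The two ways of combining tree outputs can be sketched in a few lines of plain Python. The helper names `majority_vote` and `average_prediction`, as well as the sample predictions, are hypothetical and only serve to illustrate the idea.

```python
from collections import Counter
from statistics import mean

def majority_vote(tree_predictions):
    """Classification: return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def average_prediction(tree_predictions):
    """Regression: return the mean of the trees' numeric predictions."""
    return mean(tree_predictions)

# Pretend these are the outputs of the individual trees for one input row.
class_votes = ["spam", "ham", "spam", "spam", "ham"]
price_estimates = [210.0, 190.0, 200.0]

print(majority_vote(class_votes))           # "spam" wins 3 votes to 2
print(average_prediction(price_estimates))  # mean of the three estimates
```

Note that this sketch breaks ties by taking whichever class was counted first; a real implementation may handle ties differently.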
Random Forest with Classification and Regression:
The hyperparameters of a random forest are almost identical to those of a decision tree or a bagging classifier. Fortunately, there is no need to combine a decision tree with a bagging classifier yourself, because the random forest classifier class can be used directly. Random forest can also handle regression tasks through the algorithm's regressor.
Random forest adds extra randomness to the model as the trees grow. When splitting a node, it searches for the best feature within a random subset of features instead of the best feature overall. This produces a wide diversity of trees, which generally results in a better model.
As a result, the process for splitting a node in a random forest classifier considers only a random subset of the features. The trees can be made even more random by using random thresholds for each feature rather than searching for the optimal thresholds (as a typical decision tree does).
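The node-splitting idea can be illustrated with a small from-scratch sketch. It assumes Gini impurity as the split criterion (one common choice, not necessarily the one used in the notebooks), and the names `gini`, `best_split`, and `max_features` as well as the toy data are chosen here for illustration only.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels, max_features, rng):
    """Search only a random subset of features for the lowest-impurity split."""
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), max_features)
    best = None  # (weighted impurity, feature index, threshold)
    for f in candidates:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return candidates, best

# Toy data: feature 0 separates the two classes perfectly.
rows = [[1.0, 7.0, 3.0], [2.0, 1.0, 9.0], [8.0, 6.0, 4.0], [9.0, 2.0, 8.0]]
labels = ["a", "a", "b", "b"]

rng = random.Random(0)
candidates, best = best_split(rows, labels, max_features=2, rng=rng)
print("features considered:", candidates)
print("chosen split (impurity, feature, threshold):", best)
```

Because only two of the three features are examined at this node, the split that gets chosen may not be the best one overall. That is exactly what makes the trees differ from one another.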
Random Forest Applications:
Several industries have used the random forest algorithm to help them make better business decisions. Among the use cases are:
Finance: Random forest is preferred over other algorithms because it reduces the time spent on data management and pre-processing tasks. It can be used to evaluate high-risk credit customers, detect fraud, and tackle option pricing problems.
Healthcare: In computational biology, the random forest algorithm is applied to problems such as gene expression classification, biomarker discovery, and sequence annotation. As a result, doctors can estimate how patients will respond to specific medications.
E-commerce: It is used for recommendation engines and cross-selling.
Random Forest Advantages:
- Versatile: it handles both classification and regression.
- Easy-to-understand hyperparameters.
- With enough trees, the classifier is unlikely to overfit.
Random Forest Disadvantages:
- More trees are needed for higher accuracy.
- More trees make the model slower.
- It cannot describe the relationships in the data; it is a predictive tool rather than a descriptive one.
So, that brings us to the end of supervised machine learning. We covered classification and regression along with their respective algorithms, including how they work, their applications, and their advantages and disadvantages. From here on, we move to unsupervised machine learning. I hope you have enjoyed the series so far, and I hope you will enjoy the next blogs from the same channel too!
As I said earlier, I will take readers on a hands-on journey along the way. I am sharing a link to a freely accessible folder containing documents in which I have implemented the models at the simplest level, so that any beginner can understand them.
The models are implemented in Jupyter Notebook, a platform for running Python projects.
Kindly refer to the link provided below:
References:
[1] https://medium.com/@roiyeho/random-forests-98892261dc49