Sunday, March 31, 2024

STAIR FOUR: DECISION TREE

DECISION TREE

What is a Decision Tree? Does it sound like our normal trees?
  • Although decision trees are a supervised learning technique, they are primarily employed to solve classification problems.
  • However, they can also be used to solve regression problems.
  • The classifier is tree-structured: internal nodes stand for dataset attributes, branches for decision rules, and leaf nodes for the outcomes.
  • The decision node and the leaf node are the two kinds of node that make up a decision tree.
  • Decision nodes are used to make decisions and have multiple branches, while leaf nodes represent the results of those decisions and have no further branches. The decisions, or tests, are based on the features of the given dataset.
  • It is a graphical tool that shows all of the options for solving a problem or making a decision under given conditions.
  • It is called a decision tree because, like a tree, it begins at the root node and grows on subsequent branches to form a tree-like structure.
  • We use the CART (Classification and Regression Tree) algorithm to construct the tree.
  • A decision tree simply poses a question and then splits into subtrees according to the answer (Yes/No).
  • The diagram below shows the general structure of a decision tree:


[Figure: general structure of a decision tree]
Terminologies for Decision Trees:
  • Root Node: The node where the decision tree originates. It represents the complete dataset, which is then split into two or more homogeneous subsets.
  • Leaf Node: The final output node; once a leaf node is reached, the tree cannot be divided any further.
  • Splitting: The process of dividing a decision node (or the root node) into sub-nodes according to the given conditions.
  • Branch/Sub-Tree: A section formed by splitting the tree, i.e., a subtree of the larger tree.
  • Pruning: The process of removing unwanted branches from the tree.
  • Parent/Child Node: A node that splits into sub-nodes is called the parent node, and its sub-nodes are called child nodes; the root node is the topmost parent. A concrete sketch follows this list.
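
To make these terms concrete, here is a minimal sketch using scikit-learn (my own addition, not the notebook's code; the feature names and toy data are made up for illustration). It fits a small CART-style tree and prints its structure, so the root node, branches, and leaf nodes are visible directly:

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (hypothetical): [age, salary_in_thousands] -> buys_product (0/1)
X = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 45]]
y = [0, 1, 1, 0, 1, 0]

# scikit-learn's DecisionTreeClassifier uses an optimised version of CART
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The printed rules start at the root node and end at the leaf nodes
print(export_text(tree, feature_names=["age", "salary"]))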
What is the Process of the Decision Tree Algorithm?

The procedure in a decision tree begins at the root node when predicting the class of a given record. The algorithm compares the value of the root attribute with the corresponding attribute of the record (from the actual dataset), follows the matching branch, and advances to the next node.

The decision tree process can be followed in a few steps (a traversal sketch follows this list):
  1. Start the tree at the root node, which contains the complete dataset S.
  2. Use an Attribute Selection Measure (ASM) to find the best attribute in the dataset.
  3. Divide S into subsets that contain the possible values of the best attribute.
  4. Create the decision tree node containing the best attribute.
  5. Recursively build new decision trees using the subsets of the dataset created in step 3. Continue this process until the nodes can no longer be split; the final nodes are then called leaf nodes.
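
To see this root-to-leaf traversal in practice (again my own sketch, reusing the toy tree from above; decision_path and the tree_ attribute are standard scikit-learn APIs), we can print the exact path one record follows:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 45]])
y = np.array([0, 1, 1, 0, 1, 0])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

sample = np.array([[40, 70]])                # one record to classify (made-up values)
path = tree.decision_path(sample).indices    # node ids visited, from root to leaf

for node_id in path:
    if tree.tree_.children_left[node_id] == tree.tree_.children_right[node_id]:
        print(f"leaf node {node_id}: predicted class = {tree.predict(sample)[0]}")
    else:
        feature = tree.tree_.feature[node_id]
        threshold = tree.tree_.threshold[node_id]
        print(f"node {node_id}: test feature[{feature}] <= {threshold:.1f}")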
Attribute Selection Measures:
The primary problem that arises while implementing a decision tree is deciding which attribute is best for the root node and for the sub-nodes. To solve this problem, a technique known as an attribute selection measure, or ASM, is used. With an ASM, we can easily select the best attribute for each node of the tree. There are two widely used ASM techniques:

A] Information Gain:
  • Information gain is the measurement of the change in entropy after the dataset is segmented based on an attribute.
  • It tells us how much information a feature gives us about a class.
  • According to the value of information gain, we split the node and build the decision tree.
  • A decision tree algorithm always tries to maximise information gain, so the node/attribute with the highest information gain is split first. It can be calculated using the formula below:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]

Entropy: Entropy is a metric used to measure the impurity in a given attribute; it describes the randomness in the data. It can be calculated as:

E(S) = -Σ pi * log2(pi)

where E(S) denotes the entropy of the sample set S and pi denotes the probability of class i in S.
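
As a quick worked sketch of these two formulas (my own addition; the helper names and the toy weather-style labels are purely illustrative), entropy and information gain can be computed in a few lines of Python:

import math
from collections import Counter

def entropy(labels):
    # E(S) = -sum(pi * log2(pi)) over the class probabilities pi
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, feature_values):
    # IG = Entropy(S) - weighted average of Entropy(each subset after the split)
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted_avg = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted_avg

# Toy example: how much does "outlook" tell us about "play"?
play = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]
print(information_gain(play, outlook))  # a higher value means a better split attribute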
B] Gini Index:
  • The Gini index is a measure of purity or impurity used while building a decision tree with the CART (Classification and Regression Tree) algorithm.
  • An attribute with a low Gini index should be preferred over one with a high Gini index.
  • The CART algorithm produces only binary splits, and it uses the Gini index to create them.
  • It can be calculated as:

Gini Index = 1 - Σ (pi)^2

where pi is the probability of an object being classified to a particular class.
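
A matching sketch for the Gini index (again my own illustration, in the same style as the entropy helper above):

from collections import Counter

def gini_index(labels):
    # Gini = 1 - sum(pi^2), where pi is the probability of class i
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini_index(["yes", "yes", "no", "no"]))  # 0.5 -> maximally impure for two classes
print(gini_index(["yes", "yes", "yes"]))       # 0.0 -> a pure node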



Advantages of the Decision Tree:
  • It is simple to understand, because it follows the same process a person uses when making a decision in the real world.
  • It can be very useful for solving decision-related problems.
  • It helps in thinking about all the possible outcomes of a problem.
  • It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree:
  • The decision tree can become complicated, since it contains many levels.
  • It may suffer from overfitting, which the Random Forest algorithm can remedy (see the sketch below).
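
To illustrate the overfitting point (a sketch I am adding, using a synthetic scikit-learn dataset rather than the employee data from the notebook), a fully grown decision tree often scores far higher on training data than on test data, while a random forest narrows that gap:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          "train:", round(model.score(X_train, y_train), 2),
          "test:", round(model.score(X_test, y_test), 2))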

In the provided notebook, decision tree and random forest models are implemented on employee data, and the employees are classified on this basis.
 
As I said previously, throughout this journey I will take readers through a hands-on experience. I am providing a link to a freely accessible folder where I have posted various documents in which I have implemented the models at the simplest level, so any beginner can easily understand them.
Those models are implemented in Jupyter Notebook, a platform for running Python projects.

Kindly refer to the link provided below:

