SUPERVISED LEARNING
What is Supervised Learning in Machine Learning?
So, have you ever heard of spam e-mail detection? That is a classic task that can be handled by supervised learning.
In supervised learning, the algorithm is given a dataset with inputs and their corresponding outputs, and it learns to map the inputs to the correct outputs. Supervised learning can be broadly categorized into two main types: classification and regression.
In REGRESSION, the algorithm's goal is to predict a continuous value based on input data. For instance, imagine we want to predict the price of a house based on features like its size, number of bedrooms, and location. Here, the data would typically be structured, and instead of discrete labels we would have a continuous target variable (the price of the house).
In CLASSIFICATION, the algorithm's task is to categorize data into different classes or categories. For example, suppose we want to build a system that can classify whether an e-mail is spam or not spam. Here, the data would typically be structured features of the e-mail, such as the sender, subject, and body, along with a label indicating whether it is spam or not.
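To make the two task types concrete, here is a minimal sketch using scikit-learn (the tiny house-price and e-mail datasets below are invented purely for illustration):

```python
# A minimal sketch contrasting regression and classification with scikit-learn.
# The tiny datasets below are made up purely for illustration.
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (e.g., house price from size in sq. ft.)
sizes = [[800], [1000], [1200], [1500]]          # input feature
prices = [100000, 125000, 150000, 187500]        # continuous target
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[1100]]))                     # predicted price for a 1100 sq. ft. house

# Classification: predict a discrete label (e.g., spam vs. not spam)
features = [[0, 1], [1, 0], [1, 1], [0, 0]]      # simple made-up e-mail features
labels = [0, 1, 1, 0]                            # 1 = spam, 0 = not spam
clf = LogisticRegression().fit(features, labels)
print(clf.predict([[1, 0]]))                     # predicted class label
```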
A] Algorithms used for Regression:
For regression, the inputs are numerical data and the output is a predicted continuous value.
1. Linear Regression :
The core concept of linear regression revolves around fitting a straight line to the data points in such a way that the line best represents the relationship between the independent and dependent variables. This line is represented by the equation:

y = b0 + b1x + ε

Where:
- y is the dependent variable (the variable we want to predict),
- x is the independent variable (the variable used for prediction),
- b0 is the intercept (the value of y when x is zero),
- b1 is the slope (the change in y for a one-unit change in x),
- ε is the error term (the difference between the actual and predicted values of y).
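As a quick sketch of estimating these quantities, the NumPy snippet below fits the line by least squares (the five data points are invented for illustration):

```python
# A small sketch of fitting y = b0 + b1*x with NumPy (the data points are made up).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # dependent variable

# np.polyfit with degree 1 returns the slope (b1) and the intercept (b0)
b1, b0 = np.polyfit(x, y, 1)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")

# Predictions and the error term (residuals) for each point
y_pred = b0 + b1 * x
residuals = y - y_pred
print("residuals:", np.round(residuals, 3))
```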
2. Logistic Regression :
Logistic regression uses a logistic function called the sigmoid function to map predictions to probabilities. The sigmoid function is an S-shaped curve that converts any real value to a value in the range between 0 and 1. If the output of the sigmoid function (the estimated probability) is greater than a predefined threshold, the model predicts that the instance belongs to the class; if the estimated probability is less than the threshold, the model predicts that the instance does not belong to the class. The sigmoid function serves as the activation function for logistic regression and is defined as:
f(x) = 1 / (1 + e^(-x)) [2]

where,
- e = base of the natural logarithm,
- x = the numerical value one wishes to transform.
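A minimal Python sketch of the sigmoid and the thresholding step described above is shown below (the 0.5 threshold and the sample inputs are assumptions for illustration):

```python
# A minimal sketch of the sigmoid function and the thresholding step.
# The 0.5 threshold and the sample inputs are assumptions for illustration.
import math

def sigmoid(x):
    """Map any real value to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_class(x, threshold=0.5):
    """Predict class 1 if the estimated probability exceeds the threshold."""
    return 1 if sigmoid(x) > threshold else 0

for value in (-2.0, 0.0, 2.0):
    print(value, round(sigmoid(value), 3), predict_class(value))
# -2.0 -> probability ~0.119 -> class 0
#  0.0 -> probability  0.5   -> class 0 (not strictly greater than the threshold)
#  2.0 -> probability ~0.881 -> class 1
```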
B] Algorithms used for Classification:
1. Decision Tree:
Although decision trees are a supervised learning approach that can be used for both classification and regression, they are most often employed to solve classification problems. The classifier is tree-structured: internal nodes represent features (attributes) of the dataset, branches represent decision rules, and leaf nodes represent the outcomes. A decision tree is therefore made up of two kinds of nodes: decision nodes, which test a feature and have multiple branches, and leaf nodes, which hold the final result and have no further branches. The tests at the decision nodes are based on the features of the given dataset. In effect, a decision tree simply poses a question and then splits into subtrees according to the answer (Yes/No).
The decision tree's general structure is illustrated in the diagram below:
Decision Tree
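To make the decision and leaf nodes concrete, here is a minimal scikit-learn sketch trained on an invented spam-style dataset (the feature names, values, and labels are all assumptions for illustration):

```python
# A minimal sketch of training a decision tree classifier with scikit-learn
# on a toy spam-style dataset (the features and labels are invented).
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [contains_link, from_known_sender]; label 1 = spam, 0 = not spam
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [1, 0, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Internal nodes test a feature, branches are the Yes/No answers,
# and leaf nodes hold the final class.
print(export_text(tree, feature_names=["contains_link", "from_known_sender"]))
print(tree.predict([[1, 0]]))   # -> [1] (spam)
```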
2. Random Forest:
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on a single decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.
A greater number of trees in the forest generally leads to higher accuracy and helps prevent overfitting.
The below diagram explains the working of the Random Forest algorithm:
Random Forest
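As a minimal sketch of the majority-vote idea, the snippet below trains a random forest with scikit-learn (the toy spam-style dataset and the choice of 100 trees are assumptions for illustration):

```python
# A minimal sketch of a random forest on a toy spam-style dataset
# (the data and the choice of 100 trees are assumptions for illustration).
from sklearn.ensemble import RandomForestClassifier

X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
y = [1, 0, 0, 0, 1, 0, 1, 0]

# n_estimators is the number of decision trees whose predictions are combined.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(forest.predict([[1, 0]]))        # majority vote over all trees -> [1]
print(forest.predict_proba([[1, 0]]))  # class probabilities averaged across the trees
```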
References:
[1] https://www.javatpoint.com/linear-regression-in-machine-learning
[2] https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-logistic-regression/#:~:text=Practices%20for%202022-,What%20Is%20Logistic%20Regression%3F,1%2C%20or%20true%2Ffalse.