
Supervised Learning & its Types

 Supervised learning is a type of machine learning algorithm that learns from labeled data. Labeled data is data that has been tagged with a correct answer or classification. The machine then learns to predict the output for new input data.

 Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher.

• Supervised learning is when we teach or train the machine using data that is well labelled, which means some of the data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from the labeled data.

 The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y).

 Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.
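The learn-from-labels-then-predict loop described above can be sketched in a few lines of plain Python. The toy dataset and the 1-nearest-neighbour rule below are illustrative assumptions, not part of the original material:

```python
# A minimal sketch of the supervised-learning loop: train on labeled
# (x, y) pairs, then predict y for new, unseen x.
# Dataset and the 1-nearest-neighbour rule are illustrative assumptions.

# Labeled training data: x = hours studied, y = pass/fail outcome.
training_data = [(1.0, "fail"), (2.0, "fail"), (4.5, "pass"), (6.0, "pass")]

def predict(x):
    """Return the label of the training example whose input is closest to x."""
    nearest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

print(predict(1.5))  # a new, unseen input near the "fail" examples
print(predict(5.0))  # a new, unseen input near the "pass" examples
```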
Supervised Learning: Categories

• Supervised machine learning can be classified into two types of problems, which are given below:

a. Classification

b. Regression
a. Classification:

• Classification is a type of supervised learning that categorizes input data into predefined
labels.

• It involves training a model on labeled examples to learn patterns between input features
and output classes.

• In classification, the target variable is a categorical value. In the binary case there are two classes, such as Yes-No, Male-Female, True-False, etc.; tasks with more than two classes are called multi-class classification.

• The model aims to generalize this learning to make accurate predictions on new, unseen
data.
• Algorithms like

• Random Forest

• Decision Trees

• Logistic Regression

• Support Vector Machines

are commonly used for classification tasks.
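The algorithms listed above all share the same fit-then-predict pattern. As a hedged illustration, here is a decision tree of depth one (a "decision stump") in plain Python; the exhaustive threshold search is a deliberate simplification, not a full decision-tree implementation:

```python
# A decision stump: the simplest decision tree, with a single split.
# The brute-force threshold search below is an illustrative simplification.

def fit_stump(xs, ys):
    """Pick the threshold on x that best separates class 0 from class 1."""
    best = None
    for t in xs:
        # Rule: predict class 1 when x >= t, class 0 otherwise.
        correct = sum(1 for x, y in zip(xs, ys) if (x >= t) == (y == 1))
        if best is None or correct > best[1]:
            best = (t, correct)
    return best[0]

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(xs, ys)
print(threshold)             # learned split point
print(int(7 >= threshold))   # classify a new, unseen point
```

Real decision-tree learners (CART, C4.5) use impurity measures such as Gini or entropy to choose splits, but the interface — learn a rule from labeled data, apply it to new data — is the same.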


Examples: Classification

a. Classifying emails as spam or not:

1. Classifying emails as spam or not involves analyzing email content and metadata to determine whether they are unwanted or legitimate.

2. This task typically employs machine learning algorithms trained on labeled email datasets to identify patterns and features associated with spam messages.

3. Common features used in classification include email sender, subject line, body text, attachments, and metadata such as IP addresses and timestamps.

4. The classification process aims to accurately label incoming emails as either spam or non-spam to prevent unwanted emails from reaching users' inboxes.

5. By accurately identifying spam emails, this classification system helps improve email security, reduces the risk of phishing attacks, and enhances user experience by filtering out irrelevant or harmful messages.

6. Regular updates and refinements to the classification algorithm are necessary to adapt to evolving spamming techniques and ensure high detection accuracy.
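Steps 1-4 above can be sketched with a toy keyword-based scorer. The word list and threshold below are illustrative assumptions standing in for what a real trained model (e.g. Naïve Bayes over word frequencies) would learn from labeled emails:

```python
# A toy spam classifier: score an email by counting words associated
# with the spam class. The word set and threshold are illustrative
# assumptions, not a real trained model.

SPAM_WORDS = {"free", "winner", "prize", "urgent", "click"}

def classify_email(text, threshold=2):
    """Label an email 'spam' when it contains enough spam-associated words."""
    words = text.lower().split()
    score = sum(1 for w in words if w.strip(".,!:") in SPAM_WORDS)
    return "spam" if score >= threshold else "not spam"

print(classify_email("URGENT: click now, you are a winner!"))
print(classify_email("Meeting moved to 3pm, see agenda attached."))
```

In practice the word weights would be learned from a labeled dataset rather than hand-picked, which is exactly what makes the task supervised.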
b. Market Basket Analysis:

• It is a modeling technique based on analyzing frequent transactions in which certain combinations of items are bought together.

Example: Amazon and many other retailers use this technique. While a customer views some products, suggestions are shown for commodities that other people have bought along with them in the past.

c. Weather Forecasting:

• Changing patterns in weather conditions need to be observed based on parameters such as temperature, humidity, and wind direction. Accurate prediction also requires careful use of previous records.
Types of Classification

Decision Tree Classification: This type divides a dataset into segments based on feature variables,
making decisions at each node to classify data points.

K-Nearest Neighbors: This type of classification identifies the K closest data points to a given
observation and classifies it based on the majority class among its neighbors.

Logistic Regression: This classification type predicts the probability that the target variable Y belongs to a particular class, given the input variable X.

Naïve Bayes: This classifier is a simple yet effective classification algorithm based on Bayes'
theorem, which calculates the probability of an event given the prior knowledge of conditions related
to the event.
Random Forest Classification: It combines multiple decision trees, each predicting the
probability of the target variable. The final output is determined by averaging these
probabilities.

Support Vector Machines: These utilize support vector classifiers, enhanced with
kernels, to evaluate non-linear decision boundaries effectively by enlarging feature
variable space.
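Of the types above, K-Nearest Neighbors is simple enough to sketch directly. The data and the choice of K below are illustrative assumptions; a real implementation would also normalize features and handle ties:

```python
from collections import Counter

# A minimal K-Nearest Neighbors classifier, matching the description
# above: find the K closest labeled points and take a majority vote.
# The one-dimensional data and K=3 are illustrative assumptions.

def knn_predict(train, x, k=3):
    """train is a list of (feature, label) pairs; classify x by majority vote
    among the k training points nearest to it."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [(1, "A"), (2, "A"), (3, "A"), (8, "B"), (9, "B"), (10, "B")]
print(knn_predict(train, 2.5))  # lands among the "A" cluster
print(knn_predict(train, 8.5))  # lands among the "B" cluster
```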
Advantages: Classification

• Classification-based mining methods are cost-effective and efficient

• Helps in identifying criminal suspects

• Helps in predicting the risk of diseases

• Helps banks and financial institutions identify likely defaulters before approving cards, loans, etc.
Disadvantages: Classification

1. Privacy: When customer data is collected and mined, there is a chance that a company may give information about its customers to other vendors or use that information for its own profit.

2. Accuracy Problem: To get the best accuracy and result, an accurate model must be
selected.
APPLICATIONS: Classification

• Image Recognition

• Object Detection in Autonomous Vehicles

• Face Recognition

• Disease Diagnosis

• Land Cover Classification in Remote Sensing

• Voice Recognition
b. Regression:

• Regression is a supervised learning technique used to predict continuous numerical values based on input features.
• It aims to establish a functional relationship between independent
variables and a dependent variable, such as predicting house prices
based on features like size, bedrooms, and location.
• The goal is to minimize the difference between predicted and actual
values using algorithms like Linear Regression, Decision Trees, or
Neural Networks, ensuring the model captures underlying patterns in the
data.
• Regression algorithms that come under supervised learning include:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
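The simplest of these, linear regression with one feature, has a closed-form least-squares solution that fits in a few lines. The toy house-size/price data below is an illustrative assumption (chosen to be perfectly linear so the fit is exact):

```python
# Simple linear regression via the closed-form least-squares solution:
# y ≈ slope * x + intercept. The toy data is an illustrative assumption.

def fit_linear(xs, ys):
    """Fit slope and intercept minimizing squared error for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data: house size (100s of sq ft) vs price, deliberately noise-free.
sizes = [10, 15, 20, 25]
prices = [50, 75, 100, 125]
slope, intercept = fit_linear(sizes, prices)
print(slope, intercept)        # 5.0 0.0 for this perfectly linear data
print(slope * 18 + intercept)  # predicted price for an unseen size
```

Minimizing the difference between predicted and actual values, as described above, is exactly what the least-squares formula does for the linear case.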
Examples : Regression

1. Predicting House Prices: Regression analysis can be used to predict house prices based on factors such as square footage, number of bedrooms and bathrooms, location, and other relevant features.

2. Forecasting Sales: Regression analysis can help businesses forecast future sales based on historical data, market trends, advertising expenditure, and other variables.

3. Predicting Student Performance: Regression analysis can be applied to predict student performance based on factors such as study hours, attendance, socioeconomic background, and previous academic achievements.

4. Analyzing Stock Returns: Regression analysis can be used to analyze the relationship between stock returns and factors such as interest rates, market volatility, company earnings, and industry performance.

5. Estimating GDP Growth: Regression analysis can help economists estimate GDP growth based on variables such as consumer spending, investment, government expenditure, and exports.
Types of Regression

Decision Tree Regression: The primary purpose of this regression is to divide the
dataset into smaller subsets to plot the value of any data point related to the problem
statement.

Principal Components Regression: This type simplifies regression analysis by transforming correlated independent variables into a smaller set of uncorrelated components.

Polynomial Regression: This type fits a non-linear equation by using the polynomial
functions of an independent variable.
Random Forest Regression: It is a popular method in Machine Learning that utilizes
multiple decision trees to predict the output by randomly selecting data points from the
dataset for each tree.

Simple Linear Regression: This is the least complicated form of regression, modeling a straight-line relationship between a single independent variable and a continuous dependent variable.

Support Vector Regression: This regression type solves both linear and non-linear
models. It uses non-linear kernel functions, like polynomials, to find an optimal solution
for non-linear models.
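Decision tree regression, described first in the list above, can be sketched at depth one: split the data at a threshold and predict the mean target value on each side. The fixed threshold and toy data are illustrative assumptions; real tree learners search for the split minimizing squared error:

```python
# Decision-tree regression at depth one: split on a threshold and
# predict the mean of each side. The fixed threshold and data are
# illustrative assumptions, not a full tree-growing algorithm.

def fit_tree_depth1(xs, ys, threshold):
    """Return (mean of y where x < threshold, mean of y where x >= threshold)."""
    left = [y for x, y in zip(xs, ys) if x < threshold]
    right = [y for x, y in zip(xs, ys) if x >= threshold]
    return sum(left) / len(left), sum(right) / len(right)

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 40.0, 42.0, 44.0]
left_mean, right_mean = fit_tree_depth1(xs, ys, threshold=5)

def predict(x):
    return left_mean if x < 5 else right_mean

print(predict(2))   # mean of the left segment
print(predict(11))  # mean of the right segment
```

Random forest regression, as noted above, averages many such trees, each grown on a random sample of the data.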
Advantages: Regression

• Quantifies Relationships

• Prediction and Forecasting

• Identification of Significance

• Decision Support
Disadvantages: Regression

Overfitting: Overfitting occurs when the regression model captures noise or random fluctuations in the data rather than the underlying relationship. Overfitted models may perform well on the training data but generalize poorly to new data.

Data Requirements: Regression analysis requires sufficient data to estimate the model parameters accurately. Small sample sizes or sparse data can lead to unreliable estimates and inflated standard errors.
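The overfitting point can be made concrete with an extreme case: a "model" that memorizes every training point fits the noisy training data perfectly but is useless on unseen inputs, while a simpler model that captures the trend generalizes. The toy data and noise values below are illustrative assumptions:

```python
# An extreme illustration of overfitting: memorizing the training set
# versus modeling the underlying trend. Data and noise are illustrative
# assumptions; the true relationship here is roughly y = 2x.

train = {1: 2.1, 2: 3.9, 3: 6.2, 4: 7.8}  # noisy samples of y ≈ 2x

def memorizer(x):
    """Overfit 'model': perfect on training inputs, useless elsewhere."""
    return train.get(x, 0.0)

def linear_model(x):
    """Simpler model capturing the underlying trend y ≈ 2x."""
    return 2.0 * x

print(memorizer(2), linear_model(2))      # both close to the training label
print(memorizer(2.5), linear_model(2.5))  # memorizer fails on an unseen x
```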
Applications: Regression

• Economics and Finance

• Marketing and Sales

• Healthcare

• Environmental Science

• Quality Control and Process Improvement


Even though classification and regression are both from the category of supervised learning, they
are not the same.

• The prediction task is a classification when the target variable is discrete. An application is the
identification of the underlying sentiment of a piece of text.

• The prediction task is a regression when the target variable is continuous. An example is
predicting a person’s salary given their education degree, previous work experience,
geographical location, and level of seniority.
