Supervised and Unsupervised Learning

The document discusses the two main types of machine learning: supervised and unsupervised learning. Supervised learning involves training algorithms on labeled data to make predictions, while unsupervised learning deals with unlabeled data to discover patterns and relationships. It also covers various algorithms, evaluation metrics, applications, advantages, and disadvantages of both learning types.

Uploaded by

bijemoderne
Copyright
© All Rights Reserved
Supervised and Unsupervised learning

Last Updated : 23 Sep, 2024


Machine learning is a field of computer science that gives computers the
ability to learn without being explicitly programmed. Supervised learning and
unsupervised learning are two main types of machine learning.
In supervised learning, the machine is trained on a set of labeled data, which
means that the input data is paired with the desired output. The machine
then learns to predict the output for new input data. Supervised learning is
often used for tasks such as classification, regression, and object detection.
In unsupervised learning, the machine is trained on a set of unlabeled data,
which means that the input data is not paired with the desired output. The
machine then learns to find patterns and relationships in the data.
Unsupervised learning is often used for tasks such as clustering,
dimensionality reduction, and anomaly detection.
What is Supervised learning?
Supervised learning is a type of machine learning algorithm that learns from
labeled data. Labeled data is data that has been tagged with a correct
answer or classification.
As the name indicates, supervised learning works as if a supervisor or
teacher were present. We train the machine using well-labelled data, which
means each training example is already tagged with the correct answer. The
algorithm analyses this training data (the set of training examples) and,
when it is later given a new set of examples, uses what it learned to
predict the correct outcome for them.
For example, a labeled dataset of images of elephants, camels and cows would
have each image tagged with either “Elephant”, “Camel” or “Cow”.
Key Points:
• Supervised learning involves training a machine from labeled data.
• Labeled data consists of examples with the correct answer or
classification.
• The machine learns the relationship between inputs (for example, fruit
images) and outputs (fruit labels).
• The trained machine can then make predictions on new, unlabeled
data.
Example:
Let’s say you have a basket of fruit and want to identify each piece. The
machine would first analyze an image of the fruit to extract features such
as its shape, color, and texture. Then, it would compare these features to
the features of the fruits it has already learned about. If the new image’s
features are most similar to those of an apple, the machine would predict
that the fruit is an apple.
For instance, suppose you are given a basket filled with different kinds of
fruits. The first step is to train the machine on the different fruits one
by one, like this:
• If the object is rounded with a depression at the top and is red in
color, it is labeled Apple.
• If the object is a long curving cylinder with a green-yellow color, it is
labeled Banana.
Now suppose that, after training, the machine is given a new fruit from the
basket, say a banana, and asked to identify it.
Since the machine has already learned from the previous data, it now has to
apply that knowledge. It first classifies the fruit by its shape and color,
confirms the fruit name as Banana, and puts it in the Banana category. Thus
the machine learns from the training data (the basket of fruits) and then
applies that knowledge to the test data (the new fruit).
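The "compare the new fruit's features to the fruits already learned" step can be sketched as a nearest-neighbor lookup. This is only an illustrative sketch: the two features (roundness and redness), their values, and the names TRAINING and predict are assumptions made up for this example, not part of the original text.

```python
import math

# Hypothetical training data: (roundness, redness), each on a 0-1 scale,
# paired with the correct label. The values are invented for illustration.
TRAINING = [
    ((0.90, 0.90), "Apple"),
    ((0.85, 0.80), "Apple"),
    ((0.20, 0.10), "Banana"),
    ((0.15, 0.20), "Banana"),
]

def predict(features):
    """Return the label of the training example closest to `features`."""
    nearest = min(TRAINING, key=lambda example: math.dist(example[0], features))
    return nearest[1]

new_fruit = (0.18, 0.15)    # long and yellow-green, so low on both features
print(predict(new_fruit))   # Banana
```

The labeled examples play the role of the training basket, and new_fruit plays the role of the test data.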
Types of Supervised Learning
Supervised learning is classified into two categories of algorithms:
• Regression: A regression problem is when the output variable is a real
value, such as “dollars” or “weight”.
• Classification: A classification problem is when the output variable is
a category, such as “red” or “blue”, “disease” or “no disease”.
Supervised learning deals with or learns with “labeled” data. This implies
that some data is already tagged with the correct answer.
1- Regression
Regression is a type of supervised learning that is used to predict continuous
values, such as house prices or stock prices. Regression algorithms learn a
function that maps from the input features to the output value.
Some common regression algorithms include:
• Linear Regression
• Polynomial Regression
• Support Vector Machine Regression
• Decision Tree Regression
• Random Forest Regression
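As a minimal sketch of the first algorithm in the list, linear regression with a single input feature can be fitted with the closed-form least-squares formulas. The function name fit_line and the house-size data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: house size (square metres) -> price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]   # constructed so that price = 3 * size

slope, intercept = fit_line(sizes, prices)
print(slope, intercept)          # 3.0 0.0
print(slope * 100 + intercept)   # predicted price for 100 square metres: 300.0
```

The learned function maps an input feature (size) to a continuous output (price), which is what distinguishes regression from classification.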
2- Classification
Classification is a type of supervised learning that is used to predict
categorical values, such as whether a customer will churn or not, whether an
email is spam or not, or whether a medical image shows a tumor or not.
Classification algorithms learn a function that maps from the input features
to an output class, often by way of a probability distribution over the
classes.
Some common classification algorithms include:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• Naive Bayes
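To illustrate how a classifier can map a feature to a probability over classes, here is a small sketch of logistic regression with one feature, trained by plain batch gradient descent. The data, the learning rate, the number of epochs, and the function names are assumptions chosen for this example; real projects would normally use a library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: a "suspicious word" score -> spam (1) or not spam (0).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(xs, ys)

def predict_proba(x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)

def predict(x):
    """Class label obtained by thresholding the probability at 0.5."""
    return 1 if predict_proba(x) >= 0.5 else 0

print(predict(1.0), predict(4.0))
```

On this toy data a low score should be classified as 0 and a high score as 1, with predict_proba giving the underlying probability estimate.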
Evaluating Supervised Learning Models
Evaluating supervised learning models is an important step in ensuring that
the model is accurate and generalizable. There are a number of
different metrics that can be used to evaluate supervised learning models,
but some of the most common ones include:
For Regression
• Mean Squared Error (MSE): MSE measures the average squared
difference between the predicted values and the actual values. Lower
MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square root of
MSE, which expresses the typical size of the prediction errors in the
same units as the target (it equals their standard deviation when the
errors average to zero). Similar to MSE, lower RMSE values indicate
better model performance.
• Mean Absolute Error (MAE): MAE measures the average absolute
difference between the predicted values and the actual values. It is
less sensitive to outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared measures the
proportion of the variance in the target variable that is explained by
the model. Higher R-squared values indicate better model fit.
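The four regression metrics above can be computed directly from their definitions. The sketch below uses made-up true and predicted values; the function name regression_metrics is an assumption for this example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                 # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the errors are 1, 0, -1 and -2, so MSE is 1.5, MAE is 1.0 and R-squared is about 0.7. The single error of size 2 contributes four of the six units of squared error, which shows why MSE and RMSE react more strongly to outliers than MAE.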
For Classification
• Accuracy: Accuracy is the percentage of predictions that the model
makes correctly. It is calculated by dividing the number of correct
predictions by the total number of predictions.
• Precision: Precision is the percentage of positive predictions that the
model makes that are actually correct. It is calculated by dividing the
number of true positives by the total number of positive predictions.
• Recall: Recall is the percentage of all positive examples that the
model correctly identifies. It is calculated by dividing the number of
true positives by the total number of positive examples.
• F1 score: The F1 score combines precision and recall into a single
number. It is calculated by taking the harmonic mean of precision and
recall.
• Confusion matrix: A confusion matrix is a table that shows the
number of predictions for each class, along with the actual class
labels. It can be used to visualize the performance of the model and
identify areas where the model is struggling.
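For a two-class problem, all of these metrics follow from the four cells of the confusion matrix. The sketch below assumes labels encoded as 1 (positive) and 0 (negative); the function name binary_metrics and the sample labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

Here there are 3 true positives, 5 true negatives, 1 false positive and 1 false negative, giving an accuracy of 0.8 and precision, recall and F1 of 0.75 each.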
Applications of Supervised learning
Supervised learning can be used to solve a wide variety of problems,
including:
• Spam filtering: Supervised learning algorithms can be trained to
identify and classify spam emails based on their content, helping users
avoid unwanted messages.
• Image classification: Supervised learning can automatically classify
images into different categories, such as animals, objects, or scenes,
facilitating tasks like image search, content moderation, and
image-based product recommendations.
• Medical diagnosis: Supervised learning can assist in medical
diagnosis by analyzing patient data, such as medical images, test
results, and patient history, to identify patterns that suggest specific
diseases or conditions.
• Fraud detection: Supervised learning models can analyze financial
transactions and identify patterns that indicate fraudulent activity,
helping financial institutions prevent fraud and protect their customers.
• Natural language processing (NLP): Supervised learning plays a
crucial role in NLP tasks, including sentiment analysis, machine
translation, and text summarization, enabling machines to understand
and process human language effectively.
Advantages of Supervised learning
• Supervised learning lets you collect data and produce outputs based
on previous experience.
• It helps optimize performance criteria using experience.
• It helps solve various types of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result for a new sample.
• We have complete control over the number of classes we want in the
training data.
Disadvantages of Supervised learning
• Classifying big data can be challenging.
• Training for supervised learning needs a lot of computation time.
• Supervised learning cannot handle all complex tasks in machine
learning.
• It requires a labelled data set.
• It requires a training process.
What is Unsupervised learning?
Unsupervised learning is a type of machine learning that learns from
unlabeled data. This means that the data does not have any pre-existing
labels or categories. The goal of unsupervised learning is to discover patterns
and relationships in the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is
neither classified nor labeled, allowing the algorithm to act on that
information without guidance. Here the task of the machine is to group
unsorted information according to similarities, patterns, and differences
without any prior training on labeled data.
Unlike supervised learning, no teacher is provided, which means the machine
is never shown the correct answers. The machine therefore has to find the
hidden structure in the unlabeled data by itself.
For example, you can use unsupervised learning to examine animal data that
has been gathered and distinguish between several groups according to the
traits and behavior of the animals. These groupings might correspond to
various animal species, allowing you to categorize the creatures without
depending on labels that already exist.
Key Points
• Unsupervised learning allows the model to discover patterns and
relationships in unlabeled data.
• Clustering algorithms group similar data points together based on their
inherent characteristics.
• Feature extraction captures essential information from the data,
enabling the model to make meaningful distinctions.
• Label association assigns categories to the clusters based on the
extracted patterns and characteristics.
Example
Imagine you have a large dataset of unlabeled images containing both dogs
and cats. The model has never been told which images show dogs and which
show cats, and it has no pre-existing labels or categories for these
animals. Your task is to use unsupervised learning to separate the dogs
from the cats.
For instance, suppose the model is given a collection of pictures of dogs
and cats which it has never seen.
Because the machine has no idea what a dog or a cat is, it cannot name the
groups ‘dogs’ and ‘cats’. But it can categorize the pictures according to
their similarities, patterns, and differences, so the collection can be
split into two parts. The first may contain all the pictures with dogs in
them and the second all the pictures with cats in them. Nothing was learned
beforehand, which means there were no labeled training examples.
Unsupervised learning allows the model to work on its own to discover
patterns and information that were previously undetected. It mainly deals
with unlabelled data.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behavior.
• Association: An association rule learning problem is where you want
to discover rules that describe large portions of your data, such as
people that buy X also tend to buy Y.
Clustering
Clustering is a type of unsupervised learning that is used to group similar
data points together. Many clustering algorithms, such as k-means, work by
iteratively assigning each data point to its nearest cluster center and then
updating the centers, so that points end up close to their own cluster and
far from other clusters.
Clustering types:
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Common algorithms (items 3 to 5 are dimensionality-reduction techniques
rather than clustering algorithms):
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
6. Gaussian Mixture Models (GMMs)
7. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
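K-means, one of the algorithms listed above, alternates between assigning each point to its nearest cluster center and recomputing each center as the mean of its assigned points. The sketch below is a bare-bones version of that loop; the starting centers are passed in explicitly to keep the example deterministic, and the data and the function name kmeans are made up for illustration.

```python
import math

def kmeans(points, centers, iterations=10):
    """Alternate assignment and update steps for a fixed number of rounds."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:   # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)    # [0, 0, 0, 1, 1, 1]
print(centers)
```

No labels are supplied at any point: the two groups emerge only from the distances between the points.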
Association rule learning
Association rule learning is a type of unsupervised learning that is used to
identify patterns in data. Association rule learning algorithms work by
finding relationships between different items in a dataset.
Some common association rule learning algorithms include:
• Apriori Algorithm
• Eclat Algorithm
• FP-Growth Algorithm
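Algorithms such as Apriori search for rules of the form "people who buy X also tend to buy Y" and score candidates with two standard quantities, support and confidence. The sketch below only computes those two quantities on a made-up list of shopping baskets; it is not an implementation of Apriori, Eclat or FP-Growth, and the function names are assumptions for this example.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return count(set(antecedent) | set(consequent)) / count(antecedent)

print(support({"bread", "butter"}))        # 0.6
print(confidence({"butter"}, {"bread"}))   # 1.0
print(confidence({"bread"}, {"butter"}))   # 0.75
```

In this data every basket with butter also contains bread, so the rule butter -> bread has confidence 1.0, while the reverse rule is weaker.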
Evaluating Unsupervised Learning Models
Evaluating unsupervised learning models is an important step in ensuring
that the model is effective and useful. However, it can be more challenging
than evaluating supervised learning models, as there is usually no ground
truth data to compare the model’s predictions to.
There are a number of different metrics that can be used to evaluate
unsupervised learning models, but some of the most common ones include:
• Silhouette score: The silhouette score measures how well each data
point is clustered with its own cluster members and separated from
other clusters. It ranges from -1 to 1, with higher scores indicating
better clustering.
• Calinski-Harabasz score: The Calinski-Harabasz score measures the
ratio between the variance between clusters and the variance within
clusters. It ranges from 0 to infinity, with higher scores indicating
better clustering.
• Adjusted Rand index: The adjusted Rand index measures the
similarity between two clusterings, for example a model’s clusters and
a reference grouping. Its maximum is 1 for identical clusterings, values
near 0 indicate agreement no better than chance, and it can be negative.
Higher scores indicate more similar clusterings.
• Davies-Bouldin index: The Davies-Bouldin index measures the
average similarity between clusters. It ranges from 0 to infinity, with
lower scores indicating better clustering.
• F1 score: The F1 score is the harmonic mean of precision and
recall, which are two metrics that are commonly used in supervised
learning to evaluate classification models. When reference labels are
available for comparison, the F1 score can also be used to evaluate
unsupervised learning models, such as clustering models.
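The silhouette score needs no reference labels, which makes it a convenient first check. For each point it compares a, the mean distance to the other points in its own cluster, with b, the smallest mean distance to the points of any other cluster, and uses (b - a) / max(a, b). The sketch below follows that definition on made-up one-dimensional data; the function name and the convention of scoring single-point clusters as 0 are assumptions for this example, and at least two clusters are required.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette value over all points (needs at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:               # single-point cluster: scored as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)   # cohesion within the point's own cluster
        b = min(sum(d) / len(d)   # separation from the nearest other cluster
                for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])   # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])    # mixed-up grouping
print(good, bad)
```

The natural grouping scores close to 1 (about 0.9), while the mixed-up grouping scores below 0, matching the interpretation given above.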
Applications of Unsupervised learning
Unsupervised learning can be used to solve a wide variety of problems,
including:
• Anomaly detection: Unsupervised learning can identify unusual
patterns or deviations from normal behavior in data, enabling the
detection of fraud, intrusion, or system failures.
• Scientific discovery: Unsupervised learning can uncover hidden
relationships and patterns in scientific data, leading to new hypotheses
and insights in various scientific fields.
• Recommendation systems: Unsupervised learning can identify patterns
and similarities in user behavior and preferences to recommend
products, movies, or music that align with their interests.
• Customer segmentation: Unsupervised learning can identify groups of
customers with similar characteristics, allowing businesses to target
marketing campaigns and improve customer service more effectively.
• Image analysis: Unsupervised learning can group images based on
their content, facilitating tasks such as image classification, object
detection, and image retrieval.
Advantages of Unsupervised learning
• It does not require training data to be labeled.
• Dimensionality reduction can be easily accomplished using
unsupervised learning.
• Capable of finding previously unknown patterns in data.
• Unsupervised learning can help you gain insights from unlabeled data
that you might not have been able to get otherwise.
• Unsupervised learning is good at finding patterns and relationships in
data without being told what to look for. This can help you learn new
things about your data.
Disadvantages of Unsupervised learning
• It is difficult to measure accuracy or effectiveness because there are
no predefined answers during training, which makes it challenging to
evaluate the performance of unsupervised learning models.
• The results are often less accurate.
• The user needs to spend time interpreting the resulting groups and
labelling them.
• Unsupervised learning can be sensitive to data quality, including
missing values, outliers, and noisy data.
Supervised vs. Unsupervised Machine Learning
Parameters | Supervised machine learning | Unsupervised machine learning
Input data | Algorithms are trained using labeled data. | Algorithms are used against data that is not labeled.
Computational complexity | Simpler method | Computationally complex
Accuracy | Highly accurate | Less accurate
No. of classes | No. of classes is known | No. of classes is not known
Data analysis | Uses offline analysis | Uses real-time analysis of data
Algorithms used | Linear and logistic regression, KNN, random forest, multi-class classification, decision tree, support vector machine, neural network, etc. | K-means clustering, hierarchical clustering, Apriori algorithm, etc.
Output | Desired output is given. | Desired output is not given.
Training data | Uses training data to infer a model. | No labeled training data is used.
Complex model | It is not possible to learn larger and more complex models than with supervised learning. | It is possible to learn larger and more complex models with unsupervised learning.
Model | We can test our model. | We cannot test our model.
Called as | Supervised learning is also called classification. | Unsupervised learning is also called clustering.
Example | Optical character recognition. | Find a face in an image.
Supervision | Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.


Conclusion
Supervised and unsupervised learning are two powerful tools that can be
used to solve a wide variety of problems. Supervised learning is well-suited
for tasks where the desired output is known, while unsupervised learning is
well-suited for tasks where the desired output is unknown.
Frequently Asked Questions (FAQs)
1. What is the difference between supervised and unsupervised
machine learning?
Supervised and unsupervised learning are two fundamental approaches to
machine learning that differ in their training data and learning objectives.
• Supervised learning involves training a machine learning model on a
labeled dataset, where each data point has a corresponding label or
output value. The algorithm learns to map the input data to the
desired output, allowing it to make predictions for new, unseen data.
• Unsupervised learning, on the other hand, deals with unlabeled
datasets, where the data points do not have associated labels or
output values.
2. What is supervised learning?
Supervised learning is a type of machine learning where the algorithm is
trained on a labeled dataset, where each data point has a corresponding
label or output value. The algorithm learns to map the input data to the
desired output, allowing it to make predictions for new, unseen data.
3. What are common supervised learning algorithms?
Common supervised learning algorithms include:
• Classification: Used to assign categories to data points. Examples
include support vector machines (SVMs), logistic regression, and
decision trees.
• Regression: Used to predict continuous numerical values. Examples
include linear regression, polynomial regression, and ridge regression.
4. What are common unsupervised learning algorithms?
Common unsupervised learning algorithms include:
• Clustering: Grouping data points into clusters based on their
similarity. Examples include k-means clustering and hierarchical
clustering.
• Dimensionality reduction: Reducing the number of features in a
dataset while preserving the most important information. Examples
include principal component analysis (PCA) and autoencoders.
5. What is unsupervised learning?
Unsupervised learning is a type of machine learning where the algorithm is
trained on an unlabeled dataset, where the data points do not have
corresponding labels or output values. The algorithm learns to identify
patterns and structures in the data without explicit guidance.
6. When to use supervised learning vs. unsupervised learning?
Use supervised learning when you have a labeled dataset and want to make
predictions for new data. Use unsupervised learning when you have an
unlabeled dataset and want to identify patterns or structures in the data.
