UNIT 1 - Types of Learning

Supervised learning

Supervised learning is one of the most commonly used paradigms in machine learning. In
supervised learning, a model is trained using labeled data, meaning that for each input in the
dataset, the correct output (or label) is provided. The primary goal of supervised learning is to
learn a mapping from inputs (features) to outputs (labels) in order to make predictions on
new, unseen data.

In supervised learning, the model is guided (supervised) during the learning process by these
known output labels, and its performance is ultimately assessed by how accurately it predicts
the labels of data it has not seen during training. Once the model has learned from the
training data, it can be applied to test data or real-world scenarios to make predictions.

1. Key Concepts in Supervised Learning

a. Labeled Data

 Labeled data refers to datasets where each input is paired with the correct output or
label. For example, in a classification task, the dataset might contain images of cats
and dogs, and each image would have a label indicating whether it is a "cat" or "dog."
 In supervised learning, the model uses this labeled data to learn the relationship
between the features (input variables) and the labels (output variables).

b. The Goal of Supervised Learning

 The goal of supervised learning is to learn a function that can map input features (x)
to a target output (y). This learned function should generalize well to unseen data,
meaning it should make accurate predictions when given new inputs.
 A model is trained to minimize the difference between its predicted outputs and the
actual labels in the training dataset. This difference is quantified by a loss
function.

c. Types of Supervised Learning Problems

Supervised learning problems are typically divided into two main categories:

1. Classification: The task is to predict a categorical label (a class) for each input. For
example, classifying emails as either "spam" or "not spam."
o Example: Predicting whether an email is spam (spam = 1, not spam = 0).
2. Regression: The task is to predict a continuous output. For example, predicting the
price of a house based on features such as square footage, number of bedrooms, and
location.
o Example: Predicting house prices based on certain features (e.g., $200,000).

2. Steps Involved in Supervised Learning

a. Data Collection

 The first step in any supervised learning problem is to collect a dataset with labeled
examples. The dataset must represent the problem you're trying to solve, and it should
be diverse enough to help the model learn a generalizable pattern.
 A quality dataset includes:
o Features (X): The input variables that describe each data point. For example,
in a house price prediction task, features might include the number of rooms,
square footage, and location.
o Labels (Y): The output variable (or target) for each data point, which could be
a class label (for classification) or a continuous value (for regression).

b. Data Preprocessing

 Data cleaning: Remove or handle missing, corrupted, or irrelevant data points.
 Feature selection: Choose the most relevant features that will contribute to the
model's performance.
 Feature scaling/normalization: Standardize or normalize features to bring them onto
a common scale, especially important for distance-based algorithms like k-Nearest
Neighbors (k-NN) or Support Vector Machines (SVM).
 Data splitting: The dataset is usually split into two parts:
o Training set: A subset of the data used to train the model.
o Test set: A subset of the data used to evaluate the model's performance.
o Validation set (optional): A separate subset used to tune model
hyperparameters.
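
As a minimal illustration of the splitting step, the sketch below uses scikit-learn with a
synthetic dataset standing in for real labeled data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labeled dataset: X holds features, y holds labels.
X, y = make_classification(n_samples=1000, random_state=42)

# Hold out 20% of the data as the test set; a validation set can be
# carved out of the training portion in the same way.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```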

c. Model Selection

 Choosing an algorithm: The choice of algorithm depends on the type of problem
(classification or regression), the nature of the data, and the specific task. Some
common algorithms include:
o For Classification: Logistic Regression, Decision Trees, Random Forest, k-
Nearest Neighbors (k-NN), Naive Bayes, Support Vector Machines (SVM),
Neural Networks.
o For Regression: Linear Regression, Decision Trees, Random Forests, Support
Vector Regression (SVR), k-Nearest Neighbors (k-NN).

d. Model Training

 Training: The model is trained by learning patterns from the training data. The
algorithm iteratively adjusts its internal parameters (e.g., weights in neural networks)
to minimize the loss function (the difference between predicted and true values).
 Optimization: In most models, the training process involves optimizing an objective
function, typically a loss function, using optimization techniques such as gradient
descent, which adjusts the parameters to reduce the error.
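
To make the optimization step concrete, here is a minimal sketch (with toy data invented for
illustration) of plain gradient descent fitting a one-feature linear model by minimizing mean
squared error:

```python
import numpy as np

# Toy data: y is roughly 3x + 1 plus noise (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 1.0 + rng.normal(0, 1, size=100)

w, b, lr = 0.0, 0.0, 0.01                    # parameters and learning rate
for _ in range(1000):
    y_pred = w * X + b
    grad_w = 2 * np.mean((y_pred - y) * X)   # dMSE/dw
    grad_b = 2 * np.mean(y_pred - y)         # dMSE/db
    w -= lr * grad_w                         # step against the gradient
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")       # should end up close to 3 and 1
```
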
e. Model Evaluation

 After training the model, the next step is to evaluate its performance using the test set
(data that the model has not seen before).
 Evaluation Metrics:
o For Classification:
 Accuracy: The percentage of correct predictions: (true positives + true
negatives) / total predictions.
 Precision: The proportion of positive predictions that are actually
correct: true positives / (true positives + false positives).
 Recall (Sensitivity): The proportion of actual positives that were
correctly identified by the model: true positives / (true positives + false
negatives).
 F1-score: The harmonic mean of precision and recall.
 Confusion Matrix: A matrix showing the true positives, true
negatives, false positives, and false negatives.
o For Regression:
 Mean Absolute Error (MAE): The average of the absolute
differences between predicted and actual values.
 Mean Squared Error (MSE): The average of the squared differences
between predicted and actual values.
 Root Mean Squared Error (RMSE): The square root of the mean
squared error.
 R-squared (R²): The proportion of the variance in the dependent
variable that is predictable from the independent variables.
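
As a brief illustration, these classification metrics can be computed with scikit-learn; the
label vectors below are toy values chosen only for the example:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model predictions

print("accuracy :", accuracy_score(y_true, y_pred))    # 0.75
print("precision:", precision_score(y_true, y_pred))   # 3 TP / (3 TP + 1 FP)
print("recall   :", recall_score(y_true, y_pred))      # 3 TP / (3 TP + 1 FN)
print("f1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))                # [[TN FP] [FN TP]]
```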

f. Model Tuning

 Hyperparameter Tuning: Many machine learning algorithms have hyperparameters
that control the learning process (e.g., learning rate, depth of a decision tree, number
of neighbors in k-NN). These hyperparameters need to be optimized for better
performance. This can be done using techniques such as:
o Grid Search: Testing a predefined set of hyperparameter combinations.
o Random Search: Randomly sampling hyperparameter values within a
specified range.
o Bayesian Optimization: A probabilistic model-based approach to optimize
hyperparameters.
 Cross-Validation: To assess the model's generalizability and avoid overfitting, cross-
validation techniques (e.g., k-fold cross-validation) are used to divide the dataset into
several parts, using some for training and others for validation.
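
A minimal sketch of hyperparameter tuning with grid search scored by k-fold cross-validation,
using scikit-learn and the bundled Iris dataset purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k; each candidate is scored with 5-fold CV.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```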

g. Model Deployment

 Once the model is trained, evaluated, and tuned, it can be deployed to make
predictions on new, unseen data. Model deployment involves integrating the trained
model into a production environment where it can process real-time or batch data.

3. Types of Supervised Learning Algorithms

a. Linear Models

1. Linear Regression
o Used for predicting a continuous output.
o The relationship between input features and the target is assumed to be linear.
o Example: Predicting house prices based on features like square footage,
number of bedrooms, etc.
2. Logistic Regression
o A classification algorithm used for predicting categorical outcomes, typically
binary (0 or 1).
o The output is a probability, and a threshold (often 0.5) is used to classify into
one of the two classes.
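
A short sketch of both linear models; the housing and "spam" numbers below are invented toy
values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: house price from square footage (toy numbers).
sqft = np.array([[800], [1200], [1500], [2000]])
price = np.array([160_000, 240_000, 290_000, 410_000])
reg = LinearRegression().fit(sqft, price)
print(reg.predict([[1800]]))            # a continuous prediction

# Logistic regression: spam (1) vs not spam (0) from two toy features.
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]])
y = np.array([1, 1, 0, 0])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.5, 0.5]]))  # a probability, thresholded at 0.5
```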

b. Tree-Based Models

1. Decision Trees
o Decision Trees split data based on feature values in a tree-like structure. They
are easy to interpret but can overfit if not pruned properly.
o Example: Classifying emails as spam or not based on word frequency.
2. Random Forest
o An ensemble of decision trees, where each tree is trained on a random subset
of data and features. Random forests reduce overfitting and improve
generalization.
o Example: Predicting customer churn based on historical data. (A brief
Random Forest sketch follows this list.)
3. Gradient Boosting Machines (GBM) and XGBoost
o Ensemble methods that combine weak learners (e.g., decision trees) to form a
strong learner by iteratively correcting the errors of previous models.
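
As referenced above, a minimal Random Forest sketch, with synthetic data standing in for a
customer churn table:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for churn data: 1,000 customers, 10 numeric features.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the data and
# considers a random subset of features at each split.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```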

c. Nearest Neighbor Methods

1. k-Nearest Neighbors (k-NN)
o A non-parametric method used for classification and regression. The algorithm
classifies a data point based on the majority label of its k-nearest neighbors in
the feature space.
o Example: Predicting the species of a flower based on its measurements.
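
A short k-NN sketch on the bundled Iris dataset (the flower-species example above); the query
measurements are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Classify a new flower by the majority label of its 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))  # sepal/petal measurements
```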

d. Support Vector Machines (SVM)

 SVM is a powerful classification technique that finds the hyperplane that best
separates the classes in the feature space. It works well for high-dimensional data.
 Example: Classifying handwritten digits based on pixel values.

e. Neural Networks

 A family of algorithms inspired by the human brain, neural networks are powerful
tools for both classification and regression. They are particularly effective in tasks
involving large, complex datasets (e.g., image classification, speech recognition).
 Example: Image recognition or natural language processing.

4. Advantages and Disadvantages of Supervised Learning


Advantages:

 Clear Objectives: Supervised learning provides clear goals and performance metrics,
making it easier to understand the model’s behavior.

 Predictive Power: Once trained, supervised learning models can make highly
accurate predictions on new data, particularly when the data is well-labeled and
representative of future inputs.
 Wide Applicability: Supervised learning algorithms are widely applicable across a
range of domains, from classification to regression tasks.

Disadvantages:

 Requires Labeled Data: The need for labeled data can be a significant limitation.
Labeling data can be expensive, time-consuming, and require domain expertise.
 Risk of Overfitting: Supervised learning models, particularly complex ones like
decision trees or neural networks, are prone to overfitting (fitting too closely to the
training data and failing to generalize to new data).
 Data Bias: If the labeled data is biased or unrepresentative, the model may also be
biased, leading to poor generalization or fairness issues.

5. Applications of Supervised Learning

 Spam Email Detection: Classifying emails as spam or non-spam based on their content.
 Medical Diagnosis: Predicting diseases or conditions based on patient data (e.g.,
predicting whether a tumor is malignant or benign).
 Customer Segmentation: Grouping customers into categories based on purchasing
behavior or demographic features.
 Credit Scoring: Predicting whether a loan applicant will default on a loan based on
their financial history.
 Stock Price Prediction: Forecasting future stock prices based on historical market
data and other financial indicators.

Unsupervised learning

Unsupervised learning (USL) is a type of machine learning in which the model is trained on
data that has no labeled responses (or targets). The goal of unsupervised learning is to infer
the underlying structure, patterns, or distribution of the data without explicit supervision (i.e.,
without a teacher providing the correct output). USL is useful for discovering hidden
patterns, clustering data, and reducing dimensionality in datasets where labels are
unavailable, costly, or hard to define.

In contrast to supervised learning, where the model is trained on input-output pairs (labeled
data), unsupervised learning deals with input data that lacks labeled output. The algorithms
aim to identify relationships or structures within the data by finding patterns such as
similarities or groupings, or even by identifying the distribution of data across different
features.

1. Motivation for Unsupervised Learning

Unsupervised learning is particularly useful when:

 Labeled Data is Expensive or Difficult to Obtain: Labeling data requires expertise
(e.g., medical diagnoses, image annotation), and it can be expensive and time-
consuming. Unsupervised learning uses the large amounts of unlabeled data readily
available.
 Exploratory Data Analysis (EDA): Unsupervised learning techniques are often used
in the initial stages of data analysis to uncover structures or relationships that might
not be apparent at first.
 Pattern Discovery: USL helps in identifying hidden patterns or intrinsic structures
within data. It can be applied to detect anomalies, cluster similar data points, or
reduce the complexity of the data.

2. Key Concepts in Unsupervised Learning

a. Data Without Labels

Unlike supervised learning where each data point comes with a label or target value,
unsupervised learning works solely with input data. The model does not have direct guidance
on what the expected output should be.

b. Clustering

Clustering is the most common task in unsupervised learning, where the goal is to group
similar data points together based on their features or characteristics. Similarity is usually
determined by some distance measure (e.g., Euclidean distance, cosine similarity).

c. Dimensionality Reduction

Unsupervised learning can also help reduce the number of features (dimensions) in the
dataset. By simplifying the data, dimensionality reduction techniques make it easier to
visualize or use the data for other tasks while preserving its essential characteristics.

d. Density Estimation

This is the process of estimating the probability distribution of the data points. Unsupervised
learning can be used to infer the underlying distribution that the data follows, without
explicitly being told what that distribution is.

3. Common Techniques in Unsupervised Learning

a. Clustering Algorithms

Clustering is one of the most widely used methods in unsupervised learning. It helps to group
similar data points together and is applied in various fields, including customer segmentation,
image compression, and anomaly detection.

1. K-means Clustering
o Overview: K-means is a simple and popular clustering algorithm that
partitions the data into a specified number (K) of clusters. The algorithm
minimizes the variance within each cluster by iteratively adjusting the
centroids of the clusters.
o Steps:
1. Initialize K centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recalculate the centroids of the clusters based on the data points
assigned.
4. Repeat the process until convergence (when centroids do not change
significantly).
o Pros: Easy to implement, efficient for large datasets, and widely used in
practice.
o Cons: Sensitive to initial centroid placement, requires the user to specify K,
and assumes spherical clusters. (A runnable sketch covering K-means and
DBSCAN follows this list.)
2. Hierarchical Clustering
o Overview: Unlike K-means, hierarchical clustering builds a tree of clusters
(dendrogram). It can be agglomerative (bottom-up) or divisive (top-down). It
doesn't require the user to specify the number of clusters in advance.
o Steps:
1. Treat each data point as its own cluster.
2. Iteratively merge the closest clusters (agglomerative) or split larger
clusters (divisive).
3. Stop when the desired number of clusters or a stopping criterion is
reached.
o Pros: No need to specify the number of clusters, can capture nested clusters.
o Cons: Computationally expensive, sensitive to noise and outliers.
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
o Overview: DBSCAN is a density-based clustering algorithm that groups
together points that are close to each other based on a density criterion,
allowing it to find arbitrarily shaped clusters. It can also handle noise and
outliers effectively.
o Steps:
1. For each data point, determine its neighborhood (points within a
specified distance, epsilon).
2. If a point has enough neighbors (greater than a minimum threshold), it
forms a cluster.
3. Points that don’t meet the density criterion are labeled as noise.
o Pros: No need to specify the number of clusters, can find clusters of arbitrary
shapes, robust to outliers.
o Cons: Performance degrades with high-dimensional data.
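
As referenced in the K-means item, a runnable sketch of K-means and DBSCAN on synthetic blob
data; all parameter values here are illustrative assumptions:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Toy data: 300 points drawn around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K-means: the number of clusters K must be specified up front.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)

# DBSCAN: no K needed; low-density points are labeled -1 (noise).
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(set(labels))
```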

b. Dimensionality Reduction Techniques

Dimensionality reduction is used to reduce the number of features in the data while retaining
as much information as possible. This makes the data easier to analyze, visualize, or use for
further processing.

1. Principal Component Analysis (PCA)
o Overview: PCA is a linear dimensionality reduction technique that transforms
the data into a new coordinate system such that the greatest variance comes to
lie on the first few components. These components are called "principal
components."
o Steps:
1. Standardize the data (zero mean, unit variance).
2. Calculate the covariance matrix.
3. Compute the eigenvectors and eigenvalues of the covariance matrix.
4. Sort the eigenvectors by eigenvalue and select the top K eigenvectors.
5. Project the data onto these eigenvectors.
o Pros: Simple, interpretable, and works well when the data has a linear
structure.
o Cons: Assumes linearity in the data, sensitive to outliers. (A step-by-step
PCA sketch follows this list.)
2. t-SNE (t-Distributed Stochastic Neighbor Embedding)
o Overview: t-SNE is a nonlinear dimensionality reduction technique that is
particularly well-suited for the visualization of high-dimensional data. It
minimizes the divergence between probability distributions that represent
pairwise similarities in high and low-dimensional spaces.
o Pros: Excellent for visualization of complex data and capturing non-linear
structures.
o Cons: Computationally expensive, not ideal for large datasets, can struggle
with preserving the global structure of the data.
3. Autoencoders
o Overview: Autoencoders are a type of neural network used for unsupervised
learning, specifically for dimensionality reduction. An autoencoder consists of
two parts: an encoder that compresses the data into a lower-dimensional space
(latent space) and a decoder that reconstructs the data from this lower-
dimensional representation.
o Pros: Can learn nonlinear mappings, can handle high-dimensional data.
o Cons: Requires significant training, sensitive to hyperparameter choices.
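
As referenced in the PCA item, a from-scratch NumPy sketch that follows the five steps listed
above; the correlated synthetic data is invented for illustration:

```python
import numpy as np

# 200 samples in 5 dimensions, with most variance in a 2-D latent space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

# 1. Standardize (zero mean, unit variance).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data.
cov = np.cov(Xs, rowvar=False)

# 3-4. Eigen-decomposition, sorted by decreasing eigenvalue; keep top 2.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]

# 5. Project the data onto the principal components.
X_reduced = Xs @ components
print(X_reduced.shape)  # (200, 2)
```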

c. Anomaly Detection

Anomaly detection identifies data points that deviate significantly from the majority of the
data, which is crucial in fraud detection, network security, and quality control.

1. Isolation Forest
o Overview: Isolation Forest is an unsupervised algorithm that isolates
anomalies instead of profiling normal data points. It recursively partitions the
data to isolate observations.
o Pros: Efficient for large datasets, works well for high-dimensional data.
o Cons: May struggle with very dense data distributions. (A brief Isolation
Forest sketch follows this list.)
2. One-Class SVM
o Overview: One-Class SVM is a variant of the SVM that learns a decision
boundary around the majority of the data, effectively identifying outliers that
fall outside this boundary.
o Pros: Works well for datasets with a single class of normal data.
o Cons: Sensitive to parameter tuning and requires careful handling of outliers.
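
As referenced above, a brief Isolation Forest sketch on synthetic data with a few injected
outliers; the parameter values are illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))      # the bulk of the data
outliers = rng.uniform(-6, 6, size=(10, 2))   # a few scattered anomalies
X = np.vstack([normal, outliers])

# Predictions are +1 for inliers and -1 for suspected anomalies.
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = iso.predict(X)
print((labels == -1).sum(), "points flagged as anomalies")
```
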
4. Applications of Unsupervised Learning

1. Customer Segmentation
o By clustering customers based on their purchasing behavior, businesses can
create targeted marketing strategies or personalize product recommendations.
2. Anomaly Detection
o Unsupervised learning is used to identify fraudulent transactions in banking,
network intrusions in cybersecurity, and defects in manufacturing.
3. Data Compression
o Dimensionality reduction techniques like PCA or autoencoders are used in
image compression, speech encoding, and video compression.
4. Recommendation Systems
o Unsupervised learning can help build collaborative filtering-based
recommendation systems, where users are clustered based on preferences, and
recommendations are made by finding similarities.
5. Genomics and Bioinformatics
o In genomics, unsupervised learning is used to group genes or proteins based
on their expression levels or sequence similarities, aiding in the discovery of
biological insights.
6. Image and Speech Processing
o Unsupervised learning can be used to group similar images or sounds, which
is useful for tasks like image recognition, denoising, and automatic labeling of
datasets.

5. Advantages and Disadvantages of Unsupervised Learning

Advantages:

 No Need for Labeled Data: Unsupervised learning can learn from data without
requiring labels, making it useful when labeled data is expensive or unavailable.
 Discover Hidden Patterns: It is useful for uncovering hidden relationships, clusters,
and structures that were previously unknown.
 Flexibility: It can be applied to various domains, from clustering to dimensionality
reduction, anomaly detection, and more.

Disadvantages:

 Interpretability: Models built using unsupervised learning can be harder to interpret,
especially in the case of complex algorithms like autoencoders or t-SNE.
 Evaluation Metrics: Since there are no labels, evaluating the performance of
unsupervised learning models is difficult. Evaluation often requires domain
knowledge or additional tools.
 Sensitive to Parameters: Many unsupervised learning algorithms require the user to
choose parameters (e.g., number of clusters in K-means), and choosing the wrong
parameters can lead to poor performance.

Semi-supervised learning

Semi-supervised learning (SSL) is a machine learning paradigm that lies between
supervised learning and unsupervised learning. In supervised learning, the model is trained on
a labeled dataset, while in unsupervised learning, the model works on unlabeled data without
any supervision. Semi-supervised learning, on the other hand, makes use of both labeled and
unlabeled data during training. Typically, the dataset consists of a small amount of labeled
data and a large amount of unlabeled data, and the goal is to improve learning performance
by leveraging the structure of the unlabeled data.

1. Motivation Behind Semi-Supervised Learning

The motivation for semi-supervised learning arises from the cost and effort involved in
labeling large datasets. Labeling is time-consuming, expensive, and sometimes impractical
(for instance, in medical imaging, natural language processing, etc.). In many real-world
scenarios, a large amount of unlabeled data is available, but labeling it all is not feasible.
Semi-supervised learning aims to use this large pool of unlabeled data effectively to improve
the performance of machine learning models, especially when labeled data is scarce.

2. Key Concepts

 Labeled Data: Data that comes with both the input (features) and the correct output
(label or target). For instance, in a classification task, labeled data might consist of
images of cats and dogs, with each image tagged as either "cat" or "dog."
 Unlabeled Data: Data where only the input is provided, but there is no corresponding
label. For example, an image dataset might contain pictures of animals, but the
species are not labeled.
 Objective of SSL: Use a small amount of labeled data to train a model and a larger
amount of unlabeled data to help improve the model’s performance by extracting
useful patterns and structure.

3. Methods in Semi-Supervised Learning

Several approaches exist for implementing semi-supervised learning, and these can be
categorized based on the assumptions about how the labeled and unlabeled data can be used:

a. Self-training (Bootstrapping)

 Idea: In self-training, a model is initially trained using the small labeled dataset. After
training, the model is used to predict labels for the unlabeled data. The most confident
predictions (those with high probability or certainty) are then added to the labeled
dataset, and the model is retrained using the enlarged labeled dataset.
 Procedure:
1. Train the model on the labeled data.
2. Use the model to predict labels for the unlabeled data.
3. Select the predictions with high confidence and add them to the labeled set.
4. Retrain the model using the updated labeled dataset.
5. Repeat the process iteratively.
 Pros: Simple and effective in many settings.
 Cons: The model might reinforce wrong predictions if the initial model is poorly
trained or if the unlabeled data is noisy.
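
A minimal self-training sketch following the procedure above; the synthetic data and the 0.95
confidence threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=500, random_state=0)

labeled = np.zeros(len(y_true), dtype=bool)
labeled[:50] = True                  # only 50 labels are "known" initially
y = np.where(labeled, y_true, -1)    # -1 marks unlabeled points

model = LogisticRegression()
for _ in range(5):                   # a few self-training rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95     # keep confident predictions only
    if not confident.any():
        break
    idx = np.flatnonzero(~labeled)[confident]
    y[idx] = model.predict(X[idx])   # adopt the pseudo-labels
    labeled[idx] = True

print(labeled.sum(), "of", len(y), "points labeled after self-training")
```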

b. Co-training

 Idea: Co-training assumes that the features can be split into two (or more) distinct
views or subsets that contain sufficient information to train a model. Each view is
used to train a separate model. After the initial training, each model labels unlabeled
data, and the most confident predictions are added to the training set of the other
model. This mutual labeling improves both models.
 Procedure:
1. Split the data features into two disjoint subsets.
2. Train two classifiers independently on these subsets.
3. Each classifier labels the unlabeled data, and confident predictions are added
to the labeled set of the other classifier.
4. Repeat the process iteratively.
 Pros: Works well when the data is diverse and can be split into distinct views.
 Cons: The success of co-training depends on the quality of the feature split, and it
may not work if the views are highly correlated.

c. Generative Models

 Idea: Generative models like Gaussian Mixture Models (GMMs), Hidden Markov
Models (HMMs), and Variational Autoencoders (VAEs) can be used in SSL by
modeling the data distribution. The model tries to learn both the distribution of the
labeled data and the unlabeled data to infer the possible labels of the unlabeled data.
 Procedure: The model learns the underlying distribution of the labeled and unlabeled
data together and uses this information to predict the labels of unlabeled data.
 Pros: Can be very powerful when the underlying data distribution is complex.
 Cons: Requires strong assumptions about the data distribution and can be
computationally intensive.

d. Graph-based Methods

 Idea: These methods treat data as a graph, where each data point is represented as a
node and edges represent similarities or relationships between data points. The goal is
to propagate the labels from labeled nodes to unlabeled nodes based on their
proximity or similarity in the graph.
 Procedure:
1. Construct a graph where each node represents a data point and edges represent
similarity.
2. Label the nodes that have labeled data.
3. Propagate the labels to the unlabeled nodes based on their similarity to the
labeled nodes (typically using graph-based algorithms like label propagation
or graph cuts).
 Pros: Works well when data has a clear relationship structure (e.g., social networks,
image regions).
 Cons: The graph construction can be expensive for large datasets, and it may not
generalize well for very sparse data.
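
A minimal graph-based sketch using scikit-learn's LabelPropagation, which marks unlabeled
points with -1 and propagates labels over a nearest-neighbor similarity graph; the 80% masking
rate is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)

# Hide 80% of the labels; -1 is the conventional "unlabeled" marker.
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.8] = -1

# Labels spread from labeled nodes to unlabeled ones over a k-NN graph.
lp = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_partial)
print((lp.transduction_ == y).mean())   # agreement with the true labels
```
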
e. Semi-supervised Support Vector Machines (S3VMs)

 Idea: This method extends the Support Vector Machine (SVM) to the semi-
supervised setting by finding a decision boundary that separates both labeled and
unlabeled data in a way that maximizes the margin between the classes while making
use of the unlabeled data.
 Procedure: In the case of S3VMs, the model learns not only from the labeled points
but also from the structure of the unlabeled data. The classifier tries to find a
boundary that respects the data structure in both the labeled and unlabeled portions.
 Pros: Works well when the data is linearly separable.
 Cons: S3VMs can be computationally expensive, especially for large datasets, due to
the need to solve a more complex optimization problem.

f. Pseudo-labelling

 Idea: This method involves assigning pseudo-labels to the unlabeled data by
predicting labels using an initially trained model and then adding those predictions as
"labels" in the training set.
 Procedure:
1. Train a model using the labeled data.
2. Use the trained model to predict pseudo-labels for the unlabeled data.
3. Incorporate the pseudo-labeled data into the training set.
4. Retrain the model with the new combined labeled dataset.
5. Repeat iteratively.
 Pros: Simple and works well when the initial model is fairly accurate.
 Cons: If the model is initially poor, it can lead to incorrect labels being added, which
may degrade performance.

4. Applications of Semi-Supervised Learning

 Image Recognition: In scenarios where labeling a large number of images is costly or
time-consuming, SSL can leverage unlabeled images to improve performance. For
example, SSL is commonly used in medical imaging (e.g., radiology scans), where a
small amount of labeled data can be augmented with large amounts of unlabeled data.
 Natural Language Processing (NLP): SSL is widely used in NLP for tasks like text
classification, sentiment analysis, and named entity recognition. Large amounts of
unlabeled text data are often available (e.g., from the web), and SSL techniques can
be used to improve the performance of NLP models by augmenting the labeled
dataset.
 Speech Recognition: In speech recognition, labeled audio data is expensive to
produce, so SSL can be used to leverage unlabeled speech data to improve models.
 Recommender Systems: SSL can help in recommender systems where the system
can learn from both user interactions (labeled data) and general user behavior
(unlabeled data).
 Healthcare: Semi-supervised learning can be highly beneficial in healthcare, where
expert annotations (e.g., in radiology or pathology) are scarce but a large amount of
unlabeled data exists.

5. Advantages and Disadvantages


Advantages:

 Reduces the need for labeled data: SSL uses a small amount of labeled data and a
large amount of unlabeled data, which can significantly reduce the cost and effort
required for data labeling.
 Improved performance: By leveraging unlabeled data, SSL can improve the
accuracy of the model compared to using only labeled data.
 Practical for many real-world problems: In many domains, labeled data is difficult
or expensive to obtain, but unlabeled data is abundant. SSL provides a useful solution
for these scenarios.

Disadvantages:

 Sensitive to noise: If the unlabeled data is noisy or misclassified, SSL methods can
propagate errors and lead to worse performance.
 Complexity: SSL models are often more complex and computationally intensive
compared to traditional supervised or unsupervised models.
 Dependence on assumptions: Many SSL algorithms, especially generative and
graph-based methods, make assumptions about the data distribution or structure that
may not always hold in practice.
