Module 1

The document outlines a comprehensive syllabus for a Machine Learning course, detailing five modules: Foundations of Machine Learning, Supervised Learning, Ensemble Learning, Unsupervised Learning, and Python Machine Learning. Each module covers various concepts, techniques, algorithms, and applications related to machine learning, including supervised and unsupervised methods, reinforcement learning, and practical implementation using Python. The syllabus also addresses key terminologies and challenges in machine learning, such as overfitting and the curse of dimensionality.

Uploaded by

prog78659
Copyright
© © All Rights Reserved

MACHINE LEARNING SYLLABUS

JAYA RATNAM POPURI


AP/ECE

05-02-2025 1
Module I: FOUNDATIONS OF MACHINE LEARNING
Types of machine learning: Supervised Learning - Unsupervised Learning -
Reinforcement Learning - Machine Learning Process - Terminologies: Weight
Space, Curse of Dimensionality, Overfitting, Training, Testing, and
Validation Sets - Performance Measures.
Module II : SUPERVISED LEARNING
Decision Tree Learning-Artificial Neural Networks-Naïve Bayes Classifier-
k-Nearest Neighbour algorithm-Support Vector Machines.

Module III :ENSEMBLE LEARNING
Ensemble Learning-Ensemble Methods–Types of Ensemble Methods–Bagging,
Boosting, stacking–Boosting Algorithm, Stacking Algorithm
Module IV: UNSUPERVISED LEARNING
Principal Components Analysis, K-means Clustering, Hierarchical Clustering,
Expectation Maximization (EM) algorithm
Module V: PYTHON MACHINE LEARNING
Essential Python Libraries-A road map for building machine learning systems–
Preparation of Dataset–Testing of ANN, Decision Tree and Naïve Bayes
Classifier using python

MODULE 1 - FOUNDATIONS OF MACHINE LEARNING

1. Types of machine learning:
a) Supervised Learning
b) Unsupervised Learning
c) Reinforcement Learning
2. Machine Learning Process - Terminologies:
Weight Space
Curse of Dimensionality
Overfitting
Training, Testing, and Validation Sets
Performance Measures

Fig: Venn diagram of AI, ML and DL

Types of Machine Learning:
• Machine learning (ML) enables systems to make human-like decisions by
learning from data.
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning:
• In supervised learning, a model is trained on a labelled dataset.
• A labelled dataset contains both the input features and the corresponding
correct outputs.
• Supervised learning algorithms learn a mapping from inputs to the
correct outputs.
• Both the training and the validation datasets are labelled.

• Example: building an image classifier that distinguishes cats from dogs.
• If you feed the algorithm a dataset of labelled dog and cat images, it
learns to tell a dog from a cat from those labelled examples.
• When given a new dog or cat image that it has never seen before, it
uses the learned model to predict whether the image shows a dog or a cat.
• This is image classification using supervised learning.

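The idea of learning from labelled examples can be sketched in a few lines of Python. This is only an illustration, not the method used in the slides: the two numeric features (weight and ear length), the six data points, and the helper name `predict` are all made-up assumptions, and a 1-nearest-neighbour rule stands in for a real image classifier.

```python
import math

# Hypothetical labelled dataset: (weight_kg, ear_length_cm) -> label.
# Real image classifiers work on pixels; two hand-picked features keep the sketch small.
TRAINING_DATA = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 12.0), "dog"),
    ((30.0, 14.0), "dog"),
    ((25.0, 11.0), "dog"),
]


def predict(features, training_data=TRAINING_DATA):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    nearest = min(training_data, key=lambda item: math.dist(features, item[0]))
    return nearest[1]


print(predict((4.2, 6.2)))    # unseen, cat-like example -> cat
print(predict((27.0, 13.0)))  # unseen, dog-like example -> dog
```

The labels in `TRAINING_DATA` play the role of the "correct outputs"; the prediction for a new input is derived entirely from them.
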
Supervised Machine Learning types: 2
1. Classification: maps input features to a predefined class.
2. Regression: maps input features to a numerical value.

Classification:
• Classification deals with predicting categorical target variables, which
represent discrete classes or labels.
• Classification algorithms learn to map the input features to one of the
predefined classes.
• Eg: 1. Classifying emails as spam or not spam.
2. Predicting whether a patient has a high risk of heart disease.

Classification algorithms:
1. Logistic Regression
2. Support Vector Machine (SVM)
3. Random Forest
4. Decision Tree
5. K-Nearest Neighbors (KNN)
6. Naive Bayes

Regression:
• Regression deals with predicting continuous target variables, which
represent numerical values.
• Regression algorithms learn to map the input features to a continuous
numerical value.
• Eg: 1. Predicting the price of a house based on its size, location, and
amenities.
2. Forecasting the sales of a product.

Regression algorithms:
1. Linear Regression
2. Decision Tree
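
As a rough illustration of regression, the sketch below fits a straight line to a handful of points with the closed-form least-squares formulas. The house sizes and prices are invented for the example (the prices are exactly 3 times the size), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size in square metres and price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90.0 + intercept  # continuous output for an unseen size
print(slope, intercept, predicted_price)
```

The output is a continuous number rather than a class label, which is what separates regression from classification.
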

Advantages of Supervised Machine Learning:
1. Models can reach high accuracy because they are trained on labelled data.
2. The process of decision-making is often interpretable.
3. Pre-trained models can often be reused, which saves time and
resources.

Disadvantages of Supervised Machine Learning:
1. It may struggle with unseen or unexpected patterns that are not present
in the training data.
2. It can be time-consuming and costly, because it relies on labelled data.
3. It may generalize poorly to new data.

Applications of Supervised Learning:
1. Image classification: Identify objects, faces, and other features in images.
2. Natural language processing: Extract information from text, such as
sentiment, entities, and relationships.
3. Speech recognition: Convert spoken language into text.
4. Recommendation systems: Make personalized recommendations to users.
5. Predictive analytics: Predict outcomes, such as sales, customer churn, and
stock prices.
6. Medical diagnosis: Detect diseases and other medical conditions.
7. Fraud detection: Identify fraudulent transactions.
8. Autonomous vehicles: Recognize and respond to objects in the environment.
9. Email spam detection: Classify emails as spam or not spam.
10. Quality control in manufacturing: Inspect products for defects.
11. Credit scoring: Assess the risk of a borrower defaulting on a loan.
12. Weather forecasting: Make predictions for temperature, precipitation, and
other meteorological parameters.
13. Sports analytics: Analyze player performance, make game predictions, and
optimize strategies.

2. Unsupervised Machine Learning:
• In unsupervised learning, an algorithm discovers patterns and
relationships in unlabelled data.
• Unlike supervised learning, the algorithm is not given labelled target
outputs.
• The primary goal is often to discover hidden patterns, similarities, or
clusters within the data.
• It can be used for purposes such as data exploration, visualization, and
dimensionality reduction.

Example:
• Consider a dataset that contains information about the purchases that
customers made from a shop.
• Through clustering, the algorithm can group customers with similar
purchasing behaviour, which reveals potential customer segments without
predefined labels.
• This type of information can help businesses target customers as
well as identify outliers.

Unsupervised Machine Learning types: 2
1. Clustering
2. Association

Clustering:
• Clustering is the process of grouping data points into clusters based on
their similarity.
• This technique is useful for identifying patterns and relationships in data
without the need for labelled examples.
Unsupervised algorithms: 2
1. K-Means Clustering algorithm (clustering)
2. Principal Component Analysis (a dimensionality reduction technique, not a
clustering algorithm)
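
A minimal sketch of K-Means on one-dimensional data shows how points are grouped purely by similarity. The spending figures, the starting centroids, and the function name `k_means_1d` are assumptions made for this example; a real implementation would also handle higher dimensions and choose the initial centroids more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain K-Means on 1-D data: assign each point to its nearest centroid,
    then move every centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend of six customers; no labels are provided.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(spend, centroids=[10.0, 50.0])
print(centroids)
print(clusters)  # [[10, 12, 11], [50, 52, 49]]
```
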

Association:
• Association rule learning is a technique for discovering relationships
between items in a dataset.
• It identifies rules stating that the presence of one item implies the
presence of another item with a specific probability.
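
The "specific probability" attached to a rule is usually expressed through two standard measures, support and confidence. The sketch below computes both for the rule bread -> butter on four invented shopping baskets; the data and helper names are illustrative assumptions.

```python
# Hypothetical shopping baskets (one set of items per transaction).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of `consequent` given that `antecedent` is present."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"bread", "butter"}, transactions)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"butter"}, transactions)  # 2 of 3 bread baskets
print(rule_support, round(rule_confidence, 4))
```
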

Advantages of Unsupervised Machine Learning:
1. It helps to discover hidden patterns and relationships in the data.
2. It is used for tasks such as customer segmentation, anomaly detection,
and data exploration.
3. It does not require labelled data, which reduces the effort of data
labelling.

Disadvantages of Unsupervised Machine Learning:
1. Without labels, it may be difficult to assess the quality of the
model's output.
2. Clusters may be hard to interpret and may not have a meaningful
interpretation.
3. Extracting meaningful features from raw data often requires additional
techniques, such as autoencoders and dimensionality reduction.

Applications of Unsupervised Learning:
1. Clustering: Group similar data points into clusters.
2. Anomaly detection: Identify outliers or anomalies in data.
3. Dimensionality reduction: Reduce the dimensionality of data while preserving
its essential information.
4. Recommendation systems: Suggest products, movies, or content to users
based on their historical behavior or preferences.
5. Topic modeling: Discover latent topics within a collection of documents.
6. Density estimation: Estimate the probability density function of data.
7. Image and video compression: Reduce the amount of storage required for
multimedia content.
8. Market basket analysis: Discover associations between products.
9. Genomic data analysis: Identify patterns or group genes with similar
expression profiles.
10. Image segmentation: Segment images into meaningful regions.
11. Community detection in social networks: Identify communities or
groups of individuals with similar interests or connections.
12. Content recommendation: Classify and tag content to make it easier to
recommend similar items to users.

3. Reinforcement Machine Learning:
• Reinforcement learning is a learning method in which an agent interacts
with the environment by producing actions and discovering errors.
• Trial, error, and delayed reward are the most relevant characteristics of
reinforcement learning.
• In this technique, the model keeps improving its performance by using
reward feedback to learn the behavior or pattern.

• RL Eg: 1. Google's self-driving car.
2. AlphaGo, where a bot competes with humans and even with itself to
become a better and better player of the game of Go.
• Each interaction adds to the agent's experience, which serves as its
training data. So, the more it learns, the better trained and more
experienced it becomes.

• Example: Consider that you are training an AI agent to play a game like
chess.
• The agent explores different moves and receives positive or negative
feedback based on the outcome.
• Reinforcement learning also finds applications in other settings where an
agent learns to perform tasks by interacting with its surroundings.
Fig: Reinforcement Learning
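
Learning from reward feedback can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the slides do not name a specific one). Chess is far too large for a short example, so the environment here is an invented five-cell corridor in which the agent earns a reward of 1 for reaching the last cell. All names and hyperparameter values (`train`, `alpha`, `gamma`, `epsilon`) are assumptions for illustration.

```python
import random

N_STATES = 5      # corridor cells 0..4; reaching cell 4 ends the episode
MOVES = (-1, +1)  # action 0 = step left, action 1 = step right


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            left, right = q[state]
            if rng.random() < epsilon or left == right:
                action = rng.randrange(2)  # explore, or break a tie at random
            else:
                action = 0 if left > right else 1  # exploit the best known action
            next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Move the estimate towards reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct; the preference emerges from trial, error, and the delayed reward at the end of the corridor.
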

Types of Reinforcement Machine Learning: 2
1. Positive reinforcement
• Rewards the agent for taking a desired action.
• Encourages the agent to repeat the behavior.
• Examples: Giving a treat to a dog for sitting, awarding a point in a game
for a correct answer.
2. Negative reinforcement
• Removes an undesirable stimulus to encourage a desired behavior.
• Encourages the agent to repeat the behavior that removed the stimulus
(it is not the same as punishment).
• Examples: Turning off a loud buzzer when a lever is pressed, avoiding a
penalty by completing a task.

Advantages of Reinforcement Machine Learning:
1. It supports autonomous decision-making and is well suited to tasks that
require learning a sequence of decisions, such as robotics and game playing.
2. It is preferred for achieving long-term results that are otherwise very
difficult to achieve.
3. It can be used to solve complex problems that cannot be solved by
conventional techniques.

Disadvantages of Reinforcement Machine Learning:
1. Training reinforcement learning agents can be computationally expensive
and time-consuming.
2. Reinforcement learning is not well suited to solving simple problems.
3. It needs a lot of data and a lot of computation, which can make it
impractical and costly.

Applications of Reinforcement Learning:
1. Supply Chain and Inventory Management: RL can be used to optimize
supply chain operations.
2. Energy Management: RL can be used to optimize energy consumption.
3. Game AI: RL can be used to create more intelligent and adaptive video
games.
4. Adaptive Personal Assistants: RL can be used to improve personal
assistants.
5. Virtual Reality (VR) and Augmented Reality (AR): RL can be used to
create immersive and interactive experiences.
6. Industrial Control: RL can be used to optimize industrial processes.
7. Education: RL can be used to create adaptive learning systems.
8. Agriculture: RL can be used to optimize agricultural operations.
9. Game Playing: RL can teach agents to play games, even complex ones.
10. Robotics: RL can teach robots to perform tasks autonomously.
11. Autonomous Vehicles: RL can help self-driving cars navigate and make
decisions.
12. Recommendation Systems: RL can enhance recommendation algorithms by
learning user preferences.
13. Healthcare: RL can be used to optimize treatment plans and drug discovery.
14. Natural Language Processing (NLP): RL can be used in dialogue systems
and chatbots.
15. Finance and Trading: RL can be used for algorithmic trading.

Machine learning process terminologies:
1. Weight Space:
• Weight space refers to the multidimensional space formed by the
parameters or weights of a machine learning model.
• Each point in this space represents a unique configuration of weights,
which collectively define the model's behavior and performance.
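
To make the idea concrete, consider a model with just two weights, y = w * x + b. Its weight space is the (w, b) plane, and every point in that plane has a loss value. The sketch below evaluates a mean squared error at a few grid points; the data, the grid, and the name `mse` are assumptions chosen for illustration, and real training searches this space with an optimizer rather than a grid.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the model y = w * x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Each (w, b) pair is one point in the two-dimensional weight space.
grid = [(w, b) for w in (0.0, 1.0, 2.0, 3.0) for b in (0.0, 1.0, 2.0)]
best = min(grid, key=lambda point: mse(point[0], point[1], xs, ys))
print(best, mse(best[0], best[1], xs, ys))  # (2.0, 1.0) 0.0
```
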

2. Curse of Dimensionality:
• The phenomenon where the efficiency and effectiveness of algorithms
deteriorate as the dimensionality of the data increases, because the amount
of data needed grows exponentially with the number of dimensions.
• In high-dimensional spaces, data points become sparse, making it
challenging to discern meaningful patterns or relationships due to the vast
amount of data required to adequately sample the space.
• The curse of dimensionality significantly impacts machine learning
algorithms in various ways.
• It leads to increased computational complexity, longer training times, and
higher resource requirements.
• It escalates the risk of overfitting.
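
Two small calculations illustrate why high-dimensional data becomes sparse. Both functions are illustrative assumptions rather than anything defined in the slides: the first counts the points needed to keep 10 samples along every axis, the second gives the fraction of a unit hypercube that lies within a thin margin of its boundary.

```python
def samples_needed(dimensions, per_axis=10):
    """Points required to keep `per_axis` samples along every axis of a grid."""
    return per_axis ** dimensions


def boundary_fraction(dimensions, margin=0.05):
    """Fraction of a unit hypercube's volume within `margin` of its surface."""
    return 1.0 - (1.0 - 2.0 * margin) ** dimensions


for d in (1, 2, 3, 10):
    print(d, samples_needed(d), round(boundary_fraction(d), 3))
```

With 10 samples per axis, 1 dimension needs 10 points while 10 dimensions need 10,000,000,000, and in 10 dimensions well over half of the volume already sits in the thin outer shell.
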

How to Overcome the Curse of Dimensionality?
1. Dimensionality Reduction Techniques:
a. Feature Selection
b. Feature Extraction
2. Data Preprocessing:
a. Normalization
b. Handling Missing Values

Feature Selection:
• Identify and select the most relevant features from the original dataset
while discarding irrelevant or redundant ones.
• This reduces the dimensionality of the data, simplifying the model and
improving its efficiency.
Feature Extraction:
• Transform the original high-dimensional data into a lower-dimensional
space by creating new features that capture the essential information.
• Techniques such as Principal Component Analysis (PCA) are used.

Normalization:
• Scale the features to a similar range to prevent certain features from
dominating others, especially in distance-based algorithms.
Handling Missing Values:
• Address missing data appropriately through imputation or deletion to ensure
robustness in the model training process.
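
Both preprocessing steps can be sketched in plain Python. Mean imputation and min-max scaling are only one possible choice for each step, and the `ages` column and function names are made up for the example.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column with one missing value.
ages = [20.0, None, 40.0, 60.0]
filled = impute_mean(ages)
scaled = min_max_scale(filled)
print(filled)  # [20.0, 40.0, 40.0, 60.0]
print(scaled)  # [0.0, 0.5, 0.5, 1.0]
```
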

ML | Underfitting and Overfitting
• A machine learning model is said to be good if it generalizes properly to
any new input data from the problem domain.
• This allows us to make predictions about future data that the model has
never seen.
• How well a model generalizes is described in terms of bias and variance.

Bias:
• Bias refers to the error due to overly simplistic assumptions in the learning
algorithm.
• It is the error due to the model's inability to represent the true relationship
between input and output accurately.
• High bias -> poor performance on both the training and testing data ->
underfitting.

Variance:
• Variance is the error due to the model's sensitivity to fluctuations in the
training data.
• It is the variability of the model's predictions across different training
datasets.
• High variance -> the model performs well on the training data but poorly
on the testing data -> overfitting.

Underfitting in Machine Learning: High bias and low variance.
• The model is too simple to capture the complexities of the data.
• It represents the inability of the model to learn the training data
effectively.
• This results in poor performance on both the training and testing data.
• An underfit model's predictions are inaccurate, especially when applied to
new, unseen examples.

Reasons for Underfitting:
1. The model is too simple, so it may not be capable of representing the
complexities in the data.
2. The input features used to train the model are not adequate.
3. The training dataset is not large enough.
4. Excessive regularization is used to prevent overfitting.
5. Features are not scaled.

Techniques to Reduce Underfitting:
1. Increase model complexity.
2. Increase the number of features by performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or the duration of training to get
better results.

Overfitting in Machine Learning: High variance and low bias.
• A statistical model is said to be overfitted when it fits the training data
closely but does not make accurate predictions on testing data.
• When a model is fitted too closely to its training data, it starts learning
from the noise and inaccurate data entries in the data set.
• Testing with test data then reveals high variance: the model does not
categorize the data correctly, because it has learned too many details and
too much noise.
• The result is an unrealistic model.

Reasons for Overfitting:
1. High variance and low bias.
2. The model is too complex.
3. The training dataset is too small.
Techniques to Reduce Overfitting:
1. Use a simpler model, such as a linear algorithm for linear data or a
depth-limited decision tree.
2. Improve the quality of the training data.
3. Increase the amount of training data.
4. Reduce model complexity.
5. Use early stopping during the training phase.
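
Early stopping can be sketched as a rule applied to the validation loss recorded after each epoch: stop once the loss has failed to improve for a set number of epochs and keep the best epoch. The loss values, the `patience` parameter, and the function name are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping once the validation loss
    has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss keeps rising: likely overfitting
    return best_epoch


# Hypothetical validation losses: they fall at first, then rise again.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.65]
best_epoch = early_stopping_epoch(val_losses)
print(best_epoch)  # 3
```
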

Training (60%) vs Testing (20-25%) vs Validation (10-15%) Sets:
Training Set: more than 60%
• This is the actual dataset from which a model trains.
• The model sees and learns from this data to predict the outcome or to
make the right decisions.
• Most of the training data is collected from several sources and then
preprocessed and organized so that the model can perform properly.
• The better the quality and diversity of the training data, the better the
performance of the model.
• This data is more than 60% of the total data available for the project.

Validation Set: 10 to 15%
• The validation set is used to fine-tune the hyperparameters of the model.
• It is considered a part of the training of the model.
• The model only sees this data for evaluation but does not learn from this
data.
• The validation set can also be used for regularization through early
stopping: training is interrupted when the loss on the validation set becomes
greater than the loss on the training set, which limits overfitting.
• This data is approximately 10-15% of the total data available for the
project.
• When the accuracy of the model on validation data is as good as (or better
than) that on training data, the model is said to have generalized well.

Testing Set: 20 to 25%
• The testing set is used to evaluate the final model on data it has never
seen.
• Often the validation and testing sets are combined and used as a single
testing set, which is not considered good practice.
• If the accuracy of the model on training data is noticeably greater than
that on testing data, the model is said to be overfitting.
• This data is approximately 20-25% of the total data available for the
project.
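
A three-way split can be sketched as follows. The 60/15/25 fractions are one choice inside the ranges given above, and `split_dataset`, its parameters, and the toy dataset of 100 integers are assumptions made for the example.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into training, validation, and
    testing subsets; whatever remains after the first two goes to testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 60 15 25
```

Shuffling before splitting avoids any ordering in the original data leaking into one subset, and the three subsets never share a sample.
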

Performance Measures (or) Classification Metrics:
In a classification task, the goal is to predict a target variable that
takes discrete values.
To evaluate the performance of such a model, we use the metrics listed
below:
1. Accuracy
2. Precision
3. Recall

Confusion Matrix:
• A confusion matrix is a simple table that shows how well a
classification model is performing by comparing its predictions to the
actual results.
• The matrix displays the number of instances of each prediction outcome
produced by the model on the test data.
• It breaks down the predictions into four categories, in two groups:
1. Correct predictions for both classes (true positives and true negatives).
2. Incorrect predictions (false positives and false negatives).

• True Positive (TP): The model correctly predicted a positive outcome
(the actual outcome was positive).
• True Negative (TN): The model correctly predicted a negative outcome
(the actual outcome was negative).
• False Positive (FP): The model incorrectly predicted a positive
outcome (the actual outcome was negative). Also known as a Type I
error.
• False Negative (FN): The model incorrectly predicted a negative
outcome (the actual outcome was positive). Also known as a Type II
error.

Confusion matrix for binary classification: a 2x2 confusion matrix for
recognizing whether an image is a Dog image or a Not Dog image.

                   Predicted: Dog            Predicted: Not Dog
Actual: Dog        True Positive (TP = 5)    False Negative (FN = 1)
Actual: Not Dog    False Positive (FP = 1)   True Negative (TN = 3)

• True Positive (TP): The number of samples for which both the predicted and
actual values are Dog.
• True Negative (TN): The number of samples for which both the predicted and
actual values are Not Dog.
• False Positive (FP): The number of samples for which the prediction is Dog
while the actual value is Not Dog.
• False Negative (FN): The number of samples for which the prediction is
Not Dog while the actual value is Dog.

Index      1    2        3    4        5    6        7    8    9        10
Actual     Dog  Dog      Dog  Not Dog  Dog  Not Dog  Dog  Dog  Not Dog  Not Dog
Predicted  Dog  Not Dog  Dog  Not Dog  Dog  Dog      Dog  Dog  Not Dog  Not Dog
Result     TP   FN       TP   TN       TP   FP       TP   TP   TN       TN

TP FN

FP TN
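
The four counts of the dog example can be tallied directly from the actual and predicted labels. In this sketch the ten label pairs are read off the Result row of the example (TP, FN, TP, TN, TP, FP, TP, TP, TN, TN), and `confusion_counts` is a hypothetical helper name.

```python
# The ten samples of the dog example, in index order 1..10.
actual = ["Dog", "Dog", "Dog", "Not Dog", "Dog",
          "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog",
             "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]


def confusion_counts(actual, predicted, positive="Dog"):
    """Count TP, FN, FP, and TN for a binary classification task."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1
        elif a == positive:
            counts["FN"] += 1  # actual positive, predicted negative
        elif p == positive:
            counts["FP"] += 1  # actual negative, predicted positive
        else:
            counts["TN"] += 1
    return counts


counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 5, 'FN': 1, 'FP': 1, 'TN': 3}
```
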

I. Accuracy
The accuracy metric is one of the simplest classification metrics to
implement. It is the ratio of the number of correct predictions to the
total number of predictions.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$= \frac{5 + 3}{5 + 3 + 1 + 1} = \frac{8}{10} = 0.8$$

II. Precision
• Accuracy alone can be misleading, for example when the classes are
imbalanced; the precision metric is used to overcome this limitation.
• Precision is the proportion of positive predictions that were actually
correct.
• It is calculated as the ratio of true positives to the total number of
positive predictions (true positives and false positives).

$$\text{Precision} = \frac{\text{Number of true positive predictions}}{\text{Total number of positive predictions}} = \frac{TP}{TP + FP} = \frac{5}{5 + 1} = \frac{5}{6} = 0.8333$$

III. Recall or Sensitivity
• Recall is similar to the precision metric.
• It measures the proportion of actual positives that were identified
correctly.
• It is calculated as the ratio of true positives to the total number of
actual positives, whether correctly predicted as positive or incorrectly
predicted as negative (true positives and false negatives).

$$\text{Recall} = \frac{\text{Number of true positive predictions}}{\text{Total number of actual positives}} = \frac{TP}{TP + FN} = \frac{5}{5 + 1} = \frac{5}{6} = 0.8333$$

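The three metrics can be checked with a short script using the counts from the dog example (TP = 5, TN = 3, FP = 1, FN = 1). The function names are chosen for this sketch and are not taken from any particular library.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """True positives divided by all positive predictions."""
    return tp / (tp + fp)


def recall(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn)


TP, TN, FP, FN = 5, 3, 1, 1  # counts from the dog example
print(accuracy(TP, TN, FP, FN))     # 0.8
print(round(precision(TP, FP), 4))  # 0.8333
print(round(recall(TP, FN), 4))     # 0.8333
```

Precision and recall coincide here only because FP and FN happen to be equal in this example; in general they differ.
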
THANK YOU
