ML U1 & U2 Notes
Unit 1
● Machine learning is a process through which computerized
systems use human-supplied data and feedback to independently
make decisions and predictions, typically becoming more accurate
with continual training. This contrasts with traditional computing,
in which every action taken by a computer must be
pre-programmed.
● Reinforcement learning teaches a system as it interacts with an
environment by offering it rewards when it performs an action
correctly.
Examples:
● Natural Language Processing (NLP) in Customer Service: Cognitive
automation systems that handle customer inquiries use supervised
machine learning models to understand text (via NLP). They learn
from historical chat data to provide responses that mimic a
human customer service agent. These systems can independently
refine their responses based on interactions, making them more
efficient over time.
● Example: Virtual assistants like chatbots that interpret customer
queries and provide responses, learning from past interactions to
Q2) Describe various phases used by machine learning.
Data preparation is the step where we put our data into a suitable
place and prepare it for use in machine learning training.
This step can be further divided into two processes:
Data exploration:
It is used to understand the nature of the data that we have to work
with. We need to understand the characteristics, format, and quality
of the data. A better understanding of the data leads to a more
effective outcome. In this step, we find correlations, general trends,
and outliers.
Data pre-processing:
The next step is preprocessing the data for analysis.
data using various analytical techniques and review the outcome.
Hence, in this step, we take the data and use machine learning
algorithms to build the model.
5. Train Model
The next step is to train the model. In this step, we train our model
to improve its performance and get a better outcome for the problem.
We use datasets to train the model using various machine learning
algorithms. Training a model is required so that it can understand the
various patterns, rules, and features.
then we test the model. In this step, we check for the accuracy of our
model by providing a test dataset to it.
Testing the model determines the percentage accuracy of the model as
7. Deployment
The last step of the machine learning life cycle is deployment, where
we deploy the model in a real-world system.
If the model prepared above produces accurate results that meet our
requirements at an acceptable speed, we deploy it in the real system.
Before deploying, however, we check whether it is improving its
performance using the available data. The deployment phase is similar
to making the final report for a project.
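The train, test, and check-before-deploy steps can be sketched in a few lines. This is only an illustration under stated assumptions: the toy "hours studied" data, the one-feature threshold model, and the helper names train_test_split, train_threshold and accuracy are all made up for this example and are not part of the notes.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold(train):
    """'Train' a one-feature model: the midpoint between the class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose predicted class matches the true label."""
    correct = sum((x > threshold) == bool(label) for x, label in rows)
    return correct / len(rows)

# Hypothetical toy data: hours studied -> passed (1) or failed (0).
data = [(hours, 1 if hours >= 5 else 0) for hours in range(1, 11)]

train, test = train_test_split(data)      # keep the test rows unseen
threshold = train_threshold(train)        # training step
test_accuracy = accuracy(threshold, test) # testing step
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.0%}")
```

Only if the test accuracy is acceptable for the requirement would such a model move on to deployment.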
Q3) Every machine learning algorithm should have some key points
while designing. What are they? Explain them in brief.
● The second important attribute is the degree to which the learner
controls the sequence of training examples. For example, when
training data is first fed to the machine, its accuracy is very
low, but as it gains experience by playing again and again against
itself or an opponent, the algorithm gets feedback and controls the
chess game accordingly.
● The third important attribute is how well the training experience
represents the distribution of examples over which performance
will be measured. For example, a machine learning algorithm gains
experience by going through a number of different cases and different
to the algorithm, the machine learning system chooses a NextMove
function, which describes what type of legal move should be taken. For
example, while playing chess, when the opponent plays, the algorithm
determines the possible legal moves it can take in order to succeed.
Step 3 - Choosing a Representation for the Target Function: Once the
algorithm knows all the possible legal moves, the next step is to
choose the optimized move using some representation, e.g. linear
equations, a hierarchical graph representation, or a tabular form. Out
of these moves, the NextMove function picks the target move that gives
the highest success rate. For example, if the machine has 4 possible
moves while playing chess, it will choose the optimized move that
leads to success.
Step 4 - Choosing a Function Approximation Algorithm: An optimized
move cannot be chosen from the training data alone. The training data
has to go through a set of examples; through these examples the
algorithm approximates which steps should be chosen, and the machine
then receives feedback on them. For example, when training data for
playing chess is first fed to the algorithm, it is not known whether
the algorithm will fail or succeed; from each failure or success it
then estimates which step should be chosen for the next move and what
its success rate is.
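Steps 2 to 4 can be sketched with the "linear equations" representation mentioned in Step 3. Everything specific here is an assumption for illustration: the two board features, the move names, the starting weights, and the least-mean-squares style update in lms_update (the notes do not name a particular update rule).

```python
def evaluate(features, weights):
    """Linear evaluation function: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def next_move(candidate_moves, weights):
    """Pick the legal move whose resulting position scores highest."""
    return max(candidate_moves,
               key=lambda move: evaluate(candidate_moves[move], weights))

def lms_update(weights, features, target, learning_rate=0.1):
    """One feedback step: nudge the weights toward the observed outcome."""
    error = target - evaluate(features, weights)
    updated = [weights[0] + learning_rate * error]
    updated += [w + learning_rate * error * x
                for w, x in zip(weights[1:], features)]
    return updated

# Hypothetical features per legal move: (material advantage, king safety).
moves = {
    "Nf3": (0.0, 1.0),
    "Qxd5": (3.0, -1.0),
    "h4": (0.0, -0.5),
    "O-O": (0.0, 2.0),
}
weights = [0.0, 1.0, 0.5]

best = next_move(moves, weights)  # the move with the highest score
# Feedback: suppose the position after the move turned out to be worth 1.0.
new_weights = lms_update(weights, moves[best], target=1.0)
```

After the update, the evaluation of the chosen position is closer to the feedback value than before, which is the sense in which the algorithm learns from each success or failure.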
relational, or logical operations can be applied to the variable without
causing an error. In machine learning, it is very important to know the
appropriate data types of the independent and dependent variables,
as this provides the basis for selecting classification or regression
models. Incorrect identification of data types leads to incorrect
modeling, which in turn leads to an incorrect solution.
Here I will be discussing different types of data types with suitable
examples.
Different Types of data types
proper meaning. Their values can be counted.
E.g.: – No. of cars you have, no. of marbles in containers, students in a
class, etc.
true meaning. Their values cannot be counted but can be measured. The
number of possible values can be infinite.
E.g.: – height, weight, time, area, distance, measurement of rainfall,
etc.
2. Qualitative data type: –
These are the data types that cannot be expressed in numbers. They
describe categories or groups and are hence known as the categorical
data type.
This can be divided into:-
a. Structured Data:
This type of data is either numbers or words. This can take numerical
E.g.: Sunny=1, cloudy=2, windy=3, or binary-form data like 0 or 1, good
or bad, etc.
b. Unstructured data:
This type of data does not have a proper format and is therefore
known as unstructured data. This comprises textual data, sounds,
images, videos, etc.
Besides these, there are also other types, referred to as data type
preliminaries or data measures:-
These can also be referred to as different scales of measurement.
measurable.
E.g., male or female (gender), race, country, etc.
II. Ordinal Data Type:
This is also a categorical data type like nominal data but has some
natural ordering associated with it.
E.g., Likert rating scale, Shirt sizes, Ranks, Grades, etc.
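The difference between nominal and ordinal data matters when the values are encoded for a model. A minimal sketch, assuming shirt sizes as the ordinal example and country names as the nominal one; the helper names ordinal_encode and one_hot_encode are made up for this illustration.

```python
def ordinal_encode(values, order):
    """Map each category to its rank, so the natural ordering is preserved."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """Give each category its own 0/1 column, so no ordering is implied."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Ordinal data: S < M < L is a meaningful order.
sizes = ["M", "S", "L", "M"]
size_codes = ordinal_encode(sizes, order=["S", "M", "L"])

# Nominal data: countries have no order, so one column per category.
countries = ["India", "Japan", "India"]
country_vectors = one_hot_encode(countries)
```

Encoding nominal data as 0, 1, 2 would suggest an order that does not exist, which is why the two cases are handled differently.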
III. Interval Data Type:
This is numeric data which has a proper order and equal differences
between values, but no true zero: zero does not mean the complete
absence of the quantity, it is just another value on the scale.
E.g., temperature measured in degrees Celsius, time, SAT score, credit
score, pH, etc. The difference between values is meaningful. In this
case,
Q6) Define the term dataset. What are the properties of a dataset one
should consider while choosing a dataset?
represent the number of data points and the columns represent the
features of the Dataset. Datasets may vary in size and complexity and
they mostly require cleaning and preprocessing to ensure data quality
models, there are input features and output features. Here:
The input features are Sepal Length, Sepal Width, Petal Length, and
Petal Width.
Species is the output feature.
Datasets can be stored in multiple formats. The most common ones are
CSV, Excel, JSON, and zip files for large datasets such as image
datasets.
Types of Datasets
Numerical Dataset, Categorical Dataset, Web Dataset, Time series
Dataset, Image Dataset, Ordered Dataset, Partitioned Dataset,
File-Based Datasets, Bivariate Dataset, Multivariate Dataset
Data Interpretation
It means conducting a complete study of the data: analyzing the number
of rows and columns, the data types, and which data is useful or
redundant, and checking for null values.
Based on this study, various operations are performed on the data to
make it suitable for feeding into ML models, such as feature
engineering, dimension reduction, null-value handling, missing-value
fill-in, and data type conversion by encoding methods.
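Checking for null values and filling them in can be sketched as follows. The three-row table, the column names, and the choice of filling with the column mean are assumptions made for this example; other fill-in strategies (median, most frequent value) are equally possible.

```python
def count_missing(rows):
    """Number of missing (None) values in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts

def fill_missing_with_mean(rows, column):
    """Return new rows where missing values in `column` become the mean."""
    present = [row[column] for row in rows if row[column] is not None]
    mean = sum(present) / len(present)
    return [{**row, column: mean if row[column] is None else row[column]}
            for row in rows]

# Hypothetical dataset: rows are data points, keys are features.
rows = [
    {"age": 20, "income": 30.0},
    {"age": None, "income": 50.0},
    {"age": 40, "income": None},
]

missing = count_missing(rows)                  # study the data first
filled = fill_missing_with_mean(rows, "age")   # then repair one column
```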
The choice of dataset can significantly impact the model's
performance, generalization, and the insights it can provide. Here are
some key considerations and steps to guide you in choosing the right
dataset for your machine learning project:
1. Define Your Problem and Objectives:
Start by clearly defining the problem you want to solve and the
models. Data preprocessing may be required to clean and prepare the
dataset.
5. Data Balance:
For classification problems, check the class distribution. An imbalanced
dataset (where one class significantly outnumbers the others) can lead
to biased models. Techniques like oversampling or undersampling may
be needed to address class imbalance.
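Random oversampling, one of the techniques named above, can be sketched like this. The transaction rows and the oversample helper are hypothetical; the sketch simply duplicates randomly chosen minority-class rows until every class is as large as the biggest one.

```python
import random
from collections import Counter

def oversample(rows, label_of, seed=0):
    """Duplicate random minority rows until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Hypothetical imbalanced data: 8 legitimate rows, 2 fraudulent rows.
rows = [("tx%d" % i, "legit") for i in range(8)]
rows += [("tx8", "fraud"), ("tx9", "fraud")]

balanced = oversample(rows, label_of=lambda row: row[1])
class_counts = Counter(label for _, label in balanced)
```

Oversampling should be applied to the training portion only, so that duplicated rows do not leak into the test set.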
6. Data Diversity:
you have the necessary permissions to use the data, and check for any
legal or ethical constraints.
10. Domain Knowledge:
- Leverage domain knowledge and expertise in the field related to your
problem. Subject matter experts can guide you in selecting relevant
datasets and understanding the nuances of the data.
experiment with different datasets to find the one that works best
for your problem.
Q7) Compare supervised and unsupervised learning
Feature Supervised Learning Unsupervised Learning
output). labels).
Algorithms
  Supervised Learning: Support Vector Machines (SVM), Random Forests,
  Naive Bayes
  Unsupervised Learning: DBSCAN
Evaluation Metrics
  Supervised Learning: Accuracy, Precision, Recall, F1 Score, Mean
  Squared Error (MSE)
  Unsupervised Learning: Silhouette Score, Adjusted Rand Index,
  Davies-Bouldin Index, Calinski-Harabasz Score
Presence of Yes, output labels are No, output labels are not
data.
tasks automatically
Supervised Learning:
- Time-consuming to label data
- Can struggle with complex or dynamic environments
Unsupervised Learning:
accurate results
- Harder to interpret and validate findings
Q8) Justify the need of data mining in machine learning.
● Before machine learning algorithms can be applied, the raw data
must be preprocessed to handle missing values, outliers, and
irrelevant features. Data mining techniques like data cleaning,
transformation, and normalization ensure the dataset is of high
quality and suitable for training machine learning models.
● For example, in customer data, irrelevant features such as
unrelated columns or inconsistencies in user details can negatively
affect model accuracy. Data mining helps clean and prepare such
data.
3. Feature Selection and Engineering
● The performance of machine learning algorithms significantly
can help identify key features like income, credit score, and loan
history, which are most relevant for making predictions.
● A large portion of data, such as text, images, and videos, is
unstructured. Data mining techniques can be used to extract
structured information from this data, which is essential for
feeding into machine learning models.
● Example: Mining text data to extract useful features such as
sentiment or topic, which can be used in natural language
processing (NLP) tasks like sentiment analysis or recommendation
systems.
5. Discovering Hidden Patterns in Unlabeled Data
are used in machine learning to find hidden patterns in data
without labels. Data mining techniques help in uncovering these
patterns, associations, or clusters, which machine learning models
can use to improve their decision-making processes.
● Example: In market segmentation, data mining might discover
clusters of customers with similar buying habits, which can then
be used for personalized marketing strategies.
patterns in transaction data that machine learning algorithms can
use to accurately predict and detect fraud.
● Machine learning is often a part of the broader process called
Knowledge Discovery in Databases (KDD), where data mining
plays a pivotal role. Data mining techniques are essential to
discover previously unknown relationships or patterns from large
datasets, which can then be leveraged by machine learning models
for prediction and classification.
machine learning.
● With the growing amount of data generated every day (e.g., from
IoT devices, social media, or transactions), data mining is
essential for handling and processing large-scale datasets
efficiently. Machine learning models often need summarized or
reduced data to operate effectively, which data mining provides
through aggregation and summarization techniques.
● Example: In social media analysis, mining relevant social posts
from millions of records and summarizing them for machine
learning sentiment analysis or trend prediction.
Conclusion:
There are numerous antibodies in the blood. The technician or nurse
will collect a sample of your blood and examine it for IgM and IgG. Ig
stands for an immunoglobulin molecule.
● IgM antibodies develop at an early stage of infection against
SARS-CoV-2.
● IgG antibodies develop against SARS-CoV-2 once the person has
recovered from coronavirus.
Results by Antibody Test (IgG)
The antibody testing kits take around 30-60 minutes to show results.
Results by RT-PCR
RT-PCR is capable of delivering an accurate diagnosis and result for
COVID 19 within 3 hours. The laboratories take 6-8 hours to derive a
conclusive result.
TrueNat
TrueNat is a chip-based, portable RT-PCR machine that was initially
developed to diagnose tuberculosis. You can confirm your sample using
confirmatory tests for SARS-CoV-2 if you test positive by TrueNat
Beta CoV.
Results by TrueNat
It is capable of producing faster results than standard RT-PCR tests.
Aside from these, patient demographics and the time taken for testing
and reporting will also be recorded and used for detecting underlying
patterns. These underlying patterns can be used in statistics to
generate hypotheses and theories.
Properties of the data will be the same as Q6.
the target variable in the desired direction.
● Regression analysis is heavily based on statistics and hence gives
quite reliable results. For this reason, regression models are
used to find linear as well as non-linear relations between the
independent variables and the dependent or target variable.
Types of Regression are as follows:
● Linear regression is used for predictive analysis. Linear
regression is a linear approach for modeling the relationship
between the criterion or the scalar response and the multiple
θx + b
● Polynomial Regression: This is an extension of linear regression
and is used to model a non-linear relationship between the
● Decision Tree Regression: A decision tree is a powerful and
popular tool for classification and prediction. A decision tree
is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an
outcome of the test, and each leaf node (terminal node) holds a
class label. Decision tree regression is a non-parametric method
that uses such a tree to predict a continuous outcome; in that
case each leaf holds a numeric value instead of a class label.
● Random Forest is an ensemble technique capable of performing
both regression and classification tasks with the use of multiple
decision trees and a technique called Bootstrap and Aggregation,
given input value. SVR can use both linear and non-linear kernels. A
linear kernel is a simple dot product between two input vectors,
while a non-linear kernel is a more complex function that can
capture more intricate patterns in the data. The choice of kernel
depends on the data’s characteristics and the task’s complexity.
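The two kinds of kernel can be written down directly. The linear kernel below is the dot product described above; the RBF (Gaussian) kernel is chosen here only as one common example of a non-linear kernel, and the gamma value is an arbitrary assumption, since the notes do not name a specific non-linear kernel.

```python
import math

def linear_kernel(a, b):
    """Dot product of two input vectors."""
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=0.5):
    """Non-linear kernel: similarity decays with squared distance."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

u, v = (1.0, 2.0), (3.0, 4.0)
linear_similarity = linear_kernel(u, v)
rbf_similarity = rbf_kernel(u, v)
```

The RBF value is 1 for identical vectors and approaches 0 as the vectors move apart, which is what lets it capture local, non-linear structure.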
● Ridge Regression: Ridge regression is a technique for analyzing
multiple regression data. When multicollinearity occurs, least
squares estimates are unbiased, but their variances are large, so
they may be far from the true value. Ridge is a regularized linear
regression model: it tries to reduce the model complexity by
adding a penalty term to the cost function. A degree of bias is
added to the regression estimates, and as a result, ridge
regression reduces the standard errors.
● Lasso regression is a regression analysis method that performs
both variable selection and regularization. Lasso regression uses
soft thresholding. Lasso regression selects only a subset of the
provided covariates for use in the final model. This is another
regularized linear regression model: it works by adding a penalty
term to the cost function, but it tends to zero out some features’
coefficients, which makes it useful for feature selection.
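The difference between the two penalties is easiest to see in a simplified setting. The sketch below assumes orthonormal features, where the lasso solution is the soft-thresholded least squares coefficient and the ridge solution is the least squares coefficient divided by (1 + lambda); the function names and the sample coefficients are made up for illustration.

```python
def soft_threshold(z, lam):
    """Lasso-style shrinkage: move z toward 0 by lam, clipping at 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_shrink(z, lam):
    """Ridge-style shrinkage: scale z down, but never exactly to 0."""
    return z / (1.0 + lam)

# Hypothetical least squares coefficients for three features.
ols_coefficients = [3.0, 0.4, -2.5]
lasso_coefficients = [soft_threshold(z, 1.0) for z in ols_coefficients]
ridge_coefficients = [ridge_shrink(z, 1.0) for z in ols_coefficients]
```

The small coefficient 0.4 becomes exactly zero under soft thresholding but only shrinks under ridge, which is why lasso doubles as a feature selection method.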
● ElasticNet Regression: Linear regression suffers from
overfitting and can’t deal with collinear data. When there are
many features in the dataset and even some of them are not
on the new data. So, to deal with these issues, we include both
L-2 and L-1 norm regularization to get the benefits of both Ridge
and Lasso at the same time. The resultant model has better
predictive power than Lasso.
● Bayesian Linear Regression: As the name suggests, this algorithm
is based on Bayes' theorem. For this reason, we do not use the
least squares method to determine the coefficients of the
regression model. Instead, the model weights and parameters are
found from the posterior distribution of the features, which
provides an extra stability factor to a regression model based on
this technique.
Q11) Problems based on regression
Numerical PDF
Q12) Note on polynomial regression
Polynomial Regression
Polynomial Regression is a regression algorithm that models the
relationship between a dependent variable (y) and an independent
variable (x) as an nth degree polynomial. The Polynomial Regression
equation is given below:
$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$
It is also called the special case of Multiple Linear Regression in ML.
a good result as we have seen in Simple Linear Regression, but if
we apply the same model without any modification to a non-linear
dataset, it will produce drastically worse output: the loss
function will increase, the error rate will be high, and accuracy
will decrease.
● So for such cases, where data points are arranged in a non-linear
fashion, we need the Polynomial Regression model. We can
understand it in a better way using the below comparison diagram
of the linear dataset and non-linear dataset.
depends on the coefficients, which are arranged in a linear fashion.
Equation of the Polynomial Regression Model:
Simple Linear Regression equation:
$y = b_0 + b_1 x$
Polynomial Regression equation:
$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$
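The polynomial equation is still linear in its coefficients, which is why it counts as a special case of multiple linear regression: the powers of x act as extra features. A small sketch, with made-up coefficients and hypothetical helper names polynomial_features and predict:

```python
def polynomial_features(x, degree):
    """Expand a single input x into [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def predict(coefficients, x):
    """y = b0 + b1*x + b2*x^2 + ... as a dot product with the features."""
    features = polynomial_features(x, len(coefficients) - 1)
    return sum(b * f for b, f in zip(coefficients, features))

# Hypothetical degree-2 model: y = 1 + 2x + 3x^2.
coefficients = [1.0, 2.0, 3.0]
y_at_2 = predict(coefficients, 2.0)
```

With a coefficient list of length 2, the same predict function reduces to the simple linear regression equation.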
Q13) Identify the domain where you can apply linear regression and
polynomial regression
prediction
- Predicting fluctuations)
population trends
Marketing and Advertising
  Linear Regression: Customer spending prediction; Pricing models
  Polynomial Regression: Customer lifetime value with non-linear
  trends; Advanced pricing models with curve fitting
supervised and unsupervised learning?
Reinforcement Learning (RL) is a branch of machine learning focused on
making decisions to maximize cumulative rewards in a given situation.
Unlike supervised learning, which relies on a training dataset with
predefined answers, RL involves learning through experience. In RL, an
agent learns to achieve a goal in an uncertain, potentially complex
environment by performing actions and receiving feedback through
rewards or penalties.
based on the state and action.
Value Function: A function that estimates the expected cumulative
reward from a given state.
Model of the Environment: A representation of the environment that
helps in planning by predicting future states and rewards.
more easily.
The above image shows the robot, diamond, and fire. The goal of the
robot is to get the reward, which is the diamond, and avoid the
hurdles, which are fire. The robot learns by trying all the possible
paths and then choosing the path which gives it the reward with the
fewest hurdles. Each right step gives the robot a reward and each
wrong step subtracts from its reward. The total reward is calculated
when it reaches the final reward, which is the diamond.
Main points in Reinforcement learning –
● Input: The input should be an initial state from which the model
will start.
● Output: There are many possible outputs, as there are a variety
of solutions to a particular problem.
● Training: The training is based upon the input. The model will
return a state and the user will decide to reward or punish the
model based on its output.
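The robot, diamond and fire example can be reduced to a tiny corridor and learned by trial and error. This sketch uses tabular Q-learning, which is one common RL algorithm chosen here as an assumption (the notes do not name one); the five-cell layout, the reward values, and the hyperparameters are all made up, and the environment hands out the reward in place of a human user.

```python
import random

FIRE, START, DIAMOND = 0, 2, 4          # cells of a 5-cell corridor
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Move one cell; return (next_state, reward, done)."""
    next_state = state + ACTIONS[action]
    if next_state == DIAMOND:
        return next_state, 10.0, True    # reward
    if next_state == FIRE:
        return next_state, -10.0, True   # penalty
    return next_state, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trial and error."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(5)}
    for _ in range(episodes):
        state = START
        for _ in range(50):              # cap the episode length
            if rng.random() < epsilon:   # explore
                action = rng.choice(list(ACTIONS))
            else:                        # exploit the current estimate
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state].values())
            q[state][action] += alpha * (
                reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in (1, 2, 3)}
```

The learned policy maps each non-terminal cell to the action with the highest estimated cumulative reward, which here means heading toward the diamond and away from the fire.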
Objective
  Supervised Learning: Predict the correct label for new, unseen data.
  Unsupervised Learning: Discover hidden patterns or groupings in the
  data.
  Reinforcement Learning: Maximize the cumulative reward by learning
  the best sequence of actions.
Learning Process
  Supervised Learning: The model is trained by minimizing the
  Unsupervised Learning: The model organizes data based on
  Reinforcement Learning: The agent learns through trial and
structure. sequential).
Type of Feedback
  Supervised Learning: Explicit feedback in the form of labeled data
  (correct/incorrect labels).
  Unsupervised Learning: No feedback; the model self-discovers
  patterns in the data.
  Reinforcement Learning: Reward signal or penalty after each action
  (delayed feedback).
Regression
Association
Advantages
  Supervised Learning: - Provides accurate predictions with
  well-labeled data. - Clear evaluation
  Unsupervised Learning: which is more readily available.
  Reinforcement Learning: - Learns complex strategies and adapts to
  dynamic environments.
Q15) Compare polynomial and linear regression
Feature: Linear Regression vs. Polynomial Regression
Definition
  Linear Regression: Models the relationship between the dependent
  and independent variables as a straight line.
  Polynomial Regression: Models the relationship between the
  dependent and independent variables as a polynomial curve.
Applications
  Linear Regression: Stock price prediction, Sales forecasting, House
  price prediction, Medical cost prediction
  Polynomial Regression: Trajectory modeling, Disease progression,
  Climate change modeling, Crop yield prediction
approximation is
sufficient.
Unit 2
Q1) Define the term classification in machine learning by providing
three real life examples
Classification: A classification problem is when the output variable is
a category, such as “red” or “blue”, “disease” or “no disease”.
Classification is a type of supervised learning that is used to predict
1. Since supervised learning works with a labelled dataset, we can have
an exact idea about the classes of objects.
2. These algorithms are helpful in predicting the output based on
prior experience.
Disadvantages of Supervised Learning:
1. These algorithms may not be able to solve very complex problems.
2. It may predict the wrong output if the test data is different from
the training data.
3. It requires a lot of computational time to train the algorithm.
Applications of Supervised Learning:
"not spam." The model learns patterns from the email content,
sender information, and other metadata to classify incoming
emails.
● Classification Type: Binary classification (Spam/Not Spam).
sugar levels, age, weight, and more.
● Classification Type: Multi-class classification (e.g., Disease A,
Disease B, or No Disease).
3. Credit Card Fraud Detection
fraudulent credit card transactions.
● How it Works: A classification model is trained on past
transaction data, where transactions are labeled as either
"fraudulent" or "legitimate." The model learns patterns and can
Output
  Classification: Discrete class labels (e.g., "spam" or "not spam").
  Clustering: Groupings or clusters of data points (e.g., cluster 1,
  cluster 2).
Common Algorithms
  Classification: Logistic Regression, Decision Trees, Support Vector
  Machines (SVMs)
  Clustering: K-Means Clustering, Hierarchical Clustering, DBSCAN
Evaluation Metrics
  Classification: Accuracy, Precision
  Clustering: Silhouette Score, Davies-Bouldin Index
(disease/no disease)
Data Dependency
  Classification: Requires labeled data for training and
  classification.
  Clustering: No need for labeled data; it groups data based on
  similarities.
Output Interpretation
  Classification: Predicts a class or category based on learned
  patterns.
  Clustering: Assigns data points to clusters based on their relative
  distance or similarity.
classification). vary.
Use
  Classification: Medical diagnosis, Sentiment analysis
  Clustering: Social network analysis, Anomaly detection
Advantages
  Classification: Clear, interpretable results; Can handle complex
  labeled data
  Clustering: Does not require labeled data; Can reveal hidden
  patterns in the data
takes the average to improve the predictive accuracy of that dataset."
Instead of relying on one decision tree, the random forest takes the
prediction from each tree and, based on the majority vote of those
predictions, predicts the final output.
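The combining step can be sketched on its own, leaving out how the individual trees are built. The per-tree outputs below are made-up values: majority voting is used for classification and averaging for regression.

```python
from collections import Counter

def majority_vote(predictions):
    """Classification: the class predicted by the most trees wins."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: the forest output is the mean of the tree outputs."""
    return sum(predictions) / len(predictions)

# Hypothetical predictions from five trees for one new data point.
tree_votes = ["apple", "banana", "apple", "apple", "cherry"]
forest_class = majority_vote(tree_votes)

# Hypothetical numeric predictions from three regression trees.
tree_estimates = [10.0, 12.0, 14.0]
forest_value = average(tree_estimates)
```

Because each tree makes different mistakes, the vote or average is usually more stable than any single tree's prediction.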
and prevents the problem of overfitting.
The below diagram explains the working of the Random Forest
algorithm:
There are two assumptions for a better Random forest classifier:
1. There should be some actual values in the feature variable of the
dataset so that the classifier can predict accurate results rather
than a guessed result.
2. The predictions from each tree must have very low correlations.
Why use Random Forest?
● It takes less training time than many other algorithms.
● It predicts output with high accuracy, and it runs efficiently
even on large datasets.
● It can also maintain accuracy when a large proportion of data is
missing.
How does Random Forest algorithm work?
Random Forest works in two phases: the first is to create the random
forest by combining N decision trees, and the second is to make
predictions for
Step-2: Build the decision trees associated with the selected data
points (subsets).
Step-3: Choose the number N for decision trees that you want to build.
dataset is divided into subsets and given to each decision tree. During
the training phase, each decision tree produces a prediction result, and
when a new data point occurs, the Random Forest classifier predicts the
final decision based on the majority of results. Consider the
below image:
Applications of Random Forest
● Banking: The banking sector mostly uses this algorithm for the
identification of loan risk.
● Medicine: With the help of this algorithm, disease trends and
risks of the disease can be identified.
● Land Use: We can identify the areas of similar land use by this
algorithm.
● Marketing: Marketing trends can be identified using this
algorithm.
● Random Forest is capable of performing both Classification and
Regression tasks.
● It is capable of handling large datasets with high dimensionality.
● It enhances the accuracy of the model and prevents the
overfitting issue.
● Although random forest can be used for both classification and
regression tasks, it is less suitable for regression tasks.
features, identifying high-risk cardiovascular patients, or
predicting whether a person is prone to certain genetic diseases.
● Application: Random Forest is widely used to detect fraudulent
transactions in real-time within financial institutions, such as
banks or credit card companies.
● How it Works: The algorithm analyzes past transaction data
labeled as "fraudulent" or "legitimate" and learns patterns that
indicate fraud. It can then flag suspicious transactions for
further investigation.
● Example: Identifying credit card fraud by analyzing transaction
behaviors (e.g., location, transaction time, amount), or predicting
● Application: Random Forest can be used in e-commerce platforms
to provide personalized product recommendations.
● How it Works: The algorithm analyzes past purchase behavior,
browsing history, and customer preferences to suggest relevant
products.
● Example: A recommendation engine on an e-commerce platform such
as Amazon could use Random Forest models to suggest products that
customers might want to buy based on past behavior and similar
user profiles.
● Example: Identifying diseased crops from image data, predicting
wheat yield based on climate and soil data.
● Application: Random Forest is employed in sentiment analysis to
classify text (such as reviews, social media posts) into categories
like positive, negative, or neutral sentiment.
● How it Works: The algorithm analyzes word frequencies,
sentence structures, and other textual features to classify text
into various sentiment categories.
amounts of climate data.
● How it Works: The algorithm processes historical weather data,
temperature records, CO2 levels, and other environmental
factors to predict future climate scenarios.
● Example: Predicting temperature rise, rainfall patterns, or CO2
levels in the atmosphere for the next decade based on historical
data.
10. Manufacturing: Quality Control and Fault Detection
● Application: Random Forest is used to improve product quality and
unknown genes.
● Example: Classifying tumor vs. non-tumor genes based on
expression data, identifying genetic markers associated with
hereditary diseases.
● Application: Random Forest is used in image recognition to
classify and detect objects in images.
● How it Works: The algorithm processes pixel values, colors,
shapes, and textures from images to classify objects or detect
specific patterns.
● Example: Recognizing objects like cars, animals, or faces in
images, or classifying handwritten digits for automated data
entry.
1. You need a test dataset or a validation dataset with expected
outcome values.
2. Make a prediction for each row in your test dataset.
3. From the expected outcomes and predictions, count:
● The number of correct predictions for each class.
● The number of incorrect predictions for each class, organized by
the class that was predicted.
4. These numbers are then organized into a table, or a matrix, as
follows:
predicted class.
Predicted across the top: Each column of the matrix corresponds to
an actual class.
Confusion Matrix
True Positive (TP)
The predicted value matches the actual value
The actual value was positive and the model predicted a positive value
True Negative (TN)
The predicted value matches the actual value
The actual value was negative and the model predicted a negative value
False Positive (FP)
The predicted value was falsely predicted
The actual value was negative but the model predicted a positive value
False Negative (FN)
The predicted value was falsely predicted
The actual value was positive but the model predicted a negative value
• It tells us not only the errors made by the classifier but also the
type of each error, i.e. whether it is a Type I or a Type II error.
• With the help of the confusion matrix, we can calculate different
parameters for the model, such as accuracy, precision, etc.
Example:
Expected    Predicted
man         woman
man         man
woman       woman
man         man
woman       man
woman       woman
woman       woman
man         man
man         woman
woman       woman

                    Expected: man    Expected: woman
Predicted: man            3                 1
Predicted: woman          2                 4
The total actual men in the dataset is the sum of the values on the men
column (3+2)
The total actual women in the dataset is the sum of values in the
women column (1 +4).
The correct values are organized in a diagonal line from top left to
bottom-right of the matrix (3+4).
More errors were made by predicting men as women than predicting
women as men
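The counts quoted above can be reproduced from the ten expected/predicted pairs of the example. This is a minimal sketch using only the standard library; the variable names are made up, and the keys of the counter are (predicted class, expected class) pairs.

```python
from collections import Counter

expected = ["man", "man", "woman", "man", "woman",
            "woman", "woman", "man", "man", "woman"]
predicted = ["woman", "man", "woman", "man", "man",
             "woman", "woman", "man", "woman", "woman"]

# Count every (predicted, expected) combination: one matrix cell each.
counts = Counter(zip(predicted, expected))

# Correct predictions sit on the diagonal, where both classes agree.
correct = sum(n for (p, e), n in counts.items() if p == e)
accuracy = correct / len(expected)

for p in ("man", "woman"):
    print(p, [counts[(p, e)] for e in ("man", "woman")])
```

Each printed row is one predicted class, and each column one expected class, matching the layout described in the text.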
True Positive:
Interpretation: You predicted positive and it's true. You predicted
that a woman is pregnant and she actually is.
True Negative:
Interpretation: You predicted negative and it's true. You predicted
that a man is not pregnant and he actually is not.
False Positive:
Interpretation: You predicted positive and it's false. You predicted
that a man is pregnant but he actually is not.
False Negative:
Interpretation: You predicted negative and it's false. You predicted
that a woman is not pregnant but she actually is.
Q6) Explain the concept of type 1 and type 2 errors by giving suitable
examples.
Confusion Matrix
True Positive (TP)
The predicted value matches the actual value
The actual value was positive and the model predicted a positive value
True Negative (TN)
The predicted value matches the actual value
The actual value was negative and the model predicted a negative value
False Positive (FP)
The predicted value was falsely predicted
The actual value was negative but the model predicted a positive value
False Negative (FN)
The predicted value was falsely predicted
The actual value was positive but the model predicted a negative value
Scenario 1: We don't have a kitten among the group, yet the ML algorithm predicts that one is there. If we accept the prediction, we make a Type 1 error, also known as a 'False Positive'.
Scenario 2: We have a kitten among the group, yet the ML algorithm predicts that it is not there. If we accept the prediction, we make a Type 2 error, also known as a 'False Negative'.
Type II error: Predicting that a model is not correct when it is correct.
Scenario/Problem Statement 3: Medical trials for a drug which is a cure for cancer
Type I error: Predicting that a cure is found when it is not the case.
Type II error: Predicting that a cure is not found when in fact it is the case.
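The mapping between the two error types and the confusion-matrix cells can be written down as a small function. This is only an illustrative sketch; the function name `error_type` and its boolean arguments are assumptions, not part of any library.

```python
def error_type(actual_positive: bool, predicted_positive: bool) -> str:
    """Name the confusion-matrix cell for a single prediction."""
    if predicted_positive and actual_positive:
        return "True Positive"
    if predicted_positive and not actual_positive:
        return "False Positive (Type I error)"
    if not predicted_positive and actual_positive:
        return "False Negative (Type II error)"
    return "True Negative"

# Scenario 1: no kitten in the group, but the model predicts one.
print(error_type(actual_positive=False, predicted_positive=True))

# Scenario 2: a kitten is in the group, but the model predicts none.
print(error_type(actual_positive=True, predicted_positive=False))
```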
Q7) Discuss following by giving suitable examples.
● Overfitting ● Underfitting
Underfitting: A model underfits when it is too simple to capture the underlying pattern in the data. It mainly happens when we use a very simple model with overly simplified assumptions. To address the underfitting problem, we need to use more complex models, with enhanced feature representation and less regularization.
Note: An underfitting model has high bias and low variance.
Reasons for Underfitting
● The model is too simple, so it may not be capable of representing the complexities in the data.
● The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
● The size of the training dataset is not large enough.
● Excessive regularization is used to prevent overfitting, which constrains the model and keeps it from capturing the data well.
● Features are not scaled.
Overfitting: When a model is trained so closely on its training data that it starts learning from the noise and inaccurate data entries in the data set, it overfits. Testing with test data then results in high variance, and the model does not categorize the data correctly because of too many details and noise. Overfitting is typical of non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. To avoid overfitting, use a linear algorithm if the data is linear, or restrict parameters such as the maximal depth if we are using decision trees.
Reasons for Overfitting:
● High variance and low bias.
● The model is too complex.
● The size of the training data.
Techniques to Reduce Overfitting:
● Increasing the training data can improve the model's ability to generalize to unseen data and reduce the likelihood of overfitting.
● Early stopping (as soon as the loss begins to increase, stop training).
● Ridge Regularization and Lasso Regularization.
● Use dropout for neural networks to tackle overfitting.
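The difference between underfitting and overfitting can be seen on a toy data set. This is a minimal sketch under stated assumptions: the data is made up (true relation y = 2x, with noise added to the training labels only), and the three "models" (a constant, a memoriser that returns the nearest training label, and a least-squares line) are stand-ins chosen for illustration, not methods named in these notes.

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: the true relation is y = 2x; training labels carry noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(9)]
test_y = [2 * x for x in test_x]

# Underfit: a constant model that ignores x entirely (high bias).
mean_y = sum(train_y) / len(train_y)
def underfit(x):
    return mean_y

# Overfit: memorise the training set, noise included (high variance).
def overfit(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Reasonable fit: a straight line fitted by ordinary least squares.
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x
def good_fit(x):
    return slope * x + intercept

for name, model in [("underfit", underfit), ("overfit", overfit), ("good fit", good_fit)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The memoriser has zero training error but a worse test error than the straight line, while the constant model is poor on both sets.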
We can understand the term entropy with a simple example: flipping a coin. When we flip a coin, there are two possible outcomes. However, it is difficult to say what the exact outcome will be, because there is no direct relation between flipping a coin and its outcome. Both outcomes have a 50% probability, and in such scenarios the entropy is high. This is the essence of entropy in machine learning.
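Consider a data set having a total of N classes; the entropy (E) can then be determined with the formula below:
E = - Σ (Pi * log2 (Pi)), summed over i = 1 to N
Where;
Pi = Probability of randomly selecting an example in class i;
For two classes, entropy always lies between 0 and 1; however, depending on the number of classes in the dataset, it can be greater than 1. A high value of entropy means that the classes are heavily mixed, i.e. the data is less pure.
For example, for a dataset of 8 fruits that are red, green, or yellow, the entropy is:
E = - [(Pr * log2 (Pr)) + (Pg * log2 (Pg)) + (Py * log2 (Py))]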
Where;
Pr = Probability of choosing red fruits;
Pg = Probability of choosing green fruits and;
Py = Probability of choosing yellow fruits.
Pr = 2/8 = 1/4 [As only 2 out of 8 examples represent red fruits]
Pg = 2/8 = 1/4 [As only 2 out of 8 examples represent green fruits]
Py = 4/8 = 1/2 [As only 4 out of 8 examples represent yellow fruits]
Now our final equation will be:
E = - [(1/4 * log2 (1/4)) + (1/4 * log2 (1/4)) + (1/2 * log2 (1/2))] = 0.5 + 0.5 + 0.5 = 1.5
So, entropy will be 1.5.
Let's consider a case when all observations belong to the same class:
E = - (1 * log2 (1))
= 0
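The fruit calculation and the single-class case can be checked numerically. This is a small sketch in Python; the function name `entropy` is an assumption, and it takes the class probabilities directly rather than raw data.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: E = -sum(p * log2(p)), skipping p = 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Red, green, and yellow fruits: 2/8, 2/8, and 4/8.
print(entropy([2 / 8, 2 / 8, 4 / 8]))

# All observations in one class: no uncertainty.
print(entropy([1.0]))

# A fair coin flip: the highest entropy two classes can have.
print(entropy([0.5, 0.5]))
```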
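Information gain is the reduction in the entropy.
Mathematically, information gain can be expressed with the below formula:
Information Gain = Entropy (parent) - Σ (weight of child * Entropy (child))
Here the weight of a child node is the fraction of the parent's values that fall into that child.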
Let's say we have a tree with a total of four values at the root node that is split into the first level having three values in one branch (child node 1) and one value in the other branch (child node 2). The entropy at the root node is 1.
Now, to compute the entropy at child node 1, the class proportions are taken as ⅓ and ⅔ and are plugged into Shannon's entropy formula. As we had seen above, the entropy for child node 2 is zero because there is only one value in that child node, meaning there is no uncertainty, and hence no heterogeneity is present.
H(X) = - [(1/3 * log2 (1/3)) + (2/3 * log2 (2/3))] = 0.9183
The information gain for the above case is the entropy at the root minus the weighted average of the entropies of the child nodes:
Information Gain = 1 - (¾ * 0.9183) - (¼ * 0) = 0.3113
The more the entropy is removed, the greater the information gain.
The higher the information gain, the better the split.
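The worked example can be reproduced, up to rounding, from class labels. This is a sketch only; the labels "a" and "b" and the function names are assumptions chosen so that the root holds two values of each class (entropy 1) and the split sends three values to one child and one value to the other.

```python
import math
from collections import Counter

def entropy_of(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy_of(child) for child in children)
    return entropy_of(parent) - weighted

# Hypothetical labels: four values at the root, two of each class.
parent = ["a", "a", "b", "b"]
children = [["a", "a", "b"], ["b"]]  # three values in one child, one in the other

print(round(entropy_of(parent), 4))        # entropy at the root
print(round(entropy_of(children[0]), 4))   # entropy of the three-value child
print(round(information_gain(parent, children), 4))
```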
● Naïve: It is called Naïve because it assumes that each feature is independent of the other features.
● For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
● Bayes' Theorem: Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the probability of a hypothesis with prior knowledge. It depends on the conditional probability.
● The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.
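Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), can be tried out with concrete numbers. The figures below are invented purely for illustration (they do not come from these notes), and `posterior` is a hypothetical helper name.

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / evidence_b

# Hypothetical numbers: 20% of emails are spam (A); the word "offer" (B)
# appears in 50% of spam emails and in 5% of non-spam emails.
p_spam = 0.2
p_offer_given_spam = 0.5
p_offer_given_not_spam = 0.05

# P(B) by the law of total probability.
p_offer = p_offer_given_spam * p_spam + p_offer_given_not_spam * (1 - p_spam)

# Probability that an email containing "offer" is spam.
print(posterior(p_spam, p_offer_given_spam, p_offer))
```

With these made-up numbers the prior of 0.2 rises to roughly 0.71 once the word is observed.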
● It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
● Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.
Applications of Naïve Bayes Classifier:
● It is used for Credit Scoring.
● It is used in medical data classification.
● It can be used in real-time predictions because Naïve Bayes [...] classification tasks.
Q10) Describe advantages and applications of naïve bayes classification
1. Simplicity and Ease of Implementation:
○ Naïve Bayes is simple and easy to implement. It assumes independence between the features, which reduces complexity, making it suitable for quick applications with [...]
[...] classification, Naïve Bayes handles multiple classes very well. This makes it ideal for tasks with more than two outcomes.
6. Robust to Irrelevant Features:
○ Naïve Bayes is relatively immune to irrelevant features in the data. Even if the assumption of independence between features is violated, it can still perform well in many practical applications.
7. Performs Well with Text Data and Natural Language Processing (NLP):
[...] classification tasks.
8. Handles Missing Data:
○ Naïve Bayes can handle missing data relatively well. While [...]
[...] labeled emails (spam and not spam). It calculates the likelihood of an email being spam based on the presence or absence of certain keywords and features.
2. Sentiment Analysis:
○ Application: Naïve Bayes is used to classify customer reviews, social media posts, or feedback into categories like positive, negative, or neutral sentiment.
○ How it Works: By analyzing the frequency of positive or negative words in a dataset of labeled text (reviews or tweets), Naïve Bayes can predict the sentiment of new text data.
○ Example: It's widely used in e-commerce platforms to analyze customer reviews and gauge the overall sentiment towards products.
3. Document Classification:
○ Application: Naïve Bayes is widely used in text classification [...]
[...] information to predict whether new patients might have a particular disease based on the likelihood of specific symptoms.
○ Example: Predicting the likelihood of a patient having a disease like diabetes or heart disease based on input features like age, weight, blood sugar levels, etc.
5. Recommendation Systems:
○ Application: Naïve Bayes is applied in recommendation engines to suggest items such as movies, books, or products to users based on their preferences.
○ Application: Naïve Bayes can be used in facial recognition systems to classify faces in images or videos.
○ How it Works: The algorithm analyzes facial features, such as distance between eyes, shape of the nose, etc., and matches them to pre-classified images in the database.
○ Example: Used in security systems to recognize and verify individuals' identities.
8. Anomaly Detection:
○ Application: Naïve Bayes is applied in cybersecurity to detect unusual patterns or anomalies, such as fraud or network intrusions.
○ How it Works: It learns the normal behavior from historical data and flags any outliers or anomalies as potential threats.
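The "How it Works" description for sentiment analysis (counting how often words occur in labeled text) can be sketched as a tiny word-count Naïve Bayes classifier. Everything here is an assumption for illustration: the four training sentences are invented, the function names `train` and `predict` are hypothetical, and add-one (Laplace) smoothing is a common choice that these notes do not mention.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior plus log likelihood of each word, treated as independent.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled reviews.
training = [
    ("great product love it", "positive"),
    ("excellent quality great value", "positive"),
    ("terrible product waste of money", "negative"),
    ("poor quality very bad", "negative"),
]
model = train(training)
print(predict("great quality", model))
print(predict("bad waste", model))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer texts.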
Numerical PDF
Q13) Explain the working of SVM
SVM is a method for the classification of both linear and non-linear data.
Linearly Separable Data:
If the given data is classified into distinct classes such that they can be separated by a linear decision boundary, it is called Linearly Separable Data.
If the given data is classified into distinct classes such that they cannot be separated by a linear decision boundary, it is called Non-linearly Separable Data. Since it cannot be separated by a [...]
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put new data points in the correct category in the future.
[...] These extreme cases are called Support Vectors, and hence the algorithm is termed Support Vector Machine (SVM).
The line formed by joining the points closest to the hyperplane is the Margin.
Margin is the distance between the support vectors and the hyperplane.
TERMINOLOGIES:
Wx + b = 0
where,
W = weight vector
b = bias
We can write the equation for the two classes:
Wx + b ≥ 1 for yi = 1
Wx + b ≤ -1 for yi = -1
1. The support vectors are the data points that satisfy y(Wx + b) = 1.
2. These are the closest data points to the hyperplane, and they play a critical role in deciding the hyperplane and margin.
3. Margins are of two types: Hard Margin & Soft Margin
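The role of the hyperplane Wx + b = 0 and of the support vectors can be illustrated numerically. This sketch does not train an SVM: the weight vector, bias, and the four points are hypothetical values picked by hand so that two points satisfy y(Wx + b) = 1.

```python
import math

# Hypothetical hyperplane parameters for a 2-D problem (not learned here).
W = [1.0, 1.0]
b = -3.0

def decision_value(x):
    """Compute Wx + b for one point."""
    return sum(w_i * x_i for w_i, x_i in zip(W, x)) + b

def classify(x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(x) >= 0 else -1

def distance_to_hyperplane(x):
    """Perpendicular distance |Wx + b| / ||W||."""
    return abs(decision_value(x)) / math.sqrt(sum(w_i ** 2 for w_i in W))

# Hypothetical labeled points (x, y) with y in {+1, -1}.
points = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([1.0, 1.0], -1), ([0.0, 0.0], -1)]

# Support vectors lie exactly on the margin: y * (Wx + b) = 1.
support_vectors = [x for x, y in points if math.isclose(y * decision_value(x), 1.0)]

# Distance from the hyperplane to the support vectors.
margin = 1.0 / math.sqrt(sum(w_i ** 2 for w_i in W))

print("support vectors:", support_vectors)
print("margin:", round(margin, 4))
```

Only the two closest points define the margin; moving the other two further away would not change the hyperplane.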
Types of SVM:
Disadvantages:
● How it Works: The algorithm can effectively classify images by finding the optimal hyperplane that separates different classes in the feature space derived from the image data.
● Example: Recognizing handwritten digits in the MNIST dataset or classifying images of cats and dogs.
● Application: SVM is employed for text categorization tasks, such as spam detection, sentiment analysis, and document classification.
3. Bioinformatics
4. Finance
● How it Works: It analyzes historical financial data to classify transactions as fraudulent or legitimate, or to predict whether a stock price will rise or fall.
● Example: Classifying credit applicants into "approved" or "denied" categories based on their financial history.
● Application: SVM is used to assist in diagnosing diseases based on patient data, such as symptoms and medical history.
● How it Works: By analyzing various features related to patient [...]
[...] purchasing behavior and preferences.
● How it Works: By analyzing customer data, SVM identifies distinct groups, allowing marketers to target specific segments with tailored campaigns.
● Example: Classifying customers as "high value," "low value," or "at risk" based on their purchasing history.
[...] network security.
● How it Works: The algorithm can identify unusual patterns or outliers in data, classifying them as anomalies that may require further investigation.
● Example: Detecting fraudulent transactions in credit card processing or identifying potential intrusions in network traffic.
● Application: SVM can be utilized for forecasting time series data in fields like economics, weather prediction, and stock market analysis.
● How it Works: By analyzing historical data trends, SVM can predict future values in a time series dataset.
● Example: Predicting future stock prices based on historical trends or forecasting weather conditions based on past climate data.
11. Robotics
and alternate hypothesis.
What is Hypothesis Testing?
Hypothesis testing is a statistical method that is used in making statistical decisions using experimental data.
Hypothesis testing is basically an assumption that we make about the population parameter.
Ex:
1) You say that the average student in a class is 40, or that a boy is taller than girls.
2) Some scientists claim that ultraviolet (UV) light can damage the [...]
Need of Hypothesis
Hypothesis testing is an essential procedure in statistics.
A hypothesis test evaluates two mutually exclusive statements about a [...]
Null hypothesis vs. alternative hypothesis:
What it is?
  Null hypothesis: A statement in which there is no relation between the two variables.
  Alternative hypothesis: A statement in which there is some statistical relationship between the two variables.
What researchers try to do:
  Null hypothesis: Generally, researchers try to reject or disprove it.
  Alternative hypothesis: Researchers try to accept or prove it.
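A hypothesis test about a population mean can be sketched as follows. This is only an illustration under stated assumptions: the sample values are invented, the population standard deviation is assumed known (which makes it a one-sample z-test, a specific test these notes do not name), and the significance level of 0.05 is a conventional choice.

```python
import math

def z_test(sample, hypothesized_mean, population_std):
    """Two-sided one-sample z-test; returns (z statistic, p-value)."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - hypothesized_mean) / (population_std / math.sqrt(n))
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Null hypothesis H0: the population mean is 40.
# Alternative hypothesis H1: the population mean is not 40.
sample = [38, 42, 41, 39, 44, 40, 43, 37, 41, 45]  # hypothetical observations
z, p_value = z_test(sample, hypothesized_mean=40, population_std=2.5)

alpha = 0.05  # assumed significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 3), round(p_value, 3), decision)
```

Failing to reject the null hypothesis does not prove it; it only means the sample does not provide enough evidence against it.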
Q16) Write a short note on Multivariate Regression
Multivariate Regression
Multivariate regression is a statistical technique that uses a mathematical model to estimate the relationship between a dependent variable and multiple independent variables.
It's an extension of linear regression, which only involves one response variable. Multivariate regression can be used in a variety of applications, including: Identifying risk factors for an outcome, [...]
[...] one dependent variable (responses), are linearly related. The method is broadly used to predict the behavior of the response variables associated with changes in the predictor variables, once a desired degree of relation has been established.
Here are some examples of how multivariate regression can be used:
● Pesticide concentration in surface water
A multivariate regression model can estimate the relationship between river flow and seasonal pesticide use, and how these factors affect pesticide concentration in surface water.
● Intracranial bleeding
A multivariate logistic regression analysis can identify the strongest predictors of intracranial bleeding, such as vomiting/nausea and seizures.
● Multiple genetic variants and neuroimaging phenotypes
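Fitting one dependent variable against several independent variables can be sketched with the normal equations. This is a minimal illustration, not a production method: the data is invented (loosely echoing the river flow and pesticide use example), it is generated from known coefficients so the fit can be checked, and `solve` and `fit_linear` are hypothetical helper names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(features, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (river flow, seasonal pesticide use) -> concentration,
# generated from concentration = 2.0 - 0.5 * flow + 1.5 * use.
features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
targets = [2.0 - 0.5 * flow + 1.5 * use for flow, use in features]

intercept, flow_coef, use_coef = fit_linear(features, targets)
print(round(intercept, 6), round(flow_coef, 6), round(use_coef, 6))
```

Each coefficient estimates how the response changes when one predictor changes and the others are held fixed.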
2. Improves Model Performance:
○ Selecting the most relevant features can enhance the model's accuracy and predictive power. It allows the model to focus on the most informative data points, which can lead to better generalization to unseen data.
3. Enhances Interpretability:
○ A model with fewer features is often easier to interpret and understand. This is especially important in fields such as healthcare and finance, where stakeholders need to understand the factors driving predictions.
[...]
○ Different machine learning algorithms may require different features for optimal performance. Feature selection can help identify the most relevant features for each algorithm, aiding in model comparison and selection.
[...] to variations in the data, making them more robust in the presence of noise or outliers.
Feature selection can be performed using various methods, including:
● Filter Methods: Evaluate features based on statistical measures (e.g., correlation, Chi-square test) to select relevant features independent of the learning algorithm.
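A filter method based on correlation can be sketched in a few lines. The feature names and values below are invented for illustration, the function names are hypothetical, and ranking by the absolute Pearson correlation is just one possible statistical measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def select_features(columns, target, top_k):
    """Rank features by |correlation| with the target and keep the top_k names."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical dataset: three candidate features and an exam score target.
columns = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [7, 9, 6, 8, 7, 9],
    "practice_tests": [0, 1, 1, 2, 3, 3],
}
target = [52, 55, 61, 64, 70, 74]

print(select_features(columns, target, top_k=2))
```

No model is trained at any point, which is what makes this a filter method: the ranking depends only on the data.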
Common Mistakes in Machine Learning
5 Common Machine Learning Errors
● Lack of understanding of the mathematical aspects of machine learning algorithms
● Data Preparation and Sampling
○ Data Cleansing
○ Feature Engineering
○ Sampling
References