ML Notes
MACHINE LEARNING
Machine learning is a subset of artificial intelligence (AI) that involves the development of
algorithms and statistical models that enable computers to perform tasks without explicit
instructions. Instead of being programmed with specific rules for every possible scenario,
machine learning algorithms learn patterns from data and make decisions based on that learning.
The key components of machine learning are:
1. Data: The raw information from which the machine learning models learn. This can
include text, images, audio, and other types of data.
2. Algorithms: The mathematical methods and processes used to find patterns in the data.
Common algorithms include decision trees, neural networks, and support vector machines.
3. Models: The output of the learning process, which can make predictions or decisions based
on new data.
4. Training: The process of feeding data into the machine learning algorithm to create a
model. This typically involves splitting data into training and testing sets to evaluate the
model's performance.
5. Features: The individual measurable properties or characteristics of the data being used in
the model. Feature engineering involves selecting, modifying, or creating features to
improve the model's performance.
6. Evaluation: Assessing the performance of the machine learning model using metrics like
accuracy, precision, recall, and F1 score.
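Items 4 and 6 above (training and evaluation) can be made concrete with a short sketch; the Iris dataset and the decision-tree model below are illustrative choices for demonstration, not part of these notes:

# A minimal training-and-evaluation sketch using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = load_iris(return_X_y=True)

# Split the data into training and testing sets to evaluate generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(random_state=42)   # the "algorithm"
model.fit(X_train, y_train)                       # "training" produces the "model"

y_pred = model.predict(X_test)                    # predictions on unseen data
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
print("recall   :", recall_score(y_test, y_pred, average="macro"))
print("f1 score :", f1_score(y_test, y_pred, average="macro"))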
Machine learning approaches are commonly grouped into four broad types:
1. Supervised Learning: The algorithm is trained on labeled data, where the input data and
the corresponding correct output are provided. Examples include classification and
regression tasks.
2. Unsupervised Learning: The algorithm is trained on unlabeled data and tries to identify
patterns or structures within the data. Examples include clustering and association tasks.
3. Semi-supervised Learning: A combination of supervised and unsupervised learning,
where the algorithm is trained on a small amount of labeled data and a large amount of
unlabeled data.
4. Reinforcement Learning: The algorithm learns by interacting with an environment and
receiving feedback in the form of rewards or penalties. This approach is often used in
robotics, game playing, and other decision-making tasks.
Machine learning is widely used in various applications, including image and speech recognition,
natural language processing, recommendation systems, autonomous vehicles, and healthcare
diagnostics.
Machine learning can be categorized into several types based on the nature of the learning process
and the type of data used. The primary types of learning in machine learning are:
1. Supervised Learning:
o Definition: The algorithm is trained on labeled data, meaning the input data is paired
with the correct output.
o Examples:
▪ Classification: Predicting a categorical label (e.g., spam detection in emails,
image recognition).
▪ Regression: Predicting a continuous value (e.g., predicting house prices,
stock prices).
2. Unsupervised Learning:
o Definition: The algorithm is trained on unlabeled data and tries to find patterns or
structures within the data.
o Examples:
▪ Clustering: Grouping similar data points together (e.g., customer
segmentation, image compression).
▪ Association: Finding rules that describe large portions of the data (e.g.,
market basket analysis).
3. Semi-supervised Learning:
o Definition: Combines a small amount of labeled data with a large amount of
unlabeled data during training. It is useful when labeling data is expensive or time-
consuming.
o Example: Text classification with a small set of labeled documents and a large
corpus of unlabeled documents.
4. Reinforcement Learning:
o Definition: The algorithm learns by interacting with an environment and receiving
feedback in the form of rewards or penalties. The goal is to learn a strategy (policy)
that maximizes cumulative reward.
o Examples:
▪ Game playing: Algorithms learning to play games like chess or Go.
▪ Robotics: Robots learning to navigate or manipulate objects.
These types of learning provide a broad framework for understanding how machine learning
algorithms can be applied to different problems and datasets.
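As a companion to the supervised sketch earlier, here is a minimal unsupervised-learning example; the synthetic blob data and the choice of k-means with k = 3 are assumptions for illustration only:

# Clustering unlabeled points with k-means (unsupervised learning).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic unlabeled data: 300 points around 3 hidden centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)        # no labels are given; structure is inferred

print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("cluster centers:\n", kmeans.cluster_centers_)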
In machine learning, the concepts of hypothesis space and inductive bias are fundamental to
understanding how models learn from data and generalize to new, unseen examples.
Hypothesis Space
The hypothesis space refers to the set of all possible hypotheses (or models) that a learning
algorithm can choose from to make predictions based on the data. Each hypothesis in this space
represents a different way of mapping inputs to outputs. The size and nature of the hypothesis
space are determined by the choice of the learning algorithm and the model parameters.
For example:
• In linear regression, the hypothesis space consists of all possible linear functions.
• In decision trees, the hypothesis space consists of all possible trees that can be constructed
based on the features and splits.
The hypothesis space can be finite or infinite, and its complexity can greatly affect the learning
process and the model's ability to generalize.
Inductive Bias
Inductive bias refers to the set of assumptions that a learning algorithm makes to predict outputs
for new inputs that it has not encountered before. Since there are usually many hypotheses
consistent with the training data, inductive bias helps the algorithm choose the most appropriate
one from the hypothesis space.
• Preference Bias (Search Bias): The algorithm prefers some hypotheses over others based
on criteria such as simplicity (Occam's razor), regularization, or prior knowledge. For
example, linear models prefer linear relationships even if more complex models might fit
the training data better.
• Restriction Bias: The algorithm is restricted to a subset of the hypothesis space, limiting
the types of models it can choose. For instance, a linear regression algorithm can only
choose linear models, even if the true relationship is nonlinear.
Inductive bias is essential because it allows machine learning algorithms to generalize from
limited data. Without inductive bias, the hypothesis space would be too large, and the algorithm
might overfit the training data without being able to generalize well to new data.
• The hypothesis space defines the boundaries within which the learning algorithm searches
for the best model.
• Inductive bias guides the search process within the hypothesis space, helping the algorithm
choose the most appropriate model for the given problem.
A balance between a sufficiently rich hypothesis space and a well-chosen inductive bias is crucial
for effective learning. Too narrow a hypothesis space may lead to underfitting, where the model
is too simple to capture the underlying data patterns. Conversely, too broad a hypothesis space
without a strong inductive bias may lead to overfitting, where the model captures noise rather
than the underlying data distribution.
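To illustrate this balance, the following sketch fits a deliberately restricted hypothesis space (linear functions) to data generated from a quadratic relationship; all values here are synthetic assumptions for demonstration:

# A restricted hypothesis space (linear functions) underfits quadratic data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 1, size=x.shape)  # true relation is quadratic

# Hypothesis space H1: linear functions h(x) = theta0 + theta1 * x
lin_coef = np.polyfit(x, y, deg=1)
lin_mse = np.mean((np.polyval(lin_coef, x) - y) ** 2)

# Hypothesis space H2: quadratic functions (a richer space)
quad_coef = np.polyfit(x, y, deg=2)
quad_mse = np.mean((np.polyval(quad_coef, x) - y) ** 2)

print(f"linear fit MSE    : {lin_mse:.2f}   (restriction bias -> underfitting)")
print(f"quadratic fit MSE : {quad_mse:.2f}")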
As a concrete example, consider predicting house price from house size. The hypothesis space
consists of all possible linear functions that can be used to predict the price based on the size.
Mathematically, this is represented as:
h(x) = θ0 + θ1·x
where x is the size, h(x) is the predicted price, θ0 is the intercept, and θ1 is the slope.
Example:
Dataset is:
Size (sq ft)   Price ($)
800            150,000
1000           180,000
1200           210,000
1400           240,000
We need to find the best linear function that fits this data. Fitting a line through these points
gives θ0 = 30,000 and θ1 = 150, i.e., h(x) = 30,000 + 150·x.
Interpretation:
This means that for every additional square foot, the price increases by $150, starting from a base
price of $30,000.
Hypothesis Space:
• Finite vs. Infinite: In linear regression, the hypothesis space is infinite because there are
infinitely many possible values for theta0 and theta1.
• Bounded by Assumptions: The hypothesis space is restricted to linear functions.
Nonlinear relationships cannot be captured by this space.
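A minimal sketch verifying these coefficients with NumPy's least-squares polynomial fit, using the dataset tabulated above:

# Fit h(x) = theta0 + theta1 * x to the dataset above and confirm the coefficients.
import numpy as np

size = np.array([800, 1000, 1200, 1400])                 # sq ft
price = np.array([150_000, 180_000, 210_000, 240_000])   # $

theta1, theta0 = np.polyfit(size, price, deg=1)           # least-squares line
print(f"theta0 (intercept) = {theta0:,.0f}")              # -> 30,000
print(f"theta1 (slope)     = {theta1:,.0f}")              # -> 150
print("prediction for 1100 sq ft:", theta0 + theta1 * 1100)  # -> 195,000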
APPLICATIONS OF MACHINE LEARNING
Machine learning is one of today's most talked-about technologies, and it is growing rapidly.
We use machine learning in daily life, often without knowing it, in tools such as Google Maps,
Google Assistant, and Alexa. Below are some of the most prominent real-world applications of
machine learning:
1. Image Recognition:
Image recognition identifies objects, persons, and places in digital images. A well-known
example is Facebook's project named "DeepFace," which is responsible for face recognition
and person identification in pictures.
2. Speech Recognition:
Google's "Search by voice" option is an example of speech recognition, a popular application
of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also
known as "speech to text" or "computer speech recognition." At present, machine learning
algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana,
and Alexa use speech recognition technology to follow voice instructions.
3. Traffic Prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions. It does this using:
o The real-time location of the vehicle from the Google Maps app and sensors
o The average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make it better: the app takes information from
users and sends it back to its database to improve performance.
4. Product Recommendations:
Google understands the user's interests using various machine learning algorithms and
suggests products according to those interests.
5. Self-driving Cars:
One of the most exciting applications of machine learning is self-driving cars; machine
learning models are trained to detect people and objects while the vehicle is driving.
6. Email Spam and Malware Filtering:
Incoming email is automatically filtered as important, normal, or spam with the help of
machine learning. Some of the spam filters used by email services such as Gmail are:
o Content filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
7. Virtual Personal Assistants:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and
Siri. As the name suggests, they help us find information using voice instructions. These
assistants can help us in various ways just by voice instruction, such as playing music, calling
someone, opening an email, or scheduling an appointment.
These assistants record our voice instructions, send them to a server in the cloud, decode
them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, fraud can occur in various ways,
such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To
detect this, a feed-forward neural network can check whether a transaction is genuine or
fraudulent.
For each genuine transaction, the output is converted into hash values, and these values
become the input for the next round. Genuine transactions follow a specific pattern that
changes for a fraudulent transaction; the network detects this change and so makes our
online transactions more secure.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. Because share prices are always at
risk of moving up and down, long short-term memory (LSTM) neural networks are used to
predict stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical
technology is growing very fast and is able to build 3D models that can predict the exact
position of lesions in the brain.
11. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language, it is not a problem at
all: machine learning helps us by converting text into languages we know. Google's GNMT
(Google Neural Machine Translation) provides this feature; it is a neural machine translation
system that translates text into our familiar language, a capability known as automatic
translation.
• The hypothesis is a common term in Machine Learning and data science projects.
• As we know, machine learning is one of the most powerful technologies
across the world, which helps us to predict results based on past experiences.
• Moreover, data scientists and ML professionals conduct experiments that aim to
solve a problem.
• These ML professionals and data scientists make an initial assumption for
the solution of the problem.
• This assumption in Machine learning is known as Hypothesis.
• In Machine Learning, at various times, Hypothesis and Model are
used interchangeably.
• However, a Hypothesis is an assumption made by scientists, whereas a
model is a mathematical representation that is used to test the hypothesis.
What is Hypothesis?
Example: Let's understand the hypothesis with a common example. Suppose a scientist
claims that ultraviolet (UV) light can damage the eyes, and that it may therefore also cause
blindness.
In this example, the scientist claims only that UV rays are harmful to the eyes; we assume
they may cause blindness, which may or may not turn out to be true. Such assumptions are
called hypotheses.
The hypothesis is one of the commonly used concepts of statistics in Machine Learning.
It is specifically used in Supervised Machine learning, where an ML model learns a
function that best maps the input to corresponding outputs with the help of an available
dataset.
The hypothesis space is represented by uppercase H and a hypothesis by lowercase h. These
are defined as follows:
Hypothesis space (H):
It is defined as the set of all possible legal hypotheses; hence it is also known as the
hypothesis set. It is often constrained by the choice of the framing of the problem, the choice
of model, and the choice of model configuration.
Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised
machine learning algorithms. It is primarily based on data as well as bias and
restrictions applied to data.
Hence a hypothesis (h) is a single candidate function that maps inputs to the proper outputs;
it can be evaluated and used to make predictions.
A linear hypothesis can be written as y = mx + c, where:
Y: range
m: slope of the line that divides the test data, i.e., the change in y divided by the change in x
x: domain
c: intercept (constant)
Example: Let's understand the hypothesis (h) and hypothesis space (H) with a
two- dimensional coordinate plane showing the distribution of data as follows:
Now, assume we have some test data by which ML algorithms predict the outputs for
input as follows:
If we divide this coordinate plane in such a way that it can help us to predict the
output or result, we get:
However, based on data, algorithm, and constraints, this coordinate plane can also be
divided in the following ways as follows:
Hypothesis space (H) is the composition of all legal best possible ways to
divide the coordinate plane so that it best maps input to proper output.
Further, each individual best possible way is called a hypothesis (h). Hence, the
hypothesis and hypothesis space would be like this:
4.2 Hypothesis in Statistics
Significance level
• The significance level is the primary thing that must be set before
starting an experiment.
• It is useful to define the tolerance for error and the level at which an effect
can be considered significant.
• During the testing process in an experiment, a 95% significance level is
accepted, and the remaining 5% can be neglected.
• The significance level also tells the critical or threshold value. For example,
in an experiment, if the significance level is set to 98%, then the critical value
is 0.02 (the remaining 2%).
5. Inductive Bias:
Definition
Every machine learning model requires some type of architecture design and
possibly some initial assumptions about the data we want to analyze. Generally,
every building block and every belief that we make about the data is a form of
inductive bias.
• Inductive biases play an important role in the ability of machine learning
models to generalize to the unseen data.
• A strong inductive bias can lead our model to converge to the global optimum.
• On the other hand, a weak inductive bias can cause the model to find only the
local optima and be greatly affected by random changes in the initial states.
We can categorize inductive biases into two different groups, called relational and
non-relational. The former represents the relationship between entities in the
network, while the latter is a set of techniques that further constrain the learning
algorithm.
Briefly explain the two main problems that degrade the performance of
machine learning models?
Overfitting
• Overfitting occurs when our machine learning model tries to cover all the data
points, or more than the required data points, present in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values present
in the dataset, and these factors reduce the efficiency and accuracy of the
model.
• The overfitted model has low bias and high variance.
• The chance of overfitting increases the more training we give our model.
Example: The concept of overfitting can be understood from the graph of a linear
regression output [figure omitted]:
In such a graph, the model tries to cover all the data points present in the scatter plot.
It may look efficient, but in reality it is not. The goal of a regression model is to find
the best-fit line; if the fitted curve chases every point instead, it will generate
prediction errors on new data.
Both overfitting and underfitting degrade the performance of a machine learning
model, but overfitting is the more common cause, so there are some ways by which we
can reduce its occurrence in our model:
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
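To illustrate one of the remedies above, regularization, here is a sketch comparing an ordinary high-degree polynomial fit with a ridge-regularized one; the synthetic data, the degree, and the alpha value are assumptions for demonstration only:

# A high-degree polynomial overfits noisy data; ridge regression tames it.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)   # noisy sine wave

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

for name, reg in [("plain", LinearRegression()), ("ridge", Ridge(alpha=1e-3))]:
    model = make_pipeline(PolynomialFeatures(degree=15), reg)
    model.fit(X_tr, y_tr)
    print(name, "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 4))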
Underfitting
• Underfitting occurs when our machine learning model is not able to capture
the underlying trend of the data.
• To avoid overfitting, the feeding of training data can be stopped at an early
stage, but then the model may not learn enough from the training data.
• As a result, it may fail to find the best fit for the dominant trend in the data.
• In the case of underfitting, the model is not able to learn enough from the
training data, which reduces its accuracy and produces unreliable predictions.
Example: We can understand underfitting using the output of a linear regression model
[figure omitted]: the fitted line is unable to capture the trend of the data points
present in the plot.
Goodness of Fit
• The "Goodness of fit" term is taken from the statistics, and the goal of the
machine learning models to achieve the goodness of fit.
• In statistics modeling, it defines how closely the result or predicted values
match the true values of the dataset.
• The model with a good fit is between the underfitted and overfitted model,
and ideally, it makes predictions with 0 errors, but in practice, it is difficult to
achieve it.
• As when we train our model for a time, the errors in the training data go down,
and the same happens with test data.
• But if we train the model for a long duration, then the performance of the model
may decrease due to the overfitting, as the model also learn the noise present
in the dataset.
• The errors in the test dataset start increasing, so the point, just before the
raising of errors, is the good point, and we can stop here for achieving a good
model.
There are two other methods by which we can find a good stopping point for our model:
the resampling method, to estimate model accuracy, and a separate validation dataset.
Prediction errors in machine learning fall into two categories:
o Reducible errors: These errors can be reduced to improve the model accuracy.
They can further be classified into bias and variance.
o Irreducible errors: These errors will always be present in the model,
regardless of which algorithm has been used. The cause of these errors is unknown
variables whose influence can't be reduced.
What is Bias?
Bias is the difference between the model's average predictions and the correct values;
it arises from the simplifying assumptions a model makes so that the target function is
easier to learn.
o Low Bias: A low bias model will make fewer assumptions about the form of the
target function.
o High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias
model also cannot perform well on new data.
Some examples of machine learning algorithms with low bias are Decision Trees,
k-Nearest Neighbours, and Support Vector Machines.
At the same time, algorithms with high bias are Linear Regression, Linear
Discriminant Analysis, and Logistic Regression.
High bias mainly occurs when the model is too simple. Below are some ways to reduce
high bias:
o Increase the number of input features.
o Decrease the regularization term.
o Use a more complex model (e.g., include some polynomial features).
What is Variance?
Variance specifies how much the model's prediction would change if a different training
dataset were used; in simple terms, variance tells how much a random variable differs
from its expected value. Ideally, a model should not vary too much from one training
dataset to another, which means the algorithm should be good at understanding the hidden
mapping between input and output variables.
A model that shows high variance learns a lot from the training dataset and performs
well on it, but does not generalize well to unseen data. As a result, such a model
gives good results with the training dataset but shows high error rates on the test
dataset.
Since, with high variance, the model learns too much from the dataset, high variance
leads to overfitting. A model with high variance has the following problems:
o It leads to overfitting of the model.
o It increases model complexity.
Usually, nonlinear algorithms, which have a lot of flexibility to fit the model, have high variance.
Some examples of machine learning algorithms with low variance are Linear
Regression, Logistic Regression, and Linear Discriminant Analysis.
At the same time, algorithms with high variance are Decision Trees, Support
Vector Machines, and k-Nearest Neighbours.
o (By contrast, a high-bias model can be recognized by a high training error, with
the test error almost similar to the training error.)
Bias-Variance Trade-Off
• While building the machine learning model, it is really important to take care
of bias and variance in order to avoid overfitting and underfitting in the
model.
• If the model is very simple with fewer parameters, it may have low variance
and high bias.
• Whereas, if the model has a large number of parameters, it will have high
variance and low bias.
• So, it is required to strike a balance between bias and variance errors; this
balance between the bias error and variance error is known as the Bias-Variance
trade-off.
For an accurate prediction, an algorithm needs both low variance and low bias. But this
is not fully possible, because bias and variance are related to each other: decreasing
variance tends to increase bias, and decreasing bias tends to increase variance.
o Ideally, we need a model that accurately captures the regularities in training data
and simultaneously generalizes well to the unseen dataset.
o Unfortunately, doing both perfectly at the same time is not possible.
o A high-variance algorithm may perform well with training data, but it may lead to
overfitting on noisy data, while a high-bias algorithm produces a model too simple
to capture important regularities.
Hence, the Bias-Variance trade-off is about finding the sweet spot to make
a balance between bias and variance errors.
LINEAR REGRESSION
Linear regression is a statistical method used to model the relationship between a dependent
variable (target) and one or more independent variables (predictors). The goal is to fit a linear
equation to the observed data. Here are the key points about linear regression:
Assumptions
1. Linearity: The relationship between the dependent and independent variables is linear.
2. Independence: Observations are independent of each other.
3. Homoscedasticity: The variance of errors is constant across all levels of the independent
variables.
4. Normality: The errors are normally distributed.
Simple Linear Regression
Simple linear regression models the relationship between a single independent variable and a
dependent variable. The equation is:
y = β0 + β1x + ε
Example
Predicting house prices based on the size of the house. Here, the size of the house is the
independent variable, and the house price is the dependent variable.
Multiple Linear Regression
Multiple linear regression models the relationship between two or more independent variables
and a dependent variable. The equation is:
y = β0 + β1x1 + β2x2 + … + βpxp + ε
Example
Predicting house prices based on multiple factors such as size, number of bedrooms, location, and
age of the house. Here, size, number of bedrooms, location, and age are the independent variables,
and the house price is the dependent variable.
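A minimal multiple-linear-regression sketch with scikit-learn; the tiny synthetic housing dataset (size, bedrooms, age) is an assumption made up for illustration:

# Multiple linear regression on a tiny synthetic housing dataset.
# Features: size (sq ft), bedrooms, age (years); target: price ($).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([
    [800,  2, 20],
    [1000, 2, 15],
    [1200, 3, 10],
    [1400, 3,  5],
    [1600, 4,  8],
])
y = np.array([150_000, 180_000, 215_000, 250_000, 275_000])

model = LinearRegression().fit(X, y)
print("coefficients (beta1..beta3):", model.coef_)
print("intercept (beta0):", model.intercept_)
print("predicted price for [1100 sq ft, 3 bed, 12 yr]:",
      model.predict([[1100, 3, 12]])[0])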
Key Differences
• Number of Predictors: Simple linear regression uses one predictor, while multiple linear
regression uses two or more.
• Complexity: Multiple linear regression is more complex as it involves multiple predictors,
which can capture more nuanced relationships but also require careful consideration of
multicollinearity (when predictors are highly correlated).
Evaluation Metrics
Common evaluation metrics for regression include R-squared, adjusted R-squared, MSE, RMSE,
and MAE; each is defined in detail later in these notes.
Simple and multiple linear regression are foundational techniques in statistics and machine
learning, used to understand relationships between variables and make predictions based on data.
Polynomial Regression
Polynomial regression is an extension of linear regression that models the relationship between
the independent variable x and the dependent variable y as an nth-degree polynomial. It
allows for more complex, non-linear relationships.
Equation
y = β0 + β1x + β2x² + … + βnxⁿ + ε
Example
Predicting the progression of a disease based on time. A linear model might not capture the
progression accurately, but a polynomial model can account for the accelerating or decelerating
rate of progression.
Key Concepts
• The degree of the polynomial, n, determines the flexibility of the model.
• A higher-degree polynomial can fit more complex data patterns but may also lead to
overfitting.
Basis Functions
Polynomial regression can be viewed as linear regression applied to transformed features (basis
functions):
1. Transform the Features: Map each input x to the polynomial basis [1, x, x², …, xⁿ].
2. Fit the Model: Use linear regression techniques to fit the model to the transformed features.
3. Model Evaluation: Assess the model's performance using metrics such as R², RMSE
(Root Mean Square Error), and cross-validation.
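A minimal sketch of this basis-function view using scikit-learn; the data and the chosen degree are illustrative assumptions:

# Polynomial regression as linear regression on polynomial basis features.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 30).reshape(-1, 1)
y = 2 + 1.5 * x.ravel() - 0.8 * x.ravel() ** 2 + rng.normal(0, 0.5, 30)

# degree=2 expands x into [1, x, x^2]; LinearRegression fits the coefficients
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print("R^2 on training data:", model.score(x, y))
print("prediction at x=2.5:", model.predict([[2.5]])[0])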
Advantages
1. Flexibility: It can capture non-linear relationships between the variables.
2. Simplicity: It is still fitted with ordinary linear-regression techniques on the transformed
features.
Disadvantages
1. Overfitting: High-degree polynomials can overfit the training data, leading to poor
generalization.
2. Extrapolation: Predictions outside the range of the training data can be unreliable.
3. Computational Complexity: High-degree polynomials require more computational
resources and can suffer from numerical instability.
Applications
Typical applications include modeling growth curves, disease progression over time, and other
non-linear trends in data.
Polynomial regression is a powerful tool for capturing non-linear relationships in data, but it
must be used with caution to avoid overfitting and ensure meaningful predictions.
R-squared (R²)
• Definition: R-squared measures the proportion of the variance in the dependent variable
that is explained by the model.
• Formula: R² = 1 − (SS_res / SS_tot), where SS_res is the sum of squared residuals and
SS_tot is the total sum of squares.
• Interpretation: An R² value close to 1 indicates a good fit, meaning that the model
explains a large portion of the variance in the dependent variable. However, it doesn't
account for overfitting or the complexity of the model.
Adjusted R-squared
• Definition: Adjusted R-squared adjusts the R-squared value for the number of predictors
in the model, providing a more accurate measure of model performance.
• Formula: Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − p − 1)]
• Where 'n' is the number of observations and 'p' is the number of predictors.
• Interpretation: Unlike R-squared, the adjusted R-squared can decrease if unnecessary
predictors are added to the model.
Mean Squared Error (MSE)
• Definition: MSE measures the average of the squares of the errors, i.e., the average
squared difference between the observed and predicted values.
• Formula: MSE = (1/n) Σ (yᵢ − ŷᵢ)²
• Interpretation: Lower MSE values indicate a better fit. However, it is sensitive to
outliers since errors are squared.
Root Mean Squared Error (RMSE)
• Definition: RMSE is the square root of the MSE, providing a measure of the average
magnitude of the errors in the same units as the dependent variable.
• Formula: RMSE = √MSE
Mean Absolute Error (MAE)
• Definition: MAE measures the average absolute difference between the observed and
predicted values.
• Formula: MAE = (1/n) Σ |yᵢ − ŷᵢ|
• Interpretation: Lower MAE values indicate a better fit. MAE is less sensitive to outliers
compared to MSE.
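A minimal sketch computing these metrics with NumPy; the observed and predicted values are hypothetical:

# Computing the regression metrics above for hypothetical predictions.
import numpy as np

y_true = np.array([150_000, 180_000, 210_000, 240_000])
y_pred = np.array([152_000, 176_000, 214_000, 238_000])
n, p = len(y_true), 1                      # p = number of predictors (assumed 1)

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_true - y_pred))

print(f"R^2={r2:.4f}  adjR^2={adj_r2:.4f}  MSE={mse:,.0f}  RMSE={rmse:,.0f}  MAE={mae:,.0f}")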
PART-A
Example: Suppose a scientist claims that ultraviolet (UV) light can damage
the eyes, and that it may therefore also cause blindness. In this example,
the scientist claims only that UV rays are harmful to the eyes; we assume
they may cause blindness, which may or may not turn out to be true. Such
assumptions are called hypotheses.
15. Define hypothesis set. CO1, K1
Hypothesis space is defined as a set of all possible legal
hypotheses; hence it is also known as a hypothesis set.
The hypothesis can be represented as y = mx + c, where Y: range; m: slope
of the line that divides the test data, i.e., the change in y divided by
the change in x; x: domain; c: intercept (constant).
17. Define inductive bias? CO1, K1
Every machine learning model requires some type of architecture
design and possibly some initial assumptions about the data we want
to
analyze. Generally, every building block and every belief that we
make about the data is a form of inductive bias.
18. What is k-NN algorithm? CO1, K1
The k-Nearest Neighbors(k-NN) algorithm assumes that entities
belonging to a particular category should appear near each other, and
those that are part of different groups should be distant. In other
words, we assume that similar data points are clustered near each
other away from the dissimilar ones.
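A minimal k-NN sketch with scikit-learn; the Iris dataset and k = 3 are illustrative assumptions:

# k-NN: classify a point by the majority label of its k closest training points.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("predicted class:", knn.predict([[5.1, 3.5, 1.4, 0.2]])[0])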
19. Define bias and variance? CO1, K1
Bias is the difference between the model's average predictions and the
correct values, caused by the simplifying assumptions the model makes;
variance is how much the model's predictions change when it is trained
on a different training dataset.
20. How to avoid overfitting in model? CO1, K2
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
21. How to avoid underfitting in model? CO1, K2
Increase the model complexity, increase the number of input features,
reduce regularization, and increase the training duration so that the
model can capture the dominant trend of the data.
22. Define probably approximately correct (PAC) learning. CO1, K1
A good learner will learn, with high probability, a close approximation
to the target concept; the selected hypothesis will have low error
("approximately correct"). Learning with the parameters ε and δ is
called probably approximately correct learning.
24. Hypothesis h generated the following errors with respect to price and engine CO1, K3
power of 5 samples, given ε = 0.05 and δ = 0.20:
s.no      1      2      3      4      5
Error(h)  0.001  0.07   0.045  0.065  0.036
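A possible worked check for Q24, under the common interpretation that h is "approximately correct" on a sample when its error is at most ε (this interpretation is an assumption, not stated in the notes): with ε = 0.05, samples 1, 3, and 5 satisfy Error(h) ≤ ε (0.001, 0.045, 0.036), while samples 2 and 4 do not (0.07, 0.065). The empirical probability of being approximately correct is 3/5 = 0.60, which is less than the required 1 − δ = 0.80, so under this interpretation h is not probably approximately correct.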
Part-B
4. Explain in detail why linear algebra should be learned before machine learning. CO1,K3
8. Briefly explain the two main problems that degrade the performance of machine CO1,K3
learning models.
9. What are bias and variance? Explain their trade-off. CO1,K3