Chapter 1
In the real world, we are surrounded by humans who can learn from their experiences, and by computers or machines that simply follow our instructions. But can a machine also learn from experience or past data, as a human does? This is where machine learning comes in.
Machine learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experience on its own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it briefly as:
Definition
Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.
With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed. Machine learning brings computer science and statistics together to create predictive models. It constructs or uses algorithms that learn from historical data: the more information we provide, the better the performance.
A machine has the ability to learn if it can improve its performance by gaining more data.
A machine learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it. The accuracy of the predicted output depends on the amount of data: a large amount of data helps to build a better model, which predicts the output more accurately.
Suppose we have a complex problem that requires some predictions. Instead of writing code for it directly, we just feed the data to generic algorithms; with the help of these algorithms, the machine builds the logic from the data and predicts the output. Machine learning has changed our way of thinking about such problems. In outline: training data is fed to a learning algorithm, the algorithm builds a model, and the model predicts outputs for new data.
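To make this concrete, here is a minimal sketch of that workflow in Python using scikit-learn. The toy pass/fail data is invented purely for illustration: instead of hand-coding decision rules, we hand labeled examples to a generic algorithm and let it build the logic.

```python
# A minimal sketch of the machine learning workflow: instead of writing
# explicit rules, we feed example data to a generic algorithm, which
# builds a model and predicts outputs for new inputs.
# (Toy data invented for illustration.)
from sklearn.tree import DecisionTreeClassifier

# Training data: [hours_studied, hours_slept] -> pass (1) / fail (0)
X_train = [[8, 7], [6, 8], [2, 4], [1, 6], [7, 5], [3, 3]]
y_train = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier()        # a generic learning algorithm
model.fit(X_train, y_train)             # the model builds its logic from data

print(model.predict([[5, 6], [2, 7]]))  # predictions for unseen inputs
```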
The need for machine learning is increasing day by day. The reason is that it is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot process huge amounts of data manually, so we need computer systems, and this is where machine learning makes things easy for us.
The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on Facebook, and so on. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interest and recommend products accordingly.
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
The system creates a model from labeled data: it learns the structure of the dataset during training, and once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, much as a student learns under the supervision of a teacher. A classic example of supervised learning is spam filtering.
Supervised learning can be further divided into two categories of algorithms:
o Classification
o Regression
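Since spam filtering is the classic example above, the following sketch shows a tiny supervised text classifier in Python (scikit-learn); the four example messages and their labels are invented for illustration.

```python
# A minimal spam-filtering sketch: a supervised classifier trained on
# labeled messages (labels: 1 = spam, 0 = not spam). The messages are
# invented toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now", "limited offer click here",
    "meeting at noon tomorrow", "please review the attached report",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)   # bag-of-words features

clf = MultinomialNB()
clf.fit(X, labels)                       # learn from the labeled data

test = vectorizer.transform(["free prize offer"])
print(clf.predict(test))                 # expected: [1] (spam)
```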
2) Unsupervised Learning
The machine is trained on a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or into groups of objects with similar patterns.
In unsupervised learning, we do not have a predetermined result; the machine tries to find useful insights in a huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association
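As a minimal illustration of clustering, the sketch below (scikit-learn; the 2-D points and the choice of two clusters are assumptions made for the example) groups unlabeled points by similarity.

```python
# A minimal unsupervised-learning sketch: KMeans groups unlabeled
# points into clusters of similar objects. (Toy 2-D points invented
# for illustration; k=2 is an assumption.)
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],    # one natural group
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])   # another group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster assignments found without any labels
```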
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method: an agent interacts with its environment, receives rewards for good actions and penalties for bad ones, and learns by trial and error to choose actions that maximize the total reward.
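A minimal sketch of this trial-and-error idea follows: tabular Q-learning on a made-up five-state corridor, where the agent earns a reward for reaching the rightmost state. The environment and all hyperparameters are illustrative assumptions, not part of the text above.

```python
# A minimal reinforcement-learning sketch: tabular Q-learning on a tiny
# made-up corridor. States 0..4; action 0 = left, 1 = right; reaching
# state 4 yields reward +1 and ends the episode.
import random

n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.5, 0.9, 0.3          # assumed hyperparameters
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1    # (next state, reward, done)

for _ in range(300):                           # episodes of trial and error
    s, done = 0, False
    while not done:
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda a: Q[s][a])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward reward + discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)])
# expected: the learned policy moves right (action 1) toward the reward
# (the terminal state's entry is never updated, so it stays arbitrary)
```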
Common applications of machine learning include:
1. Learning associations
2. Classification
3. Pattern recognition
4. Spam filtering
5. Recommendation systems
How it works
1. A model is trained by providing it with labeled examples.
2. The model analyzes patterns and relationships between the input and output variables.
3. The model learns to make predictions based on the patterns it has identified.
The VC dimension of a hypothesis class is the maximum number of points that the hypothesis class can shatter.
What is Shattering?
A hypothesis class is said to “shatter” a set of data points if, no matter how you label those
points (e.g., assign them as positive or negative), the hypothesis class has a function that can
correctly classify them.
Example of Shattering:
Imagine you have two points on a 2D plane.
A straight line (linear hypothesis) can divide these two points in all possible ways based on
their labels (e.g., positive-negative or negative-positive). Hence, the hypothesis class of
straight lines shatters these two points.
A straight line can, in fact, shatter any three points that are not collinear (such as the corners of a triangle). However, no straight line can shatter four points: for four points arranged in a square, the XOR-style labeling (opposite corners sharing the same label) cannot be separated by any line.
Since a linear model can shatter three points but not four, its VC dimension is 3.
In general, if $H$ can shatter some set of $n$ points but cannot shatter any set of $n+1$ points, the VC dimension of $H$ is $n$.
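This shattering argument can be checked mechanically. The sketch below treats a linear SVM with a very large C as a stand-in for the class of straight lines and tests every labeling of a point set for separability; the point coordinates are arbitrary choices for the example.

```python
# Check shattering empirically: a set of points is shattered by linear
# classifiers if every possible labeling is linearly separable. We test
# separability by fitting a linear SVM with a very large C and checking
# for zero training error. (Point coordinates are arbitrary examples.)
from itertools import product
from sklearn.svm import SVC

def shattered(points):
    for labels in product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:
            continue                       # one-class labelings are trivial
        clf = SVC(kernel="linear", C=1e6).fit(points, labels)
        if clf.score(points, labels) < 1.0:
            return False                   # some labeling is not separable
    return True

three = [[0, 0], [1, 0], [0, 1]]           # three points in general position
four  = [[0, 0], [1, 0], [0, 1], [1, 1]]   # four points (XOR labeling fails)
print(shattered(three))                    # True  -> lines shatter 3 points
print(shattered(four))                     # False -> VC dimension of lines is 3
```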
Importance of VC Dimension
In machine learning, understanding the capacity and performance of a model is critical. One
important concept that helps in this understanding is the Vapnik-Chervonenkis (VC)
dimension. The VC dimension measures the ability of a hypothesis space (the set of all
possible models) to fit different patterns in a dataset.
Introduced by Vladimir Vapnik and Alexey Chervonenkis, this concept plays a vital role in
assessing the trade-off between model complexity and generalization. It helps us understand
how well a model can balance learning from the training data and performing well on unseen
data.
Overfitting: A model with a very high VC dimension may overfit, memorizing the training
data instead of generalizing.
Underfitting: A model with a very low VC dimension may underfit, failing to capture the
patterns in data.
The hypotheses that fit the data with zero error are called consistent
with the training data. We call the set of hypotheses that are consistent
with training data the version space.
The task is to build a supervised learning model that correctly predicts whether an applicant with a given pair [Undergraduate GPA, GRE score] will be accepted to the university or not.
Approximately correct
A hypothesis is said to be approximately correct if its probability of error is less than or equal to ϵ, where 0 ≤ ϵ ≤ 1/2. That is, P(T ⊕ H) ≤ ϵ.
Probably approximately correct
A hypothesis is probably approximately correct (PAC) if the probability that it is approximately correct is greater than 1 − δ, i.e.
P(P(T ⊕ H) ≤ ϵ) > 1 − δ
Now, suppose we are given the errors below for a hypothesis H1 on 5 different sample sets of the university acceptance training set, with ϵ = 0.05 and δ = 0.2.

Sample set   Error
1            0.069
2            0.03
3            0.021
4            0.05
5            0.013
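A quick way to check the condition on these numbers is to compute the fraction of sample sets on which H1 is approximately correct (error ≤ ϵ) and compare it with 1 − δ; the short sketch below does exactly that.

```python
# Empirical check of "probably approximately correct" for H1:
# count how often the sample error is <= epsilon, and compare that
# fraction with 1 - delta.
errors = [0.069, 0.03, 0.021, 0.05, 0.013]   # errors from the table above
epsilon, delta = 0.05, 0.2

frac_ok = sum(e <= epsilon for e in errors) / len(errors)
print(frac_ok)               # 0.8: approximately correct on 4 of 5 sets
print(frac_ok > 1 - delta)   # strict comparison against 1 - delta = 0.8
```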
Noise
Noise in machine learning is irrelevant or unpredictable data that can make
machine learning models less accurate. It can be caused by human error,
unreliable data collection tools, or random variations.
What causes noise?
Human error: Mistakes made by people who collect or process data
Unreliable data collection tools: Tools that don't collect data accurately
Random variations: Natural variability in complex systems
Attacks: Malicious attempts to introduce noise into data
How does noise affect machine learning?
Noise can make it difficult to identify patterns in data
Noise can cause algorithms to misinterpret data
Noise can lead to performance issues like overfitting
Noise can make machine learning models less robust and reliable
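The effect of label noise can be illustrated with a small experiment: the sketch below (synthetic data; the 20% flip rate is an arbitrary assumption) trains the same model on clean and on noisy labels and compares test accuracy.

```python
# A small illustration of how label noise hurts accuracy: train the same
# model on clean labels and on labels with 20% random flips (synthetic
# data; the noise rate is an arbitrary choice).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
flip = rng.random(len(y_tr)) < 0.20        # flip 20% of training labels
y_noisy = np.where(flip, 1 - y_tr, y_tr)

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
noisy = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
print(clean.score(X_te, y_te))             # accuracy with clean labels
print(noisy.score(X_te, y_te))             # typically lower with noisy labels
```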
Learning Multiple Classes
Multiclass Classification
Multiclass classification is a machine learning classification task that involves more than two classes, or outputs.
Examples
K-nearest neighbor
Decision Trees
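A minimal multiclass sketch using one of the examples above, k-nearest neighbor, on the three-class Iris dataset bundled with scikit-learn (k = 3 is an arbitrary choice):

```python
# Multiclass classification sketch: k-nearest neighbors on the Iris
# dataset, which has three classes (k=3 is an arbitrary choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)      # 3 classes of iris flowers
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
print(knn.score(X_te, y_te))           # accuracy over all three classes
```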
Regression
Regression is a key element of predictive modelling and can be found within many different applications of machine learning. Whether powering financial forecasting or predicting healthcare trends, regression analysis brings organisations key insight for decision-making. It is already used in different sectors to forecast house prices, stock or share prices, and to map salary changes.
Example: Suppose there is a marketing company A that runs various advertisements every year and gets sales from them. The list below shows the advertisements made by the company in the last 5 years and the corresponding sales:
Now, the company plans to spend $200 on advertising in the year 2019 and wants a prediction of the sales for that year. To solve such prediction problems in machine learning, we need regression analysis.
In regression, we plot a graph between the variables that best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, regression finds a line or curve on the target-predictor graph such that the vertical distance between the datapoints and the regression line is minimized. This distance between the datapoints and the line tells us whether the model has captured a strong relationship or not.
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive
analysis.
o It is one of the simplest and easiest regression algorithms; it models the relationship between continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-
axis) and the dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple
linear regression. And if there is more than one input variable, then such linear
regression is called multiple linear regression.
o The relationship between the variables in a linear regression model can be illustrated with a simple example: predicting the salary of an employee on the basis of years of experience.
Mathematically, the line is written as Y = aX + b, where a is the slope and b is the intercept.
The Mean Squared Error measures how close a regression line is to a set of data points. It is a
risk function corresponding to the expected value of the squared error loss.
Mean squared error is calculated by taking the average (specifically, the mean) of the squared errors between the observed data and the values predicted by the function.
Fig: Regression Line
A larger MSE indicates that the data points are widely dispersed around the regression line, whereas a smaller MSE indicates that they lie close to it. A smaller MSE is preferred because it means the model's predictions are close to the actual values, i.e., the model has fewer errors.
The smaller the MSE => the smaller the error => the better the estimator.
MSE = (1/n) Σᵢ (Yᵢ − Ŷᵢ)²
where:
n – sample size
Yᵢ – observed value
Ŷᵢ – value predicted by the regression line
This algorithm explains the linear relationship between the dependent (output) variable Y and the independent (predictor) variable X using a straight line Y = B0 + B1 X.
The goal of the linear regression algorithm is to find the best values for B0 and B1, giving the best-fit line: the line for which the error between the predicted values and the actual values is minimal.
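The closed-form least-squares estimates for B0 and B1, together with the MSE from the formula above, can be computed in a few lines of numpy. The x and y values below are made-up illustrative numbers, not the missing advertisement/sales table.

```python
# Fit simple linear regression Y = B0 + B1*X by least squares and
# compute the MSE. (x and y are made-up illustrative values.)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # years of experience (example)
y = np.array([30.0, 35.0, 41.0, 44.0, 52.0])  # salary in $1000s (example)

# Closed-form least-squares estimates:
B1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
B0 = y.mean() - B1 * x.mean()

y_pred = B0 + B1 * x
mse = np.mean((y - y_pred) ** 2)   # MSE = (1/n) * sum of squared errors

print(B0, B1)   # intercept and slope of the best-fit line
print(mse)      # smaller MSE -> better fit
```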
Ill-Posed Problem
A problem is said to be well-posed (in the sense of Hadamard) if:
1. a solution exists,
2. the solution is unique, and
3. the solution depends continuously on the data.
A problem that violates any of these conditions is ill-posed. Learning a model from a finite training set is ill-posed in this sense: the data by itself is not sufficient to single out a unique solution, so extra assumptions (an inductive bias) are required.
Model Selection
Model Generalization
Dietterich 2003 (“Machine Learning”) describes the triple tradeoff for empirical (supervised) learning. That is, there exists a tradeoff between:
1) The size or complexity of the learned classifier
2) The amount of training data
3) The generalization accuracy on new examples
Validation Set
The validation set is used to fine-tune the hyperparameters of the model and is considered a part of the training process. The model sees this data for evaluation but does not learn from it, which provides an objective, unbiased evaluation of the model.
Test Set
The final step is to use a test set to verify the model's functionality. Some publications refer to the
validation dataset as a test set, especially if there are only two subsets instead of three. Similarly, if
records in this final test set have not formed part of a previous evaluation or cross-validation, they
might also constitute a holdout set.
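A sketch of the three-way split described above (the 60/20/20 proportions and the Iris dataset are assumptions for the example):

```python
# Split data into training, validation, and test sets (60/20/20 here,
# an arbitrary but common choice). The validation set tunes
# hyperparameters; the test set gives the final unbiased check.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4,
                                                  random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)
print(len(X_train), len(X_val), len(X_test))   # 90 30 30
```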
Cross Validation