
UNIT-1 Introduction

1) Define Well-posed learning problems? Illustrate any four examples of Well-posed problems?

The formal definition of a well-posed learning problem is: “A computer program is said
to learn from experience E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with experience E.”

To break it down, the three important components of a well-posed learning problem are:
 Task
 Performance Measure
 Experience

To understand the topic better, let’s have a look at a few classical examples:

 Learning to play Checkers:


A computer might improve its performance, as measured by its ability to win, at the class of
tasks that are about playing checkers. The performance keeps improving through
experience by playing practice games against itself.

To simplify,
T -> Play the checkers game.
P -> Percentage of games won against the opponent.
E -> playing practice games against itself.

 Handwriting Recognition:
Handwriting recognition (HWR) is a technology that converts a user’s handwritten
letters or words into a computer-readable format (e.g., Unicode text).

Its applications are numerous: it is used in reading postal addresses, bank forms, etc.
T -> recognizing and classifying handwritten words from images.
P -> Percentage of correctly identified words.
E -> set of handwritten words with their classifications in a database.
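To make the T/P/E framing concrete, here is a minimal sketch of the handwriting-recognition setup, assuming scikit-learn is available; the digits dataset and the choice of model are illustrative, not part of the original material.

```python
# T = classify handwritten digit images, P = percentage identified correctly,
# E = a database of labelled handwritten examples (illustrative setup).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()                          # E: labelled handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=2000)       # one possible learner
model.fit(X_train, y_train)                     # learn from experience E

print(f"P (accuracy): {model.score(X_test, y_test):.2%}")  # performance on T
```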

 A Robot Driving Learning Problem:


For a robot to drive on a four-lane highway it needs a human-like understanding of
all the possibilities it might encounter.

With the use of sight scanners and advanced machine learning algorithms, it can be
made possible.
T -> To drive on public four-lane highways using sight scanners.
P -> The average distance progressed before an error.
E -> A sequence of images and steering instructions recorded while observing a
human driver.
 A spam filtering for emails learning problem:
A spam filter is software that detects unsolicited and undesired email and prevents it
from reaching the inbox of a user.

T -> Identifying whether or not an email is spam.


P -> The percentage of emails correctly categorized as spam or nonspam.
E -> Observing how you categorize emails as spam or nonspam.

 Face Recognition Problem:


A facial recognition system device is capable of matching a human face from a digital
image or a video frame against a database of faces.

It works by locating and measuring facial characteristics from a given image and is
often used to verify users through ID verification services.

T -> Matching a human face from a digital image or video frame against a database of faces.
P -> Percentage of faces correctly identified.
E -> Training the system with as many datasets of varied facial photos as possible.

2) List some applications of Machine Learning?


1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image
recognition and face detection is automatic friend-tagging suggestions.
Facebook provides us a feature of auto friend-tagging suggestions. Whenever we upload a
photo with our Facebook friends, we automatically get a tagging suggestion with the
name, and the technology behind this is machine learning's face detection and recognition
algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in the picture.
2. Speech Recognition
While using Google, we get an option of "Search by voice"; this comes under speech
recognition, and it's a popular application of machine learning.
Speech recognition is a process of converting voice instructions into text, and it is also
known as "Speech to text", or "Computer speech recognition." At present, machine
learning algorithms are widely used by various applications of speech recognition. Google
assistant, Siri, Cortana, and Alexa are using speech recognition technology to follow the
voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.
It predicts the traffic conditions, such as whether traffic is clear, slow-moving, or heavily
congested, in two ways:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time
Everyone who uses Google Maps is helping to make the app better. It takes information
from the user and sends it back to its database to improve the performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for
some product on Amazon, we start getting advertisements for the same product while
surfing the internet on the same browser, and this is because of machine learning.
Google understands the user's interest using various machine learning algorithms and
suggests products as per the customer's interest.
Similarly, when we use Netflix, we find recommendations for entertainment series,
movies, etc., and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine
learning plays a significant role in self-driving cars. Tesla, a popular car
manufacturer, is working on self-driving cars. It uses unsupervised learning
methods to train the car models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or
spam. We always receive important mail in our inbox with the important symbol and
spam emails in our spam box; the technology behind this is machine learning. Below are
some spam filters used by Gmail:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree,
and Naïve Bayes classifier are used for email spam filtering and malware detection.
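As a hedged illustration of the Naive Bayes approach just mentioned, here is a minimal spam-filter sketch assuming scikit-learn; the tiny inline email dataset is purely made up for demonstration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at 10 am tomorrow",
          "free money claim now", "project report attached"]
labels = [1, 0, 1, 0]                         # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)          # bag-of-words features

clf = MultinomialNB().fit(X, labels)          # content-based spam filter
test = vectorizer.transform(["claim your free prize"])
print(clf.predict(test))                      # expected output: [1] (spam)
```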
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google assistant, Alexa, Cortana, Siri. As
the name suggests, they help us in finding the information using our voice instruction. These
assistants can help us in various ways just through our voice instructions, such as playing
music, calling someone, opening an email, scheduling an appointment, etc.
These virtual assistants use machine learning algorithms as an important part.
These assistants record our voice instructions, send them over to a server in the cloud, decode
them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways that
a fraudulent transaction can take place, such as fake accounts, fake IDs, and money stolen in
the middle of a transaction. To detect this, a feed-forward neural network helps us by
checking whether it is a genuine transaction or a fraudulent one.
For each genuine transaction, the output is converted into some hash values, and these
values become the input for the next round. For each genuine transaction, there is a specific
pattern which changes for a fraudulent transaction; hence the network detects it and makes our
online transactions more secure.
9. Stock Market trading:
Machine learning is widely used in stock market trading. In the stock market, there is always
a risk of ups and downs in shares, so machine learning's long short-term memory (LSTM)
neural network is used for the prediction of stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With this, medical
technology is growing very fast and is able to build 3D models that can predict the exact
position of lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.
11. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language, it is not a
problem at all, as machine learning helps us here too by converting the text into our
known languages. Google's GNMT (Google Neural Machine Translation) provides this feature;
it is a neural machine translation system that translates text into our familiar language, and this
is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning
algorithm, which is used with image recognition and translates the text from one language
to another.

3) Explain different types of Machine Learning systems?

1. Supervised Learning:
As its name suggests, Supervised machine learning is based on supervision. It means in the
supervised learning technique, we train the machines using the "labelled" dataset, and
based on the training, the machine predicts the output. Here, the labelled data specifies
that some of the inputs are already mapped to the output. More precisely, we can say:
first, we train the machine with the input and corresponding output, and then we ask the
machine to predict the output using the test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset of
cats and dog images. So, first, we will provide the training to the machine to understand the
images, such as the shape & size of the tail of cat and dog, Shape of eyes, colour, height
(dogs are taller, cats are smaller), etc. After completion of training, we input the picture of
a cat and ask the machine to identify the object and predict the output. Now, the machine is
well trained, so it will check all the features of the object, such as height, shape, colour,
eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category. This is the
process of how the machine identifies the objects in Supervised Learning.
The main goal of the supervised learning technique is to map the input variable(x) with
the output variable(y). Some real-world applications of supervised learning are Risk
Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve the classification problems in which the output
variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The
classification algorithms predict the categories present in the dataset. Some real-world
examples of classification algorithms are Spam Detection, Email filtering, etc.
Some popular classification algorithms are given below:
o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
b) Regression
Regression algorithms are used to solve regression problems in which there is a
relationship between the input variables and a continuous output variable. These are used to
predict continuous output variables, such as market trends, weather prediction, etc.
Some popular Regression algorithms are given below:
o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
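To illustrate regression concretely, here is a minimal sketch using simple linear regression, assuming scikit-learn; the four data points are synthetic and only for demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])    # input variable (x)
y = np.array([2.1, 3.9, 6.2, 8.1])            # continuous output variable (y)

reg = LinearRegression().fit(X, y)            # learn the x -> y mapping
print(reg.predict([[5.0]]))                   # predict a continuous value, ~10
```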
Advantages and Disadvantages of Supervised Learning
Advantages:
o Since supervised learning works with the labelled dataset, we can have an exact
idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
o These algorithms are not able to solve complex tasks.
o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.
2. Unsupervised Learning:
Unsupervised learning is different from the supervised learning technique; as its name
suggests, there is no need for supervision. It means that in unsupervised machine learning, the
machine is trained using an unlabeled dataset, and the machine predicts the output
without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to the similarities, patterns, and differences. Machines are
instructed to find the hidden patterns in the input dataset.

Let's take an example to understand it more precisely; suppose there is a basket of fruit
images, and we input it into the machine learning model. The images are totally unknown to
the model, and the task of the machine is to find the patterns and categories of the objects.

So, now the machine will discover its patterns and differences, such as colour difference,
shape difference, and predict the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning


Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data. It
is a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups. An
example of the clustering algorithm is grouping the customers by their purchasing
behaviour.
Some of the popular clustering algorithms are given below:
o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis
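As a brief, hedged example of the first algorithm in the list, here is a minimal K-Means sketch assuming scikit-learn; the six 2-D points are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])   # two obvious groups

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)                         # cluster assignment per point
print(kmeans.cluster_centers_)                # discovered group centres
```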
2) Association
Association rule learning is an unsupervised learning technique, which finds interesting
relations among variables within a large dataset. The main aim of this learning algorithm is
to find the dependency of one data item on another data item and map those variables
accordingly so that it can generate maximum profit. This algorithm is mainly applied
in Market Basket analysis, Web usage mining, continuous production, etc.
Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-
growth algorithm.
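As a sketch of Apriori-style association rule mining, the following assumes the third-party mlxtend library (not mentioned in the original material); the four transactions are toy data.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk", "butter"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)        # one-hot basket matrix

itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "confidence"]])
```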
Advantages and Disadvantages of Unsupervised Learning Algorithm
Advantages:
o These algorithms can be used for complicated tasks compared to the supervised
ones because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled
dataset is easier as compared to the labelled dataset.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate as the dataset is not
labelled, and the algorithms are not trained with the exact output in advance.
o Working with Unsupervised learning is more difficult as it works with the unlabelled
dataset that does not map with the output.
3. Semi-supervised Learning:
To overcome the drawbacks of supervised learning and unsupervised learning algorithms,
the concept of semi-supervised learning was introduced. The main aim of semi-supervised
learning is to effectively use all the available data, rather than only labelled data as in
supervised learning. Initially, similar data is clustered with an unsupervised learning
algorithm, and this further helps to label the unlabeled data. This is because
labelled data is comparatively more expensive to acquire than unlabeled data.
We can imagine these algorithms with an example. Supervised learning is where a student is
under the supervision of an instructor at home and college. Further, if that student is self-
analysing the same concept without any help from the instructor, it comes under
unsupervised learning. Under semi-supervised learning, the student has to revise himself
after analyzing the same concept under the guidance of an instructor at college.
Advantages and disadvantages of Semi-supervised Learning
Advantages:
o The algorithm is simple and easy to understand.
o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.
Disadvantages:
o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy is low.
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a
software component) automatically explores its surroundings by hit and trial, taking actions,
learning from experiences, and improving its performance. The agent gets rewarded for each
good action and punished for each bad action; hence the goal of a reinforcement learning
agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents
learn from their experiences only.
The reinforcement learning process is similar to that of a human being; for example, a child learns
various things by experience in his day-to-day life. An example of reinforcement learning is
playing a game, where the game is the environment, the moves of the agent at each step define
states, and the goal of the agent is to get a high score. The agent receives feedback in terms of
punishment and rewards.
Due to its way of working, reinforcement learning is employed in different fields such
as Game theory, Operations Research, Information theory, and multi-agent systems.
A reinforcement learning problem can be formalized using Markov Decision
Process(MDP). In MDP, the agent constantly interacts with the environment and performs
actions; at each action, the environment responds and generates a new state.
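To make this agent/environment loop concrete, here is a minimal tabular Q-learning sketch; the 5-state corridor environment and all constants are illustrative assumptions, not part of the original text.

```python
import random

n_states, actions = 5, [0, 1]                 # 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(n_states)]     # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.2         # learning rate, discount, exploration

for episode in range(500):
    state = 0
    while state != n_states - 1:              # the right-most state is the goal
        if random.random() < epsilon:
            action = random.choice(actions)                    # explore
        else:
            action = max(actions, key=lambda a: Q[state][a])   # exploit
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0    # environment feedback
        # update the value estimate from the reward and the new state (the MDP step)
        Q[state][action] += alpha * (
            reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print([round(max(q), 2) for q in Q])          # values grow toward the goal state
```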
Categories of Reinforcement Learning
Reinforcement learning is categorized mainly into two types of methods/algorithms:
o Positive Reinforcement Learning: Positive reinforcement learning specifies
increasing the tendency that the required behaviour would occur again by adding
something. It enhances the strength of the behaviour of the agent and positively
impacts it.
o Negative Reinforcement Learning: Negative reinforcement learning works exactly
opposite to the positive RL. It increases the tendency that the specific behaviour
would occur again by avoiding the negative condition.

Advantages and Disadvantages of Reinforcement Learning


Advantages
o It helps in solving complex real-world problems which are difficult to solve with
general techniques.
o The learning model of RL is similar to the learning of human beings; hence the most
accurate results can be found.
o Helps in achieving long-term results.
Disadvantage
o RL algorithms are not preferred for simple problems.
o RL algorithms require huge amounts of data and computation.
o Too much reinforcement learning can lead to an overload of states which can
weaken the results.
4) Explain Artificial Intelligence, Deep Learning, and Machine Learning with examples?

Artificial intelligence is a wide-ranging branch of computer science concerned with building
smart machines capable of performing tasks that typically require human intelligence.

How does AI work?


As the hype around AI has accelerated, vendors have been scrambling to promote how their
products and services use AI. Often what they refer to as AI is simply one component of
AI, such as machine learning. AI requires a foundation of specialized hardware and software
for writing and training machine learning algorithms. No one programming language is
synonymous with AI, but a few, including Python, R and Java, are popular.

In general, AI systems work by ingesting large amounts of labeled training data, analyzing
the data for correlations and patterns, and using these patterns to make predictions about
future states. In this way, a chatbot that is fed examples of text chats can learn to produce
lifelike exchanges with people, or an image recognition tool can learn to identify and
describe objects in images by reviewing millions of examples.

AI programming focuses on three cognitive skills: learning, reasoning and self-correction.

Learning processes. This aspect of AI programming focuses on acquiring data and creating
rules for how to turn the data into actionable information. The rules, which are
called algorithms, provide computing devices with step-by-step instructions for how to
complete a specific task.
Advantages
 Good at detail-oriented jobs;
 Reduced time for data-heavy tasks;
 Delivers consistent results; and
 AI-powered virtual agents are always available.
Disadvantages
 Expensive;
 Requires deep technical expertise;
 Limited supply of qualified workers to build AI tools;
 Only knows what it's been shown; and
 Lack of ability to generalize from one task to another.
Strong AI vs. weak AI
AI can be categorized as either weak or strong.
 Weak AI, also known as narrow AI, is an AI system that is designed and trained to
complete a specific task. Industrial robots and virtual personal assistants, such as Apple's
Siri, use weak AI.
 Strong AI, also known as artificial general intelligence (AGI), describes programming that
can replicate the cognitive abilities of the human brain. When presented with an
unfamiliar task, a strong AI system can use fuzzy logic to apply knowledge from one
domain to another and find a solution autonomously. In theory, a strong AI program
should be able to pass both a Turing Test and the Chinese room test.
Deep Learning :
Deep learning is a subset of machine learning, which is essentially a neural network with
three or more layers. These neural networks attempt to simulate the behavior of the human
brain—albeit far from matching its ability—allowing it to “learn” from large amounts of
data. While a neural network with a single layer can still make approximate predictions,
additional hidden layers can help to optimize and refine for accuracy.
Deep learning drives many artificial intelligence (AI) applications and services that improve
automation, performing analytical and physical tasks without human intervention. Deep
learning technology lies behind everyday products and services (such as digital assistants,
voice-enabled TV remotes, and credit card fraud detection) as well as emerging technologies
(such as self-driving cars).
How deep learning works
Deep learning neural networks, or artificial neural networks, attempt to mimic the human
brain through a combination of data inputs, weights, and biases. These elements work
together to accurately recognize, classify, and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes, each building
upon the previous layer to refine and optimize the prediction or categorization. This
progression of computations through the network is called forward propagation. The input
and output layers of a deep neural network are called visible layers. The input layer is where
the deep learning model ingests the data for processing, and the output layer is where the
final prediction or classification is made.
Another process called backpropagation uses algorithms, like gradient descent, to calculate
errors in predictions and then adjusts the weights and biases of the function by moving
backwards through the layers in an effort to train the model. Together, forward propagation
and backpropagation allow a neural network to make predictions and correct for any errors
accordingly. Over time, the algorithm becomes gradually more accurate.
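A minimal numeric sketch of forward propagation and backpropagation with gradient descent, assuming NumPy; the one-hidden-layer network, the XOR task, and all sizes are illustrative choices (convergence may vary with initialization).

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR labels

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)     # hidden-layer weights/biases
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # output-layer weights/biases
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    # forward propagation: input layer -> hidden layer -> output layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backpropagation: push the prediction error backwards through the layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient descent: adjust weights and biases to reduce the error
    W2 -= 1.0 * h.T @ d_out; b2 -= 1.0 * d_out.sum(axis=0)
    W1 -= 1.0 * X.T @ d_h;   b1 -= 1.0 * d_h.sum(axis=0)

print(out.round(2).ravel())                       # approaches [0, 1, 1, 0]
```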
The above describes the simplest type of deep neural network in the simplest terms.
However, deep learning algorithms are incredibly complex, and there are different types of
neural networks to address specific problems or datasets. For example,
 Convolutional neural networks (CNNs), used primarily in computer vision and image
classification applications, can detect features and patterns within an image,
enabling tasks, like object detection or recognition. In 2015, a CNN bested a human
in an object recognition challenge for the first time.
 Recurrent neural networks (RNNs) are typically used in natural language and speech
recognition applications, as they leverage sequential or time-series data.
Deep learning applications
Real-world deep learning applications are a part of our daily lives, but in most cases, they
are so well-integrated into products and services that users are unaware of the complex
data processing that is taking place in the background. Some of these examples include the
following:
Law enforcement
Deep learning algorithms can analyze and learn from transactional data to identify
dangerous patterns that indicate possible fraudulent or criminal activity. Speech
recognition, computer vision, and other deep learning applications can improve the
efficiency and effectiveness of investigative analysis by extracting patterns and evidence
from sound and video recordings, images, and documents, which helps law enforcement
analyze large amounts of data more quickly and accurately.
Financial services
Financial institutions regularly use predictive analytics to drive algorithmic trading of stocks,
assess business risks for loan approvals, detect fraud, and help manage credit and
investment portfolios for clients.
Customer service
Many organizations incorporate deep learning technology into their customer service
processes. Chatbots—used in a variety of applications, services, and customer service
portals—are a straightforward form of AI. Traditional chatbots use natural language and
even visual recognition, commonly found in call center-like menus. However,
more sophisticated chatbot solutions attempt to determine, through learning, if there are
multiple responses to ambiguous questions. Based on the responses it receives, the chatbot
then tries to answer these questions directly or route the conversation to a human user.
Virtual assistants like Apple's Siri, Amazon Alexa, or Google Assistant extend the idea of a
chatbot by enabling speech recognition functionality. This creates a new method to engage
users in a personalized way.
Healthcare
The healthcare industry has benefited greatly from deep learning capabilities ever since the
digitization of hospital records and images. Image recognition applications can support
medical imaging specialists and radiologists, helping them analyze and assess more images
in less time.
5) Differentiate between Supervised and Unsupervised Learning Techniques?
6) Explain Testing and Training Loss with examples?

Training a model simply means learning (determining) good values for all the weights and
the bias from labeled examples. In supervised learning, a machine learning algorithm builds
a model by examining many examples and attempting to find a model that minimizes loss;
this process is called empirical risk minimization.

Loss is the penalty for a bad prediction. That is, loss is a number indicating how bad the
model's prediction was on a single example. If the model's prediction is perfect, the loss is
zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights
and biases that have low loss, on average, across all examples. For example, Figure 3 shows
a high loss model on the left and a low loss model on the right. Note the following about the
figure:

 The arrows represent loss.


 The blue lines represent predictions.

Figure 3. High loss in the left model; low loss in the right model.
Notice that the arrows in the left plot are much longer than their counterparts in the right
plot. Clearly, the line in the right plot is a much better predictive model than the line in the
left plot.

You might be wondering whether you could create a mathematical function—a loss function
—that would aggregate the individual losses in a meaningful fashion.

Squared loss: a popular loss function

The linear regression models we'll examine here use a loss function called squared loss (also
known as L2 loss). The squared loss for a single example is as follows:

squared loss = the square of the difference between the label and the prediction
             = (observation − prediction(x))²
             = (y − y')²

Mean square error (MSE) is the average squared loss per example over the whole dataset.
To calculate MSE, sum up all the squared losses for individual examples and then divide by
the number of examples:

$$\mathrm{MSE} = \frac{1}{N} \sum_{(x,y) \in D} \left(y - \mathrm{prediction}(x)\right)^2$$

where:

 (x,y) is an example in which
 x is the set of features (for example, chirps/minute, age, gender) that the model uses to
make predictions.
 y is the example's label (for example, temperature).
 prediction(x) is a function of the weights and bias in combination with the set of features x.
 D is a data set containing many labeled examples, which are (x,y) pairs.
 N is the number of examples in D.

Although MSE is commonly used in machine learning, it is neither the only practical loss
function nor the best loss function for all circumstances.
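As a quick worked example of squared loss and MSE, assuming NumPy; the labels and predictions below are made-up numbers.

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])           # labels (observations)
y_pred = np.array([2.5, 0.0, 2.0, 8.0])       # model predictions

squared_losses = (y - y_pred) ** 2            # per-example L2 loss
mse = squared_losses.mean()                   # average over the N examples
print(squared_losses, mse)                    # losses [0.25 0.25 0. 1.], MSE 0.375
```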
7) Explain the Trade-Off in Machine Learning with examples?

In machine learning, the performance and complexity of the model depend not only on
certain parameters, assumptions, and conditions, but also on the quality of the data that is used
to train the model; that is why cleaning and standardizing the data is a step everyone goes
through.
If the data is not cleaned and standardized, then no matter how finely tuned the model
parameters and hyper-parameters are, the model will not be able to provide the best
solution.
Let's understand how data plays an important role in a model's performance and complexity, and
how, if the data is not given proper massaging, it introduces bias and
variance (uncertainty) in the model.

Skewness in Data
In simple words, skewness is the measure of how much the probability distribution of a
random variable deviates from the normal distribution (probability distribution without any
skewness).
If our data is positively skewed, it means that it has a higher number of data points having
low values. So, when we train our model on this data, it will perform better at predicting data
points with lower values as compared to those with higher values. Thus, introducing bias in
our machine learning model. Also, skewness tells us about the direction of outliers. For
positively skewed data, most of the outliers are present on the right side of the distribution.
Variance is a measure of how far the observed data points differ from the average value,
i.e., their difference from the mean value. When we want to know how spread out the data
in our sample set is, we calculate the variance. Variance introduces uncertainty in the
data: the greater the variance, the greater the uncertainty in the data.
Now, that we have understood how data plays an important role in model performance and
complexity, let’s now understand the bias-versus-variance trade off.
Bias-vs-Variance Trade-Off
It is one of the most important concepts to understand in supervised machine learning and
predictive modeling use cases, and the main goal is to choose a model to train that offers the
lowest bias-versus-variance trade-off for that dataset or business use case.
Bias indicates the inaccuracy of the model prediction in comparison with the true value. It is due
to erroneous or inaccurate assumptions made in the training process to simplify the model and
make the target function easier to learn.
Variance indicates the change in target function if different training data is used. It is caused
by modeling the noise present in the training data, which implies that the model is too
sensitive to the training data (thus giving different estimates when given new training data).

The definitions above give a high-level overview of the difference between bias and variance; now
let's discuss them in depth and see how we can find a sweet spot.
Having fewer assumptions about the model training process can help generalize relevant
relations between features and target outputs. Low bias means fewer assumptions are made
about the target function, while high bias means more assumptions are made about the
target function. Having more assumptions can potentially miss important relations between
features and outputs and cause underfitting.
Underfitting refers to the situation in which models neither fit the training data nor
generalize to new data.
Low variance indicates changes in training data would result in similar target functions,
whereas high variance indicates changes in training data would result in very different target
functions. High variance suggests that the algorithm learns the random noise instead of the
output and causes overfitting.
Overfitting refers to the situation in which models fit the training data very well but fail to
generalize to new data.

Generally, increasing model complexity would decrease bias error since the model has more
capacity to learn from the training data, but the variance error would increase, and the
model may begin to learn from noise in the training data.
The goal of training machine learning models is to achieve low bias and low variance.
The optimal model complexity is where bias error crosses with variance error.
The prediction error can be viewed as the sum of model error (error coming from the model)
and the irreducible error (coming from data collection).
prediction error = model error (Bias error² + variance error) + irreducible error
An optimal balance of bias and variance leads to a model that is neither overfit nor underfit.
This is the goal of a predictive model: to isolate the signal from the dataset while ignoring
the noise.

Now that we have understood bias and variance error and how they cause underfitting
and overfitting, let's go through some of the steps that can be used to overcome these
problems.
· K-fold cross-validation: it splits the initial training data into k subsets and trains the model k
times, each time using one subset as the testing data and the rest as training data (see the
sketch after this list).
· Validation dataset: sample out a dataset from the initial training data to estimate how well
the model generalizes on new data.
· Simplify the model: for example, use fewer layers or fewer neurons to make the neural
network smaller.
· Use more data.
· Reduce dimensionality in training data: this projects the training data into a smaller dimension
to decrease the model complexity.
· Regularization: it tunes or selects the preferred level of model complexity, so our model
becomes better at predicting (generalizing).
· Stop the training early when the performance on the testing dataset has not improved
after a number of training iterations.
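Here is the promised minimal 5-fold cross-validation sketch, assuming scikit-learn; the iris dataset and decision tree are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# each of the 5 folds serves once as test data and 4 times as training data
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores, scores.mean())                  # per-fold and average accuracy
```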
In this article we learnt about bias and variance, how they cause underfitting and overfitting
respectively, and how data plays an important role in it. We understood how important
this trade-off is for achieving optimal model complexity and performance.
8) What are Underfitting and Overfitting of data? Explain.
In the real world, the dataset present will never be clean and perfect. It means each dataset
contains impurities, noisy data, outliers, missing data, or imbalanced data. Due to these
impurities, different problems occur that affect the accuracy and the performance of the
model. One such problem is Overfitting in Machine Learning. Overfitting is a problem
that a model can exhibit.
A statistical model is said to be overfitted if it can’t generalize well with unseen data.
Before understanding overfitting, we need to know some basic terms, which are:
Noise: Noise is meaningless or irrelevant data present in the dataset. It affects the
performance of the model if it is not removed.
Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the
machine learning algorithms. Or it is the difference between the predicted values and the
actual values.
Variance: If the machine learning model performs well with the training dataset, but does
not perform well with the test dataset, then variance occurs.
Generalization: It shows how well a model is trained to predict unseen data.
What is Overfitting?

o Overfitting & underfitting are the two main errors/problems in the machine learning
model, which cause poor performance in Machine Learning.
o Overfitting occurs when the model fits more data than required, and it tries to
capture each and every datapoint fed to it. Hence it starts capturing noise and
inaccurate data from the dataset, which degrades the performance of the model.
o An overfitted model doesn't perform accurately with the test/unseen dataset and
can’t generalize well.
o An overfitted model is said to have low bias and high variance.
Example to Understand Overfitting
We can understand overfitting with a general example. Suppose there are three students, X,
Y, and Z, and all three are preparing for an exam. X has studied only three sections of the
book and left all other sections. Y has a good memory, hence memorized the whole book.
And the third student, Z, has studied and practiced all the questions. So, in the exam, X will
only be able to solve the questions if the exam has questions related to section 3. Student Y
will only be able to solve questions if they appear exactly the same as given in the book.
Student Z will be able to solve all the exam questions in a proper way.
The same happens with machine learning; if the algorithm learns from a small part of the
data, it is unable to capture the required data points and is hence underfitted.
Suppose the model learns the training dataset like student Y. It performs very well on
the seen dataset but performs badly on unseen data or unknown instances. In such cases,
the model is said to be Overfitting.
And if the model performs well with the training dataset and also with the test/unseen
dataset, similar to student Z, it is said to be a good fit.
How to detect Overfitting?
Overfitting in the model can only be detected once you test the model on unseen data. To detect
the issue, we can perform a Train/test split.
In the train-test split of the dataset, we can divide our dataset into random test and training
datasets. We train the model with a training dataset which is about 80% of the total dataset.
After training the model, we test it with the test dataset, which is 20 % of the total dataset.

Now, if the model performs well with the training dataset but not with the test dataset,
then it is likely to have an overfitting issue.
For example, if the model shows 85% accuracy with training data and 50% accuracy with the
test dataset, it means the model is not performing well.
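The gap described above can be reproduced with a small, hedged sketch assuming scikit-learn; an unconstrained decision tree typically scores near 100% on its training data but noticeably lower on the held-out test split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)      # 80% train / 20% test

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # typically ~1.0
print("test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```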
Ways to prevent Overfitting
Although overfitting is an error in machine learning which reduces the performance of the
model, we can prevent it in several ways. With the use of a linear model, we can often
avoid overfitting; however, many real-world problems are non-linear ones. It is important to
prevent overfitting in our models. Below are several ways that can be used to prevent
overfitting:
1. Early Stopping
2. Train with more data
3. Feature Selection
4. Cross-Validation
5. Data Augmentation
6. Regularization
Early Stopping
In this technique, the training is paused before the model starts learning the noise within
the data. In this process, while training the model iteratively, we measure the performance of
the model after each iteration and continue training as long as new iterations keep improving
the performance of the model.
After that point, the model begins to overfit the training data; hence we need to stop the
process before the learner passes that point.
Stopping the training process before the model starts capturing noise from the data is
known as early stopping.

However, this technique may lead to the underfitting problem if training is paused too early.
So, it is very important to find that "sweet spot" between underfitting and overfitting.
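One hedged way to realize early stopping, assuming scikit-learn: gradient boosting can hold out a validation fraction and halt once the validation score stops improving; the dataset and parameters below are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier(
    n_estimators=500,            # upper bound on training iterations
    validation_fraction=0.2,     # hold out data to watch the performance
    n_iter_no_change=10,         # stop after 10 iterations without improvement
    random_state=0)
model.fit(X, y)
print(model.n_estimators_)       # iterations actually used, usually far below 500
```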
Train with More data
Increasing the training set by including more data can enhance the accuracy of the model,
as it provides more chances to discover the relationship between input and output
variables.
It may not always work to prevent overfitting, but this way helps the algorithm to detect the
signal better to minimize the errors.
When a model is fed with more training data, it will be unable to overfit all the samples of
data and will be forced to generalize well.
But in some cases, the additional data may add more noise to the model; hence we need to
be sure that the data is clean and free from inconsistencies before feeding it to the model.
Feature Selection
While building the ML model, we have a number of parameters or features that are used to
predict the outcome. However, sometimes some of these features are redundant or less
important for the prediction, and for this, the feature selection process is applied. In the feature
selection process, we identify the most important features within training data, and other
features are removed. Further, this process helps to simplify the model and reduces noise
from the data. Some algorithms have the auto-feature selection, and if not, then we can
manually perform this process.
Cross-Validation
Cross-validation is one of the powerful techniques to prevent overfitting.
In the general k-fold cross-validation technique, we divide the dataset into k equal-sized
subsets of data; these subsets are known as folds.
Data Augmentation
Data Augmentation is a data analysis technique, which is an alternative to adding more data
to prevent overfitting. In this technique, instead of adding more training data, slightly
modified copies of already existing data are added to the dataset.
The data augmentation technique makes a data sample appear slightly different
every time it is processed by the model. Hence each dataset appears unique to the model,
which prevents overfitting.
Regularization
If overfitting occurs when a model is complex, we can reduce the number of features.
However, overfitting may also occur with a simpler model, more specifically the Linear
model, and for such cases, regularization techniques are much helpful.
Regularization is the most popular technique to prevent overfitting. It is a group of methods
that forces the learning algorithms to make a model simpler. Applying the regularization
technique may slightly increase the bias but slightly reduces the variance. In this technique,
we modify the objective function by adding the penalizing term, which has a higher value
with a more complex model.
The two commonly used regularization techniques are L1 Regularization and L2
Regularization.
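A minimal sketch contrasting L1 (Lasso) and L2 (Ridge) regularization, assuming scikit-learn; the synthetic dataset, with only 3 of 10 features informative, is an illustrative assumption.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

l1 = Lasso(alpha=1.0).fit(X, y)               # L1 penalty: zeroes out weights
l2 = Ridge(alpha=1.0).fit(X, y)               # L2 penalty: shrinks weights smoothly
print("L1 zero weights:", (l1.coef_ == 0).sum())   # irrelevant features dropped
print("L2 max |weight|:", abs(l2.coef_).max())
```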
Ensemble Methods
In ensemble methods, predictions from different machine learning models are combined to
identify the most popular result.
The most commonly used ensemble methods are Bagging and Boosting.
In bagging, individual data points can be selected more than once. After the collection of
several sample datasets, these models are trained independently, and depending on the
type of task (i.e., regression or classification), the average of those predictions is used to
predict a more accurate result. Moreover, bagging reduces the chances of overfitting in
complex models.
In boosting, a large number of weak learners arranged in a sequence are trained in such a
way that each learner in the sequence learns from the mistakes of the learner before it. It
combines all the weak learners to come out with one strong learner. In addition, it improves
the predictive flexibility of simple models.
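A minimal bagging sketch, assuming scikit-learn; many decision trees trained on resampled data vote on each prediction, which tends to reduce overfitting in complex models.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
bagged = BaggingClassifier(DecisionTreeClassifier(),   # base learner
                           n_estimators=50,            # 50 resampled trees
                           random_state=0)
print(cross_val_score(bagged, X, y, cv=5).mean())      # averaged vote accuracy
```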
9) Explain in detail about Risk Statistics?
What are the risks of machine learning data?
Risks of Machine Learning
Nowadays, Machine Learning is playing a big role in helping organizations in different
aspects such as analysing structured and unstructured data, detecting risks, automating
manual tasks, making data-driven decisions for business growth, etc. It is capable of
replacing a huge amount of human labour by applying automation and providing insights
to make better decisions for assessing, monitoring, and reducing the risks for an
organization.
Although machine learning can be used as a risk management tool, it also contains many
risks itself. While 49% of companies are exploring or planning to use machine learning, only
a small minority recognize the risks it poses: only 41% of organizations in a global
McKinsey survey say they can comprehensively identify and prioritize machine learning
risks. Hence, it is necessary to be aware of some of the risks of machine learning, and how
they can be adequately evaluated and managed.
Below are a few risks associated with Machine Learning:
1. Poor Data
As we know, a machine learning model only works on the data that we provide to it, or we
can say it completely depends on human-given training data to work. What we input is
what we get as output, so if we enter poor data, the ML model will generate erratic
output. Poor data or dirty data includes errors in training data, outliers, and
unstructured data, which cannot be adequately interpreted by the model.
2. Overfitting
Overfitting is commonly found in non-parametric and non-linear models that are more
flexible in learning the target function.
An overfitted model fits the training data so perfectly that it becomes unable to capture
the general variability of the data. It means it won't be able to generalize well when it comes to
testing on real data.
3. Biased data
Biased data means that human biases can creep into your datasets and spoil outcomes. For
instance, the popular selfie editor FaceApp was initially inadvertently trained to make faces
"hotter" by lightening the skin tone-a result of having been fed a much larger quantity of
photos of people with lighter skin tones.
4. Lack of strategy and experience:
Machine learning is a very new technology in the IT sector; hence, less availability of trained
and skilled resources is a very big issue for the industries. Further, lack of strategy and
experience due to fewer resources leads to wastage of time and money as well as negatively
affecting the organization's production and revenue. According to a survey of over 2000
people, 860 reported a lack of clear strategy and 840 reported a lack of talent with
appropriate skill sets. This survey shows how a lack of strategy and relevant experience
creates a barrier to the development of machine learning in organizations.
5. Security Risks
Security of data is one of the major issues for the IT world, and security also affects the
production and revenue of organizations. When it comes to machine learning, various
types of security risks exist that can compromise machine learning algorithms and
systems. Data scientists and machine learning experts have reported 3 types of attacks,
primarily against machine learning models. These are as follows:
o Evasion attacks: These attacks commonly arise due to adversarial input
introduced into the models; hence they are also known as adversarial attacks.
An evasion attack happens when the network uses adversarial examples as input,
which can influence the classifiers, i.e., disrupt ML models. Such a security
violation involves supplying malicious data that gets classified as genuine. A targeted
attack attempts to allow a specific intrusion or disruption, or alternatively to create
general mayhem.
Evasion attacks are the most dominant type of attack, where data is modified in a
way that makes it seem genuine. Evasion doesn't involve influence over the data
used to train a model, but it is comparable to the way spammers and hackers
obfuscate the content of spam emails and malware.
o Data Poisoning attacks:
In data poisoning attacks, the attacker knows the source of the raw data used to train
the ML models and strives to bias or "poison" that data to compromise the
resulting machine learning model's accuracy. The effects of these attacks can be
overcome by prevention and detection. Through proper monitoring, we can prevent
ML models from data poisoning.
Model skewing is one of the most common types of data poisoning attacks, in which
spammers try to get the classifier to categorise bad input as good.
o Model Stealing:
Model stealing is one of the most important security risks in machine learning.
Model stealing techniques are used to create a clone model based on information or
data used in the training of a base model. Model stealing is a major concern
for ML experts because ML models are the valuable intellectual
property of organizations and consist of sensitive user data such as account
details, transactions, financial information, etc. The attackers use the public API and
sample data of the original model to reconstruct another model having a similar
look and feel.
6. Data privacy and confidentiality
Data is one of the main key players in developing Machine learning models. We know
machine learning requires a huge amount of structured and unstructured data for training
models so they can predict accurately in the future. Hence, to achieve good results, we need to
secure data by defining some privacy terms and conditions as well as making it confidential.
Hackers can launch data extraction attacks that can fly under the radar, which can put your
entire machine learning system at risk.
7. Third-party risks
These types of security risks are less well known in industry, as there are minimal
chances of them occurring. Third-party risks generally exist when someone
outsources their business to third-party service providers who may fail to properly govern a
machine learning solution. This leads to various types of data breaches in the ML industry.
8. Regulatory challenges
Regulatory challenges occur whenever a knowledge gap is found in an organization, such as
when teammates are not aware of how ML algorithms work and make decisions. Hence, a lack of
knowledge to justify decisions to regulators can also be a major security risk for industries.
How can we assess Machine Learning Risks?
Machine learning is one of the hottest technologies in the IT world. Although ML is being used in
every industry, it has some associated risks too. We can assess these risks when an ML
solution is implemented in your organization. Below are a few important steps to assess
machine learning risks in your organization. These are as follows:
o Implement a machine learning risk management framework instead of a general
framework to identify the risks in real-time scenarios.
o By providing training to employees for ML technologies and giving them the
knowledge to follow protocols for effective risk management in ML.
o By developing assessment criteria, we can identify and manage the risks in the
business.
o ML Risk can also be assessed by adapting the risk monitoring process and risk
appetites regularly from past experience or feedback of customers.
Hence, machine learning risks can be identified and minimized through appropriate talent,
strategy and skilled resources throughout the organization.

10) What are Risk Management steps in Machine Learning? Explain?


Steps of the Risk Management Process
1. Identify the risk
2. Analyze the risk
3. Prioritize the risk
4. Treat the risk
5. Monitor the risk
What is the risk management process?
It's simply that: an ongoing process of identifying, treating, and then managing risks. Taking
the time to set up and implement a risk management process is like setting up a fire alarm:
you hope it never goes off, but you're willing to deal with the minor inconvenience upfront
in exchange for protection down the road.
Identifying and tracking risks that might arise in a project offers significant benefits,
including:
 More efficient resource planning by making previously unforeseen costs visible
 Better tracking of project costs and more accurate estimates of return on investment
 Increased awareness of legal requirements
 Better prevention of physical injuries and illnesses
 Flexibility, rather than panic, when changes or challenges do arise
Risk Management Steps
Follow these risk management steps to improve your risk management process.
1. Identify the risk
Anticipating possible pitfalls of a project doesn't have to feel like gloom and doom for your
organization. Quite the opposite. Identifying risks is a positive experience that your whole
team can take part in and learn from.
Leverage the collective knowledge and experience of your entire team. Ask everyone to
identify risks they've either experienced before or may have additional insight about. This
process fosters communication and encourages cross-functional learning.
Use a risk breakdown structure to list out potential risks in a project and organize them
according to level of detail, with the most high-level risks at the top and more granular risks
at the bottom. This visual will help you and your team anticipate where risks might emerge
when creating tasks for a project.
Once you and your team have compiled possible issues, create a project risk log for clear,
concise tracking and monitoring of risks throughout a project.
A project risk log, also referred to as a project risk register, is an integral part of any effective
risk management process. As an ongoing database of each project’s potential risks, it not
only helps you manage current risks but serves as a reference point on past projects as well.
By outlining your risk register with the proper data points, you and your team can quickly
and correctly identify and assess possible threats to any project.
2. Analyze the risk
Once your team identifies possible problems, it's time to dig a little deeper. How likely are
these risks to occur? And if they do occur, what will the ramifications be?
During this step, your team will estimate the probability and fallout of each risk to decide
where to focus first. Factors such as potential financial loss to the organization, time lost,
and severity of impact all play a part in accurately analyzing each risk. By putting each risk
under the microscope, you’ll also uncover any common issues across a project and further
refine the risk management process for future projects.
3. Prioritize the risk
Now prioritization begins. Rank each risk by factoring in both its likelihood of happening and
its potential effect on the project.
This step gives you a holistic view of the project at hand and pinpoints where the team's
focus should lie. Most importantly, it’ll help you identify workable solutions for each risk.
This way, the project itself is not interrupted or delayed in significant ways during the
treatment stage.
4. Treat the risk
Once the worst risks come to light, dispatch your treatment plan. While you can’t anticipate
every risk, the previous steps of your risk management process should have you set up for
success. Starting with the highest priority risk first, task your team with either solving or at
least mitigating the risk so that it’s no longer a threat to the project.
Effectively treating and mitigating the risk also means using your team's resources efficiently
without derailing the project in the meantime. As time goes on and you build a larger
database of past projects and their risk logs, you can anticipate possible risks for a more
proactive rather than reactive approach for more effective treatment.
5. Monitor the risk
Clear communication among your team and stakeholders is essential when it comes to
ongoing monitoring of potential threats. And while it may feel like you're herding cats
sometimes, with your risk management process and its corresponding project risk register in
place, keeping tabs on those moving targets becomes anything but risky business.

11) Explain Sampling distribution of an Estimator?


What is an estimator?
In machine learning, an estimator is an equation for picking the “best,” or most likely
accurate, data model based upon observations in reality. Not to be confused with estimation
in general, the estimator is the formula that evaluates a given quantity (the estimand) and
generates an estimate. This estimate is then inserted into the deep learning classifier system
to determine what action to take.
Uses of Estimators
By quantifying guesses, estimators are how machine learning in theory is implemented in
practice. Without the ability to estimate the parameters of a dataset (such as the layers in
a neural network or the bandwidth in a kernel), there would be no way for an AI system to
“learn.”
A simple example of estimators and estimation in practice is the so-called “German Tank
Problem” from World War Two. The Allies had no way to know for sure how many tanks the
Germans were building every month. By counting the serial numbers of captured or
destroyed tanks (the estimand), Allied statisticians created an estimator rule. This equation
calculated the maximum possible number of tanks based upon the sequential serial
numbers and applied minimum-variance analysis to generate the most likely estimate of
how many new tanks Germany was building.
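A small sketch of this idea (with made-up serial numbers): the classic minimum-variance unbiased estimator takes the largest observed serial number and adds one average gap between observations:

# Hypothetical serial numbers from captured or destroyed tanks.
serials = [61, 19, 56, 24, 16]

m = max(serials)   # largest serial number observed
k = len(serials)   # number of tanks observed

# Minimum-variance unbiased estimate of the total number of tanks:
# the largest serial plus one "average gap" of size m/k, minus 1.
estimate = m + m / k - 1
print(estimate)    # 72.2 for this made-up sample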
Types of Estimators
Estimators come in two broad categories—point and interval. Point equations generate
single value results, such as standard deviation, that can be plugged into a deep learning
algorithm’s classifier functions. Interval equations generate a range of likely values, such as
a confidence interval, for analysis.
In addition, each estimator rule can be tailored to generate different types of estimates:
 Biased: Either an overestimate or an underestimate.
 Efficient: Has the smallest possible variance; the estimate with the smallest variance is
referred to as the “best” estimate.
 Invariant: Less flexible estimates that aren’t easily changed by data transformations.
 Shrinkage: A raw estimate that’s combined with other variables to create complex
estimates.
 Sufficient: Estimates the total population’s parameter from a limited dataset.
 Unbiased: An exact-match estimate value that neither underestimates nor overestimates.

A parameter is essentially a numerical characteristic of a distribution (or any statistical
model in general). Normal distributions have µ & σ as parameters, uniform distributions
have a & b as parameters, and binomial distributions have n & p as parameters. These
numerical characteristics are vital for understanding the size, shape, spread, and other
properties of a distribution. In the absence of the true value of the parameter, it seems that
the researcher may not be able to continue her investigation. But that’s when estimators
step in.
Estimators are functions of random variables that can help us find approximate values for
these parameters. Think of an estimator like any other function: it takes an input, processes
it, and renders an output. So, the process of estimation goes as follows:
1) From the distribution, we take a series of random samples.
2) We input these random samples into the estimator function.
3) The estimator function processes it and gives a set of outputs.
4) The expected value of that set is the approximate value of the parameter.
Example
Let’s take an example. Consider a random variable X that follows a uniform distribution. The
distribution of X can be represented as U[0, θ]. This has been plotted below:
(Figure A)
We have the random variable X and its distribution. But we don’t know how to determine
the value of θ. Let’s use estimators. There are many ways to approach this problem. I’ll
discuss two of them:

1) Using Sample Mean


We know that for a U[a, b] distribution, the mean µ is given by the following equation:
µ = (a + b) / 2
For U[0, θ] distribution, a = 0 & b = θ, we get:
µ = θ / 2, i.e., θ = 2µ
Thus, if we estimate µ, we can estimate θ. To estimate µ, we use a very popular estimator
called the sample mean estimator. The sample mean is the sum of the random sample values
drawn divided by the size of the sample. For instance, if we have a random sample S = {4, 7,
3, 2}, then the sample mean is (4+7+3+2)/4 = 4 (the average value). In general, the sample
mean is defined using the following notation:
µ-hat = (X1 + X2 + … + Xn) / n
Here, µ-hat is the sample mean estimator & n is the size of the random sample that we take
from the distribution. A variable with a hat on top of it is the general notation for an
estimator. Since our unknown parameter θ is twice µ, we arrive at the following estimator
for θ:
θ-hat = 2 × µ-hat = 2(X1 + X2 + … + Xn) / n
We take a random sample, plug it into the above estimator, and get a number. We repeat
this process and get a set of numbers. The following figure illustrates the process:
(Figure B)
The lines on the x-axes correspond to the values present in the sample taken from the
distribution. The red lines in the middle indicate the average value of the sample, and the
red lines at the end are twice that average value, i.e., the estimated value of θ for one
sample. Many such samples are taken, and the estimated value of θ for each sample is
noted. The expected value/mean of that set of numbers gives the final estimate for θ. It can
be mathematically proved (using properties of expectation):
E[θ-hat] = 2·E[µ-hat] = 2µ = θ
It is seen that the expectation of the estimator is equal to the true value of the parameter.
This amazing property that certain estimators have is called unbiasedness, which is a very
useful criterion for assessing estimators.
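A quick simulation sketch makes this concrete; here θ = 10, n = 50, and 10,000 repeated samples are arbitrary choices for the demo:

import random

theta = 10.0     # true parameter (unknown in practice; fixed here for the demo)
n = 50           # size of each random sample
trials = 10000   # number of repeated samples

estimates = []
for _ in range(trials):
    sample = [random.uniform(0, theta) for _ in range(n)]
    estimates.append(2 * sum(sample) / n)   # theta-hat = 2 * sample mean

# Unbiasedness: the average of the estimates sits very close to theta = 10.
print(sum(estimates) / len(estimates))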

2) Maximum Value Method


This time, instead of using the mean, we’ll use order statistics, particularly the nth order
statistic. The nth order statistic is defined as the nth smallest value of a random sample of
size n. In other words, it’s the maximum value of a random sample. For instance, if we have
a random sample S = {4, 7, 3, 2}, then the nth order statistic is 7 (the largest value). The
estimator is now defined as follows:
θ-hat = max(X1, X2, …, Xn)
We follow the same procedure: take random samples, input them, collect the outputs, and
find the expectation. The following figure illustrates the process:

(Figure C)
As noted previously, the lines on the x-axes are the values present in one sample. The red
lines at the end are the maximum value for that sample, i.e., the nth order statistic. Two
random samples are shown for reference. However, we need to take much larger samples.
Why? To see this, we’ll use the general expression for the PDF (probability density function)
of the nth order statistic for a U[a, b] distribution:
f(x) = n(x − a)^(n−1) / (b − a)^n, for a ≤ x ≤ b
For U[0, θ] distribution, a = 0 & b = θ, we get:
f(x) = n·x^(n−1) / θ^n, for 0 ≤ x ≤ θ
Using the integral form of the expectation of a continuous variable,
E[θ-hat] = ∫₀^θ x · n·x^(n−1)/θ^n dx = n·θ / (n + 1)
Unlike before, no exact equality has been ascertained between the expectation of θ-hat & θ.
This is because the nth order statistic estimator is biased. We can observe this bias on
comparing figures B & C. In figure B, both positive (θ-hat > θ) and negative (θ-hat < θ)
deviations are possible with equal probability (as shown). These deviations can cancel each
other out, making the sample mean estimator unbiased. However, in figure C, only negative
deviations are possible. Why? Because the maximum value of a random sample will always
be less than θ, the upper end of the range [0, θ]. Consequently, a negative bias creeps in.
Does that mean that we cannot use this estimator? Certainly not. The estimator’s bias can be
significantly lowered by taking a large n, since n/(n + 1) ≈ 1 for large values of n. Thus, we
get:
E[θ-hat] = n·θ / (n + 1) ≈ θ
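The same simulation setup (again with θ = 10, now with a deliberately small n = 5 so the bias is visible) shows both the bias and the (n + 1)/n correction that removes it:

import random

theta, n, trials = 10.0, 5, 10000

raw, corrected = [], []
for _ in range(trials):
    sample = [random.uniform(0, theta) for _ in range(n)]
    m = max(sample)                      # the nth order statistic
    raw.append(m)                        # biased: E[max] = n * theta / (n + 1)
    corrected.append((n + 1) / n * m)    # rescaled to cancel the bias

print(sum(raw) / trials)         # close to 8.33, i.e., (5/6) * theta
print(sum(corrected) / trials)   # close to 10.0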
12) What is Empirical Risk Minimization in machine learning?


In supervised learning, a loss function measures the difference between the predicted
output and the true output for a given set of inputs and outputs. But this is applicable only
to the given set of inputs and outputs. We want to know what the loss is over all the
possibilities. This is where “true risk” comes into the picture.
True risk computes the average loss over all the possibilities. But the problem in the real
world is that we don’t know what “all the possibilities” would look like. In mathematical
terms, we say that we don’t know the true distribution over all the inputs and outputs. If we
did, then we wouldn’t need machine learning in the first place.
Give me an example
For example, let’s say you want to build a model that can differentiate between a male and
a female based on certain features. If the 100 random people we select happen to include
unusually short men and unusually tall women, then the model might incorrectly assume
that height is the
differentiating feature. To build a truly accurate model, we need to gather all the men and
women in the world to extract the differentiating features. Unfortunately, that’s not
possible! So we select a small number of people and hope that this sample is representative
of the whole population.
What exactly is empirical risk minimization?
We assume that our samples come from this distribution and use our dataset as an
approximation. If you compute the loss using the data points in our dataset, it’s called
empirical risk. It’s “empirical” and not “true” because we are using a dataset that’s a subset
of the whole population.
When we build our learning model, we need to pick the function that minimizes the
empirical risk i.e. the delta between the predicted output and the actual output for the data
points in our dataset. The process of finding this function is called empirical risk
minimization. Ideally, we would like to minimize the true risk. But we don’t have the
information that allows us to achieve that, so our hope is that the empirical risk will be
almost the same as the true risk. Hence, by minimizing the empirical risk, we aim to
minimize the true risk.
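As a toy sketch of this selection process (the dataset and the candidate functions f(x) = w·x are invented for illustration), empirical risk minimization simply picks the candidate with the lowest average loss on the dataset:

# Hypothetical dataset of (input, true output) pairs.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# A tiny family of candidate functions f(x) = w * x.
candidate_slopes = [0.5, 1.0, 1.5, 2.0, 2.5]

def empirical_risk(w):
    # Average squared loss over the dataset -- the empirical risk of f(x) = w*x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Empirical risk minimization: keep the candidate with the lowest risk.
best_w = min(candidate_slopes, key=empirical_risk)
print(best_w, empirical_risk(best_w))   # w = 2.0 fits this toy data best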
What does it depend on?
The size of the dataset has a big impact on empirical risk minimization. If we get more data,
the empirical risk will approach the true risk. The complexity of the underlying distribution
affects how well we can approximate it. If it’s too complex, we would need more data to get
a good approximation. We should also be careful about the family of functions we consider.
If this family is too large, the gap between the empirical risk and the true risk can be very
high in certain situations, and the model is likely to overfit.
The behavior of the loss function itself can also have an impact: if we are not careful in
choosing this function, we might end up with very high loss values. Empirical risk
minimization with an added L2 penalty, known as regularized empirical risk minimization, is
a very common refinement of this idea.
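Continuing the toy setup from the sketch above (same invented data and candidates; λ = 0.1 is an arbitrary choice), adding an L2 penalty to the empirical risk looks like this:

# Same toy data and candidate slopes as before, repeated to stay self-contained.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
candidate_slopes = [0.5, 1.0, 1.5, 2.0, 2.5]
lam = 0.1   # hypothetical regularization strength

def regularized_risk(w):
    # Empirical risk (mean squared loss) plus an L2 penalty on the weight.
    mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return mse + lam * w ** 2

print(min(candidate_slopes, key=regularized_risk))   # still w = 2.0 here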
