UNIT-1 Material
1) Define Well-posed problems? Illustrate any four examples for Well-posed problems?
The formal definition of a well-posed learning problem is: “A computer program is said
to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with
experience E.”
To understand the topic better, let's have a look at a few classical examples.
A checkers learning problem:
T -> Playing the checkers game.
P -> Percentage of games won against the opponent.
E -> Playing practice games against itself.
Handwriting Recognition:
Handwriting recognition (HWR) is a technology that converts a user’s handwritten
letters or words into a computer-readable format (e.g., Unicode text).
Its applications are numerous; it is used in reading postal addresses, bank forms, etc.
T -> Recognizing and classifying handwritten words from images.
P -> Percentage of correctly identified words.
E -> set of handwritten words with their classifications in a database.
A robot driving learning problem:
Autonomous driving can be made possible with the use of sight scanners and advanced
machine learning algorithms.
T -> Driving on public four-lane highways using sight scanners.
P -> The average distance progressed before an error.
E -> The sequence of images and steering commands recorded while observing a
human driver.
A spam filtering for emails learning problem:
A spam filter is software that detects unsolicited and undesired email and prevents it
from reaching the inbox of a user.
T -> Classifying incoming emails as spam or not spam.
P -> Percentage of emails correctly classified as spam or not spam.
E -> A database of emails labelled as spam or not spam.
Supervised Learning :
As its name suggests, Supervised machine learning is based on supervision. It means in the
supervised learning technique, we train the machines using the "labelled" dataset, and
based on the training, the machine predicts the output. Here, the labelled data specifies
that some of the inputs are already mapped to the output. More precisely, we can say that
first, we train the machine with the input and corresponding output, and then we ask the
machine to predict the output using the test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset of
cats and dog images. So, first, we will provide the training to the machine to understand the
images, such as the shape & size of the tail of a cat and a dog, shape of eyes, colour, height
(dogs are taller, cats are smaller), etc. After completion of training, we input the picture of
a cat and ask the machine to identify the object and predict the output. Now, the machine is
well trained, so it will check all the features of the object, such as height, shape, colour,
eyes, ears, tail, etc., and find that it's a cat. So, it will put it in the Cat category. This is the
process of how the machine identifies the objects in Supervised Learning.
The main goal of the supervised learning technique is to map the input variable(x) with
the output variable(y). Some real-world applications of supervised learning are Risk
Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems in which the output
variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The
classification algorithms predict the categories present in the dataset. Some real-world
examples of classification algorithms are spam detection, email filtering, etc. A minimal
code sketch follows the list of algorithms below.
Some popular classification algorithms are given below:
o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
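To make this concrete, here is a minimal, illustrative sketch (not part of the original material): it trains scikit-learn's DecisionTreeClassifier on a tiny, invented labelled dataset of animal measurements and predicts the category of a new example.

    # Minimal classification sketch: predict a categorical label ("cat"/"dog")
    # from two invented numeric features (height in cm, tail length in cm).
    from sklearn.tree import DecisionTreeClassifier

    X_train = [[25, 30], [28, 33], [60, 25], [55, 22], [24, 28], [58, 24]]
    y_train = ["cat", "cat", "dog", "dog", "cat", "dog"]  # labels supervise training

    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)        # learn the input -> category mapping

    print(clf.predict([[26, 31]]))   # expected output: ['cat']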
b) Regression
Regression algorithms are used to solve regression problems in which the output variable
is continuous and is modelled as a function (often assumed to be linear) of the input
variables. These are used to predict continuous outputs, such as market trends, weather,
etc. A minimal code sketch follows the list of algorithms below.
Some popular Regression algorithms are given below:
o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
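As a minimal sketch (assumed, not from the original text), the following snippet fits scikit-learn's LinearRegression to a few invented points and predicts a continuous output:

    # Minimal regression sketch: predict a continuous output variable.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # invented inputs
    y = np.array([2.1, 3.9, 6.2, 8.1])          # outputs roughly following y = 2x

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)        # learned slope ~2, intercept ~0
    print(model.predict([[5.0]]))               # continuous prediction, ~10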
Advantages and Disadvantages of Supervised Learning
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact
idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
o These algorithms are not able to solve complex tasks.
o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.
Unsupervised Learning :
Unsupervised learning is different from the supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning, the
machine is trained using the unlabeled dataset, and the machine predicts the output
without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to the similarities, patterns, and differences. Machines are
instructed to find the hidden patterns from the input dataset.
Let's take an example to understand it more preciously; suppose there is a basket of fruit
images, and we input it into the machine learning model. The images are totally unknown to
the model, and the task of the machine is to find the patterns and categories of the objects.
So, now the machine will discover its patterns and differences, such as colour difference,
shape difference, and predict the output when it is tested with the test dataset.
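A minimal illustrative sketch (data invented for demonstration): k-means clustering groups unlabelled points purely by similarity, with no labels supervising the process.

    # Minimal unsupervised sketch: cluster unlabelled 2-D points into groups.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 1.0],   # one loose group
                  [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])  # another loose group

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)   # group membership discovered without any labels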
In general, AI systems work by ingesting large amounts of labeled training data, analyzing
the data for correlations and patterns, and using these patterns to make predictions about
future states. In this way, a chatbot that is fed examples of text chats can learn to produce
lifelike exchanges with people, or an image recognition tool can learn to identify and
describe objects in images by reviewing millions of examples.
Learning processes. This aspect of AI programming focuses on acquiring data and creating
rules for how to turn the data into actionable information. The rules, which are
called algorithms, provide computing devices with step-by-step instructions for how to
complete a specific task.
Advantages
Good at detail-oriented jobs;
Reduced time for data-heavy tasks;
Delivers consistent results; and
AI-powered virtual agents are always available.
Disadvantages
Expensive;
Requires deep technical expertise;
Limited supply of qualified workers to build AI tools;
Only knows what it's been shown; and
Lack of ability to generalize from one task to another.
Strong AI vs. weak AI
AI can be categorized as either weak or strong.
Weak AI, also known as narrow AI, is an AI system that is designed and trained to
complete a specific task. Industrial robots and virtual personal assistants, such as Apple's
Siri, use weak AI.
Strong AI, also known as artificial general intelligence (AGI), describes programming that
can replicate the cognitive abilities of the human brain. When presented with an
unfamiliar task, a strong AI system can use fuzzy logic to apply knowledge from one
domain to another and find a solution autonomously. In theory, a strong AI program
should be able to pass both a Turing Test and the Chinese room test.
Deep Learning :
Deep learning is a subset of machine learning, which is essentially a neural network with
three or more layers. These neural networks attempt to simulate the behavior of the human
brain—albeit far from matching its ability—allowing it to “learn” from large amounts of
data. While a neural network with a single layer can still make approximate predictions,
additional hidden layers can help to optimize and refine for accuracy.
Deep learning drives many artificial intelligence (AI) applications and services that improve
automation, performing analytical and physical tasks without human intervention. Deep
learning technology lies behind everyday products and services (such as digital assistants,
voice-enabled TV remotes, and credit card fraud detection) as well as emerging technologies
(such as self-driving cars).
How deep learning works
Deep learning neural networks, or artificial neural networks, attempt to mimic the human
brain through a combination of data inputs, weights, and bias. These elements work
together to accurately recognize, classify, and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes, each building
upon the previous layer to refine and optimize the prediction or categorization. This
progression of computations through the network is called forward propagation. The input
and output layers of a deep neural network are called visible layers. The input layer is where
the deep learning model ingests the data for processing, and the output layer is where the
final prediction or classification is made.
Another process called backpropagation uses algorithms, like gradient descent, to calculate
errors in predictions and then adjusts the weights and biases of the function by moving
backwards through the layers in an effort to train the model. Together, forward propagation
and backpropagation allow a neural network to make predictions and correct for any errors
accordingly. Over time, the algorithm becomes gradually more accurate.
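The sketch below (an illustration with invented network sizes, data, and learning rate, not a production recipe) shows forward propagation and backpropagation with gradient descent for a single-hidden-layer network in NumPy:

    # Minimal forward/backpropagation sketch in NumPy (illustrative only;
    # network sizes, data, and learning rate are all invented).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))                  # 4 invented examples, 3 features
    y = np.array([[0.0], [1.0], [1.0], [0.0]])   # invented binary targets

    W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)  # hidden layer parameters
    W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)  # output layer parameters
    lr = 0.1

    for step in range(1000):
        # Forward propagation: input layer -> hidden layer -> output layer.
        h = np.tanh(X @ W1 + b1)
        y_hat = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

        # Backpropagation: gradients of the cross-entropy loss, moving
        # backwards through the layers.
        grad_out = (y_hat - y) / len(X)                # gradient at the output
        grad_W2 = h.T @ grad_out
        grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)    # tanh derivative
        grad_W1 = X.T @ grad_h

        # Gradient descent: adjust weights and biases to reduce the error.
        W2 -= lr * grad_W2
        b2 -= lr * grad_out.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_h.sum(axis=0)

    print(np.round(y_hat, 2))   # predictions move towards the targets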
The above describes the simplest type of deep neural network in the simplest terms.
However, deep learning algorithms are incredibly complex, and there are different types of
neural networks to address specific problems or datasets. For example,
Convolutional neural networks (CNNs), used primarily in computer vision and image
classification applications, can detect features and patterns within an image,
enabling tasks, like object detection or recognition. In 2015, a CNN bested a human
in an object recognition challenge for the first time.
Recurrent neural networks (RNNs) are typically used in natural language and speech
recognition applications, as they leverage sequential or time-series data.
Deep learning applications
Real-world deep learning applications are a part of our daily lives, but in most cases, they
are so well-integrated into products and services that users are unaware of the complex
data processing that is taking place in the background. Some of these examples include the
following:
Law enforcement
Deep learning algorithms can analyze and learn from transactional data to identify
dangerous patterns that indicate possible fraudulent or criminal activity. Speech
recognition, computer vision, and other deep learning applications can improve the
efficiency and effectiveness of investigative analysis by extracting patterns and evidence
from sound and video recordings, images, and documents, which helps law enforcement
analyze large amounts of data more quickly and accurately.
Financial services
Financial institutions regularly use predictive analytics to drive algorithmic trading of stocks,
assess business risks for loan approvals, detect fraud, and help manage credit and
investment portfolios for clients.
Customer service
Many organizations incorporate deep learning technology into their customer service
processes. Chatbots—used in a variety of applications, services, and customer service
portals—are a straightforward form of AI. Traditional chatbots use natural language and
even visual recognition, commonly found in call center-like menus. However,
more sophisticated chatbot solutions attempt to determine, through learning, if there are
multiple responses to ambiguous questions. Based on the responses it receives, the chatbot
then tries to answer these questions directly or route the conversation to a human user.
Virtual assistants like Apple's Siri, Amazon Alexa, or Google Assistant extend the idea of a
chatbot by enabling speech recognition functionality. This creates a new method to engage
users in a personalized way.
Healthcare
The healthcare industry has benefited greatly from deep learning capabilities ever since the
digitization of hospital records and images. Image recognition applications can support
medical imaging specialists and radiologists, helping them analyze and assess more images
in less time.
5) Differentiate between Supervised and Un-Supervised Learning Techniques?
6) Explain Testing and Training Loss with examples?
Training a model simply means learning (determining) good values for all the weights and
the bias from labeled examples. In supervised learning, a machine learning algorithm builds
a model by examining many examples and attempting to find a model that minimizes loss;
this process is called empirical risk minimization.
Loss is the penalty for a bad prediction. That is, loss is a number indicating how bad the
model's prediction was on a single example. If the model's prediction is perfect, the loss is
zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights
and biases that have low loss, on average, across all examples. For example, Figure 3 shows
a high loss model on the left and a low loss model on the right. Note the following about the
figure:
Figure 3. High loss in the left model; low loss in the right model.
Notice that the arrows in the left plot are much longer than their counterparts in the right
plot. Clearly, the line in the right plot is a much better predictive model than the line in the
left plot.
You might be wondering whether you could create a mathematical function—a loss function
—that would aggregate the individual losses in a meaningful fashion.
The linear regression models we'll examine here use a loss function called squared loss (also
known as L2 loss). The squared loss for a single example is as follows:
squared loss = the square of the difference between the label and the prediction
             = (observation − prediction(x))²
             = (y − y′)²
Mean square error (MSE) is the average squared loss per example over the whole dataset.
To calculate MSE, sum up all the squared losses for individual examples and then divide by
the number of examples:
MSE = (1/N) · Σ_{(x, y) ∈ D} (y − prediction(x))²
where:
o (x, y) is an example in which x is the set of features the model uses to make
predictions and y is the example's label;
o prediction(x) is a function of the weights and bias in combination with the set of
features x;
o D is a dataset containing many labelled examples, which are (x, y) pairs;
o N is the number of examples in D.
Although MSE is commonly used in machine learning, it is neither the only practical loss
function nor the best loss function for all circumstances.
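For example, a small hand computation of squared loss and MSE (values invented):

    # Squared loss per example and MSE over an invented mini-dataset.
    y_true = [3.0, -0.5, 2.0, 7.0]   # observed labels y
    y_pred = [2.5,  0.0, 2.0, 8.0]   # model predictions y'

    squared_losses = [(y - yp) ** 2 for y, yp in zip(y_true, y_pred)]
    mse = sum(squared_losses) / len(squared_losses)
    print(squared_losses)   # [0.25, 0.25, 0.0, 1.0]
    print(mse)              # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375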
7) Explain the Trade-Off in Machine Learning with examples?
In machine learning, the performance and complexity of a model depend not only on
certain parameters, assumptions, and conditions, but also on the quality of the data used
to train the model. That is why cleaning and standardizing the data is a step everyone
goes through.
If the data is not cleaned and standardized, then no matter how finely tuned the model's
parameters and hyper-parameters are, the model will not be able to provide the best
solution.
Let's understand how data plays an important role in a model's performance and
complexity, and how, if the data is not given proper massaging, it introduces bias and
variance (uncertainty) into the model.
Skewness in Data
In simple words, skewness is the measure of how much the probability distribution of a
random variable deviates from the normal distribution (probability distribution without any
skewness).
If our data is positively skewed, it means that it has a higher number of data points with
low values. So, when we train our model on this data, it will perform better at predicting
data points with lower values than those with higher values, thus introducing bias into our
machine learning model. Skewness also tells us about the direction of outliers: for
positively skewed data, most of the outliers are present on the right side of the distribution.
Variance is a measure of how far the observed data points differ from the average value,
i.e., their difference from the mean. When we want to know how spread out the data in
our sample set is, we calculate the variance. Variance introduces uncertainty into the
data: the greater the variance, the greater the uncertainty in the data.
Now that we have understood how data plays an important role in model performance and
complexity, let's understand the bias-versus-variance trade-off.
Bias-vs-Variance Trade-Off
It is one of the important concepts to understand in supervised machine learning and
predictive modeling use cases, and the main goal is to choose a model to train that offers
the best bias-versus-variance tradeoff for the dataset or business use case.
Bias indicates inaccuracy of the model prediction in comparison with the true value. It is due
to the erroneous/inaccurate assumption made in training process to simplify the model and
make the target function easier to learn.
Variance indicates the change in target function if different training data is used. It is caused
by modeling the noise present in the training data, which implies that the model is too
sensitive to the training data (thus giving different estimates when given new training data).
With this high-level overview of the difference between bias and variance in hand, let's
now discuss them in depth and see how we can find a sweet spot.
Having fewer assumptions about the model training process can help generalize relevant
relations between features and target outputs. Low bias means fewer assumptions are made
about the target function, while high bias means more assumptions are made about the
target function. Having more assumptions can potentially miss important relations between
features and outputs and cause underfitting.
Underfitting refers to the situation in which models neither fit the training data nor
generalize to new data.
Low variance indicates changes in training data would result in similar target functions,
whereas, high variance indicates changes in training data would result in very different target
functions. High variance suggests that the algorithm learns the random noise instead of the
output and causes overfitting.
Overfitting refers to the situation in which models fit the training data very well but fail to
generalize to new data.
Generally, increasing model complexity would decrease bias error since the model has more
capacity to learn from the training data, but the variance error would increase, and the
model may begin to learn from noise in the training data.
The goal of training machine learning models is to achieve low bias and low variance.
The optimal model complexity is where bias error crosses with variance error.
The prediction error can be viewed as the sum of model error (error coming from the model)
and the irreducible error (coming from data collection).
prediction error = model error (Bias error² + variance error) + irreducible error
An optimal balance of bias and variance leads to a model that is neither overfit nor underfit.
This is the goal of predictive model — to isolate the signal from the dataset while ignoring
the noise.
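The following short experiment (a sketch with invented noisy data) makes the tradeoff visible: a degree-1 polynomial underfits (high bias), while a degree-15 polynomial chases the noise (high variance) even though its training error is lowest.

    # Bias-variance illustration: fit polynomials of rising complexity
    # to noisy samples of a sine curve and compare training errors.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 20)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

    for degree in (1, 4, 15):   # underfit, reasonable, overfit
        coeffs = np.polyfit(x, y, degree)
        train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        print(f"degree={degree:2d}  training MSE={train_mse:.4f}")
    # Training MSE keeps shrinking with degree, but the degree-15 fit is
    # modeling noise and would generalize worse than the degree-4 fit.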
Now that we have understood bias and variance error and how they cause underfitting
and overfitting, let's go through some of the steps that can be used to overcome these
problems.
· K-fold cross-validation: it splits the initial training data into k subsets and trains the model k
times. In each training round, it uses one subset as the testing data and the rest as training data.
· Validation dataset: sample out a dataset from the initial training data to estimate how well
the model generalizes on new data.
· Simplify the model: for example, use fewer layers or fewer neurons to make the neural
network smaller.
· Use more data.
· Reduce dimensionality in the training data: this projects the training data into a smaller
dimension to decrease the model complexity.
· Regularization: it tunes or selects the preferred level of model complexity, so our model
becomes better at predicting (generalizing).
· Stop the training early when the performance on the testing dataset has not improved
after a number of training iterations.
In this section, we learnt about bias and variance, how they cause underfitting and
overfitting respectively, and how data plays an important role in both. We also saw how
important this tradeoff is for achieving optimal model complexity and performance.
8) What are Underfitting and Overfitting of data? Explain.
In the real world, the dataset present will never be clean and perfect. It means each dataset
contains impurities, noisy data, outliers, missing data, or imbalanced data. Due to these
impurities, different problems occur that affect the accuracy and the performance of the
model. One such problem is Overfitting in Machine Learning. Overfitting is a problem
that a model can exhibit.
A statistical model is said to be overfitted if it can’t generalize well with unseen data.
Before understanding overfitting, we need to know some basic terms, which are:
Noise: Noise is meaningless or irrelevant data present in the dataset. It affects the
performance of the model if it is not removed.
Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the
machine learning algorithms. Or it is the difference between the predicted values and the
actual values.
Variance: If the machine learning model performs well with the training dataset, but does
not perform well with the test dataset, then variance occurs.
Generalization: It shows how well a model is trained to predict unseen data.
What is Overfitting?
o Overfitting & underfitting are the two main errors/problems in the machine learning
model, which cause poor performance in Machine Learning.
o Overfitting occurs when the model fits more data than required, and it tries to
capture each and every datapoint fed to it. Hence it starts capturing noise and
inaccurate data from the dataset, which degrades the performance of the model.
o An overfitted model doesn't perform accurately with the test/unseen dataset and
can’t generalize well.
o An overfitted model is said to have low bias and high variance.
Example to Understand Overfitting
We can understand overfitting with a general example. Suppose there are three students, X,
Y, and Z, and all three are preparing for an exam. X has studied only three sections of the
book and left all other sections. Y has a good memory, hence memorized the whole book.
And the third student, Z, has studied and practiced all the questions. So, in the exam, X will
only be able to solve the questions if the exam covers the three sections he studied. Student Y
will only be able to solve questions if they appear exactly the same as given in the book.
Student Z will be able to solve all the exam questions in a proper way.
The same happens with machine learning; if the algorithm learns from a small part of the
data, it is unable to capture the required data points and hence becomes underfitted.
Suppose the model memorizes the training dataset, like student Y. It performs very well on
the seen dataset but performs badly on unseen data or unknown instances. In such cases,
the model is said to be overfitting.
And if the model performs well with the training dataset and also with the test/unseen
dataset, similar to student Z, it is said to be a good fit.
How to detect Overfitting?
Overfitting in a model can only be detected once you test the model on unseen data. To
detect the issue, we can perform a train/test split.
In the train-test split of the dataset, we can divide our dataset into random test and training
datasets. We train the model with a training dataset which is about 80% of the total dataset.
After training the model, we test it with the test dataset, which is 20 % of the total dataset.
Now, if the model performs well with the training dataset but not with the test dataset,
then it is likely to have an overfitting issue.
For example, if the model shows 85% accuracy with training data and 50% accuracy with the
test dataset, it means the model is not performing well.
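A minimal sketch of this check, assuming scikit-learn and a synthetic dataset (all names and numbers here are illustrative):

    # Detecting overfitting with a train/test split (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)  # synthetic data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # unpruned tree
    print("train accuracy:", clf.score(X_tr, y_tr))  # typically ~1.0
    print("test accuracy: ", clf.score(X_te, y_te))  # noticeably lower -> overfitting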
Ways to prevent Overfitting
Although overfitting is an error in machine learning that reduces the performance of the
model, we can prevent it in several ways. With the use of a linear model, we can often
avoid overfitting; however, many real-world problems are non-linear. It is important to
prevent models from overfitting. Below are several ways that can be used to prevent
overfitting:
1. Early Stopping
2. Train with more data
3. Feature Selection
4. Cross-Validation
5. Data Augmentation
6. Regularization
Early Stopping
In this technique, the training is paused before the model starts learning the noise within
the data. While training the model iteratively, we measure the performance of the model
after each iteration and continue only as long as new iterations keep improving the
model's performance.
After that point, the model begins to overfit the training data; hence we need to stop the
process before the learner passes that point.
Stopping the training process before the model starts capturing noise from the data is
known as early stopping.
However, this technique may lead to the underfitting problem if training is paused too early.
So, it is very important to find that "sweet spot" between underfitting and overfitting.
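A minimal sketch of the idea in plain Python; `train_with_early_stopping` is a hypothetical helper, and the loss values stand in for real per-iteration validation measurements:

    # Early-stopping sketch (illustrative only): stop training when the
    # validation loss has not improved for `patience` consecutive checks.
    def train_with_early_stopping(val_losses, patience=3):
        best, waited = float("inf"), 0
        for i, loss in enumerate(val_losses):
            if loss < best:
                best, waited = loss, 0   # improvement: reset the counter
            else:
                waited += 1              # no improvement at this iteration
                if waited >= patience:
                    return i             # the "sweet spot" has passed; stop
        return len(val_losses) - 1

    # Invented validation-loss curve: falls, then rises as overfitting begins.
    print(train_with_early_stopping([0.9, 0.7, 0.6, 0.55, 0.56, 0.60, 0.65, 0.7]))
    # -> 6 (training stops here, well before the end of the curve)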
Train with More data
Increasing the training set by including more data can enhance the accuracy of the model,
as it provides more chances to discover the relationship between input and output
variables.
It may not always work to prevent overfitting, but this way helps the algorithm to detect the
signal better to minimize the errors.
When a model is fed with more training data, it becomes unable to overfit all the samples
of data and is forced to generalize well.
But in some cases, the additional data may add more noise to the model; hence we need to
be sure that the data is clean and free from inconsistencies before feeding it to the model.
Feature Selection
While building the ML model, we have a number of parameters or features that are used to
predict the outcome. However, sometimes some of these features are redundant or less
important for the prediction, and for this the feature selection process is applied. In the feature
selection process, we identify the most important features within training data, and other
features are removed. Further, this process helps to simplify the model and reduces noise
from the data. Some algorithms have the auto-feature selection, and if not, then we can
manually perform this process.
Cross-Validation
Cross-validation is one of the powerful techniques to prevent overfitting.
In the general k-fold cross-validation technique, we divide the dataset into k equal-sized
subsets of data; these subsets are known as folds.
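A minimal k-fold sketch (assuming scikit-learn; the dataset is synthetic) with k = 5:

    # k-fold cross-validation sketch (illustrative only) with k = 5 folds.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, random_state=0)  # synthetic data
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores)          # one accuracy score per fold
    print(scores.mean())   # averaged estimate of generalization performance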
Data Augmentation
Data Augmentation is a data analysis technique, which is an alternative to adding more data
to prevent overfitting. In this technique, instead of adding more training data, slightly
modified copies of already existing data are added to the dataset.
The data augmentation technique makes each data sample appear slightly different every
time it is processed by the model. Hence each sample appears unique to the model, which
prevents overfitting.
Regularization
If overfitting occurs when a model is complex, we can reduce the number of features.
However, overfitting may also occur with a simpler model, more specifically the Linear
model, and for such cases, regularization techniques are much helpful.
Regularization is the most popular technique to prevent overfitting. It is a group of methods
that forces the learning algorithms to make a model simpler. Applying the regularization
technique may slightly increase the bias but slightly reduces the variance. In this technique,
we modify the objective function by adding the penalizing term, which has a higher value
with a more complex model.
The two commonly used regularization techniques are L1 Regularization and L2
Regularization.
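A minimal sketch contrasting the two (synthetic data; `alpha` is the penalty strength, chosen arbitrarily here):

    # L1 (Lasso) and L2 (Ridge) regularization sketch (illustrative only).
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                           noise=10.0, random_state=0)

    ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients
    lasso = Lasso(alpha=1.0).fit(X, y)   # L1: drives some coefficients to zero
    print(ridge.coef_.round(2))
    print(lasso.coef_.round(2))  # note the exact zeros: implicit feature selection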
Ensemble Methods
In ensemble methods, predictions from different machine learning models are combined to
identify the most popular result.
The most commonly used ensemble methods are Bagging and Boosting.
In bagging, individual data points can be selected more than once. After the collection of
several sample datasets, a model is trained on each one independently, and depending on
the type of task (i.e., regression or classification), the average of those predictions is used
to produce a more accurate result. Moreover, bagging reduces the chances of overfitting in
complex models.
In boosting, a large number of weak learners arranged in a sequence are trained in such a
way that each learner in the sequence learns from the mistakes of the learner before it. It
combines all the weak learners to come out with one strong learner. In addition, it improves
the predictive flexibility of simple models.
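A minimal sketch comparing the two families (synthetic data; hyperparameters arbitrary):

    # Bagging vs. boosting sketch (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    print("bagging test accuracy: ", bag.score(X_te, y_te))
    print("boosting test accuracy:", boost.score(X_te, y_te))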
9) Explain in detail about Risk Statistics in Machine Learning?
What are the risks of machine learning data?
Risks of Machine Learning
Nowadays, Machine Learning is playing a big role in helping organizations in different
aspects such as analysing structured and unstructured data, detecting risks, automating
manual tasks, making data-driven decisions for business growth, etc. It is capable of
replacing the huge amount of human labour by applying automation and providing insights
to make better decisions for assessing, monitoring, and reducing the risks for an
organization.
Although machine learning can be used as a risk management tool, it also contains many
risks itself. While 49% of companies are exploring or planning to use machine learning,
only a small minority recognize the risks it poses: only 41% of organizations in a global
McKinsey survey say they can comprehensively identify and prioritize machine learning
risks. Hence, it is necessary to be aware of some of the risks of machine learning and how
they can be adequately evaluated and managed.
Below are a few risks associated with Machine Learning:
1. Poor Data
As we know, a machine learning model only works on the data that we provide to it; in
other words, it depends completely on the human-given training data. What we input is
what we get as output, so if we feed in poor data, the ML model will generate erratic
output. Poor data, or dirty data, includes errors in training data, outliers, and
unstructured data, which cannot be adequately interpreted by the model.
2. Overfitting
Overfitting is commonly found in non-parametric and non-linear models that are flexible
enough to learn the target function.
An overfitted model fits the training data so perfectly that it fails to capture the underlying
variability of the data. This means it won't be able to generalize well when it comes to
testing on real data.
3. Biased data
Biased data means that human biases can creep into your datasets and spoil outcomes. For
instance, the popular selfie editor FaceApp was initially inadvertently trained to make faces
"hotter" by lightening the skin tone-a result of having been fed a much larger quantity of
photos of people with lighter skin tones.
4. Lack of strategy and experience:
Machine learning is a very new technology in the IT sector; hence, the limited availability
of trained and skilled resources is a very big issue for the industries. Further, a lack of
strategy and experience due to fewer resources leads to wastage of time and money and
negatively affects an organization's production and revenue. According to a survey of over
2,000 people, 860 reported a lack of clear strategy and 840 reported a lack of talent with
the appropriate skill sets. This survey shows how a lack of strategy and relevant experience
creates a barrier to the development of machine learning in organizations.
5. Security Risks
Security of data is one of the major issues for the IT world. Security also affects the
production and revenue of organizations. When it comes to machine learning, various
types of security risks exist that can compromise machine learning algorithms and
systems. Data scientists and machine learning experts have reported three types of
attacks, primarily against machine learning models. These are as follows:
o Evasion attacks: These attacks commonly arise due to adversarial input
introduced into the models; hence they are also known as adversarial attacks.
An evasion attack happens when the network is given adversarial examples as input
that can influence the classifiers, i.e., disrupt the ML model, so that a security
violation results from malicious data being classified as genuine. A targeted
attack attempts to allow a specific intrusion or disruption, or alternatively to create
general mayhem.
Evasion attacks are the most dominant type of attack, in which data is modified in
such a way that it seems to be genuine data. Evasion doesn't involve influence over
the data used to train a model, but it is comparable to the way spammers and hackers
obfuscate the content of spam emails and malware.
o Data Poisoning attacks:
In data poisoning attacks, the source of the raw data used to train the ML models is
known to the attacker, who strives to bias or "poison" that data to compromise the
resulting machine learning model's accuracy. The effects of these attacks can be
overcome by prevention and detection; through proper monitoring, we can protect
ML models from data poisoning.
Model skewing is one of the most common types of data poisoning attack, in which
spammers try to skew the classifier by having bad input categorised as good.
o Model Stealing:
Model stealing is one of the most important security risks in machine learning.
Model stealing techniques are used to create a clone model based on information or
data used in the training of a base model. Model stealing is a major concern for ML
experts because ML models are valuable intellectual property of organizations and
are built on sensitive user data such as account details, transactions, financial
information, etc. The attackers use the public API and sample data of the original
model to reconstruct another model with a similar look and feel.
6. Data privacy and confidentiality
Data is one of the main key players in developing Machine learning models. We know
machine learning requires a huge amount of structured and unstructured data for training
models so they can predict accurately in future. Hence, to achieve good results, we need to
secure data by defining some privacy terms and conditions as well as making it confidential.
Hackers can launch data extraction attacks that can fly under the radar, which can put your
entire machine learning system at risk.
7. Third-party risks
These types of security risks are less well known in industry, as the chances of
encountering them are minimal. Third-party risks generally exist when someone
outsources their business to third-party service providers who may fail to properly govern a
machine learning solution. This leads to various types of data breaches in the ML industry.
8. Regulatory challenges
Regulatory challenges occur whenever a knowledge gap is found in an organization, for
example when team members are not aware of how ML algorithms work and make
decisions. Hence, a lack of knowledge to justify decisions to regulators can also be a major
security risk for industries.
How can we assess Machine Learning Risks?
Machine learning is the hottest technology in the IT world. Although ML is being used in
every industry, it has some associated risks too. We can assess these risks when an ML
solution is implemented in an organization. Below are a few important steps to assess
machine learning risks in your organization. These are as follows:
o Implement a machine learning risk management framework instead of a general
framework to identify the risks in real-time scenarios.
o Provide training to employees on ML technologies and give them the knowledge
to follow protocols for effective risk management in ML.
o Develop assessment criteria to identify and manage the risks in the business.
o Adapt the risk monitoring process and risk appetites regularly based on past
experience or customer feedback.
Hence, machine learning risks can be identified and minimized through appropriate talent,
strategy and skilled resources throughout the organization.
Consider estimating the unknown parameter θ of a uniform distribution U[0, θ], whose
mean is µ = θ/2. The sample mean estimator is:
µ̂ = (1/n) · Σᵢ xᵢ
Here, µ̂ (mu-hat) is the sample mean estimator and n is the size of the random sample that
we take from the distribution. A variable with a hat on top of it is the general notation for
an estimator. Since our unknown parameter θ is twice µ, we arrive at the following
estimator for θ:
θ̂ = 2µ̂ = (2/n) · Σᵢ xᵢ
We take a random sample, plug it into the above estimator, and get a number. We repeat
this process and get a set of numbers. The following figure illustrates the process:
(Figure B)
The lines on the x-axes correspond to the values present in the sample taken from the
distribution. The red lines in the middle indicate the average value of the sample, and the
red lines at the end are twice that average value i.e., the expected value of θ for one
sample. Many such samples are taken, and the estimated value of θ for each sample is
noted. The expected value/mean of that set of numbers gives the final estimate for θ. It can
be mathematically proved (using properties of expectation) that:
E[θ̂] = 2 · E[µ̂] = 2 · (θ/2) = θ
It is seen that the expectation of the estimator is equal to the true value of the parameter.
This amazing property that certain estimators have is called unbiasedness, which is a very
useful criterion for assessing estimators.
An alternative estimator for θ is the maximum of the sample, i.e., the nth order statistic:
θ̂ = max(x₁, …, xₙ). We follow the same procedure: take random samples, plug them in,
collect the output, and find the expectation. The following figure illustrates the process:
(Figure C)
As noted previously, the lines on the x-axes are the values present in one sample. The red
lines at the end are the maximum value for that sample i.e., the nth order statistic. Two
random samples are shown for reference. However, we need to take much larger samples.
Why? To see this, we'll use the general expression for the PDF (probability density
function) of the nth order statistic for a U[a, b] distribution:
f(x) = n · (x − a)ⁿ⁻¹ / (b − a)ⁿ,  for a ≤ x ≤ b
For U[0, θ], this gives:
E[θ̂] = E[max(x₁, …, xₙ)] = (n/(n + 1)) · θ
Unlike before, no exact equality holds between the expectation of θ̂ and θ.
This is because the nth order statistic estimator is biased. We can observe this bias on
comparing figures B & C. In figure B, both positive (θ-hat > θ) and negative (θ-hat < θ)
deviations are possible with equal probability (as shown). These deviations can cancel each
other out, making the sample mean estimator unbiased. However, in figure C, only negative
deviations are possible. Why? Because the maximum value of a random sample can never
exceed θ, the upper end of the range [0, θ] of the distribution. Consequently, a negative
bias steps in.
Does that mean that we cannot use this estimator? Certainly not. As discussed earlier, the
estimator bias can be significantly lowered by taking a large n. For large values of n,
n/(n + 1) ≈ 1. Thus, we get:
E[θ̂] = (n/(n + 1)) · θ ≈ θ
Equivalently, the bias can be removed by scaling the estimator to
θ̂ = ((n + 1)/n) · max(x₁, …, xₙ).
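A quick Monte Carlo sketch (θ, n, and the trial count are invented) that reproduces all three facts above numerically:

    # Monte Carlo check of the two estimators of theta for U[0, theta].
    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, trials = 10.0, 20, 100_000

    samples = rng.uniform(0, theta, size=(trials, n))
    mean_est = 2 * samples.mean(axis=1)    # 2 * sample mean: unbiased
    max_est = samples.max(axis=1)          # nth order statistic: biased low
    corrected = (n + 1) / n * max_est      # bias-corrected version

    print(mean_est.mean())    # ~10.0
    print(max_est.mean())     # ~ (n/(n+1)) * 10 = ~9.52
    print(corrected.mean())   # ~10.0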