1.unit 1 ML Q&A

Uploaded by

shaik amreen
Copyright
© © All Rights Reserved

Unit-1

Syllabus:

Unit I: Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine
Learning Systems, Main Challenges of Machine Learning.

Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss,
Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling distribution of an estimator,
Empirical Risk Minimization.

Machine Learning, Types of Machine Learning Systems

1. What is Machine Learning? Explain any four applications with an example. [7M] July –
2023 Set -2 [Remember]

ARTIFICIAL INTELLIGENCE:

Artificial intelligence, commonly referred to as AI, is the process of imparting data, information, and
human intelligence to machines. The main goal of Artificial Intelligence is to develop self-reliant
machines that can think and act like humans.

These machines can mimic human behavior and perform tasks by learning and problem-solving.
Most of the AI systems simulate natural intelligence to solve complex problems.

Let’s have a look at an example of an AI-driven product - Amazon Echo.

MACHINE LEARNING:

Machine Learning is a discipline of computer science that uses computer algorithms and analytics to
build predictive models that can solve business problems. As per McKinsey & Co., machine
learning is based on algorithms that can learn from data without relying on rules-based
programming.

Tom Mitchell’s book on machine learning says “A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.”

(or)

“Machine learning enables a machine to automatically learn from data, improve performance from
experiences, and predict things without being explicitly programmed.”

Importance of Machine Learning:

o Rapid growth in the volume of data being produced

o Solving complex problems that are difficult for a human

o Decision making in various sectors, including finance

o Finding hidden patterns and extracting useful information from data

Popular Machine Learning Applications and Examples

1. Social Media Features

Social media platforms use machine learning algorithms and approaches to create some attractive
and excellent features. For instance, Facebook notices and records your activities, chats, likes, and
comments, and the time you spend on specific kinds of posts. Machine learning learns from your
own experience and makes friends and page suggestions for your profile.

2. Product Recommendations

Product recommendation is one of the most popular and widely known applications of machine learning,
and a standard feature of almost every e-commerce website today. Using machine learning and AI,
websites track your behavior based on your previous purchases, search patterns, and cart history,
and then make product recommendations.

3. Image Recognition

Image recognition, which is an approach for cataloging and detecting a feature or an object in the
digital image, is one of the most significant and notable machine learning and AI techniques. This
technique is being adopted for further analysis, such as pattern recognition, face detection, and face
recognition.

4. Sentiment Analysis

Sentiment analysis is one of the most necessary applications of machine learning. Sentiment
analysis is a real-time machine learning application that determines the emotion or opinion of the
speaker or the writer. For instance, if someone has written a review or email (or any form of a
document), a sentiment analyzer will instantly find out the actual thought and tone of the text. This
sentiment analysis application can be used to analyze review-based websites, decision-making
applications, etc.

5. Automating Employee Access Control

Organizations are actively implementing machine learning algorithms to determine the level of
access employees would need in various areas, depending on their job profiles. This is one of the
coolest applications of machine learning.

6. Marine Wildlife Preservation

Machine learning algorithms are used to develop behavior models for endangered cetaceans and
other marine species, helping scientists regulate and monitor their populations.

7. Regulating Healthcare Efficiency and Medical Services

Major healthcare providers are actively looking at using machine learning algorithms to manage their
services better. For example, models predict the waiting times of patients in emergency waiting rooms
across various departments of hospitals. These models use factors such as staffing details
at various times of day, patient records, complete logs of department chats, and the layout of
emergency rooms. Machine learning algorithms also come into play in disease detection, therapy
planning, and prediction of how a disease will progress.

8. Predict Potential Heart Failure

An algorithm designed to scan a doctor’s free-form e-notes and identify patterns in a patient’s
cardiovascular history is making waves in medicine. Instead of a physician digging through multiple
health records to arrive at a sound diagnosis, redundancy is now reduced with computers making an
analysis based on available information.

9. Banking Domain

Banks are now using the latest advanced technology machine learning has to offer to help prevent
fraud and protect accounts from hackers. The algorithms determine what factors to consider to
create a filter to keep harm at bay. Various sites that are unauthentic will be automatically filtered
out and restricted from initiating transactions.

10. Language Translation

One of the most common machine learning applications is language translation. Machine learning
plays a significant role in the translation of one language to another. We are amazed at how
websites can translate from one language to another effortlessly and give contextual meaning as
well. The technology behind the translation tool is called ‘machine translation.’ It has enabled
people to interact with others from all around the world; without it, life would not be as easy as it is
now. It has provided confidence to travelers and business associates to safely venture into foreign
lands with the conviction that language will no longer be a barrier.

Machine learning applications and technology are evolving at a rapid pace. As the demand
for AI and machine learning has increased, organizations require professionals with in-depth
knowledge of these growing technologies and hands-on experience.

DEEP LEARNING:

Deep learning is a subset of machine learning (ML), which is itself a subset of artificial
intelligence (AI). The concept of AI has been around since the 1950s, with the goal of making
computers able to think and reason in a way similar to humans. As part of making machines able to
think, ML is focused on how to make them learn without being explicitly programmed. Deep learning
goes beyond ML by creating more complex hierarchical models that are meant to mimic how humans
learn new information.

TYPES OF MACHINE LEARNING:

Machine learning is a subset of AI, which enables the machine to automatically learn from data,
improve performance from past experiences, and make predictions. Machine learning contains a set
of algorithms that work on a huge amount of data. Data is fed to these algorithms to train them, and
on the basis of training, they build the model and perform a specific task.

Machine learning algorithms are classified into three main categories:

1. Supervised Learning:

Supervised learning is a type of machine learning method in which we provide sample labeled data to
the machine learning system in order to train it, and on that basis, it predicts the output. The system
creates a model using labeled data to learn the relationship between inputs and outputs. Once the
training and processing are done, we test the model by providing sample data to check whether
it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is
based on supervision, in the same way that a student learns under the supervision of a
teacher. An example of supervised learning is spam filtering.

 Supervised learning is a process of providing input data as well as correct output data to the
machine learning model.
 The aim of a supervised learning algorithm is to find a mapping function to map the input
variable (x) to the output variable (y).
 In the real world, supervised learning can be used for risk assessment, image classification,
fraud detection, spam filtering, etc.

Supervised learning can be grouped further in two categories of algorithms:

I. Classification -- Classification algorithms are used when the output variable is categorical, which
means the output falls into a fixed set of classes such as Yes/No, Male/Female, True/False, or Spam/Not spam.

o Random Forest

o Decision Trees

o Logistic Regression

o Support vector Machines

II. Regression -- Regression algorithms are used when the output variable is a continuous value that
depends on the input variables. They are used for the prediction of continuous variables, such as in weather
forecasting, market trends, etc. Below are some popular regression algorithms which come under
supervised learning:

o Linear Regression

o Regression Trees

o Non-Linear Regression

o Bayesian Linear Regression

o Polynomial Regression
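
As a minimal sketch of regression, the simplest algorithm in the list above (linear regression with one input variable) can be written in a few lines of plain Python. The data (hours studied and exam scores) and the function name `fit_line` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: hours studied (input) and exam score (output).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # continuous prediction for an unseen input
print(slope, intercept, predicted)
```

The output is a number on a continuous scale rather than a class label, which is what separates regression from classification.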

Below is an example of a supervised learning method. The algorithm is trained using labeled data of
dogs and cats. The trained model predicts whether the new image is that of a cat or a dog

 Some examples of supervised learning include linear regression, logistic regression, support
vector machines, Naive Bayes, and decision tree.

Advantages of Supervised learning:

 With the help of supervised learning, the model can predict the output on the basis of prior
experiences.
 In supervised learning, we can have an exact idea about the classes of objects.
 Supervised learning model helps us to solve various real-world problems such as fraud detection,
spam filtering, etc.

Disadvantages of supervised learning:

 Supervised learning models can struggle with very complex tasks.
 Supervised learning cannot predict the correct output if the test data is very different from the
training data.
 Training requires a lot of computation time.
 In supervised learning, we need enough knowledge about the classes of objects.

2. Unsupervised Learning

Unsupervised learning algorithms employ unlabeled data to discover patterns from the data on their
own. The systems are able to identify hidden features from the input data provided. Once the data is
more readable, the patterns and similarities become more evident.

(or)

“ Unsupervised learning is a type of machine learning in which models are trained using unlabeled
datasets and are allowed to act on that data without any supervision “

Below is an example of an unsupervised learning method that trains a model using unlabeled data. In
this case, the data consists of different vehicles. The purpose of the model is to group similar
vehicles together without being told the vehicle types.

Unsupervised learning can be further classified into two categories of algorithms:

o Clustering -- Clustering is a method of grouping objects into clusters such that objects with the
most similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes them as per
the presence and absence of those commonalities.

o Association -- Association rule learning is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the sets of items that occur
together in the dataset. Association rules make marketing strategy more effective. For example, people
who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example
of association rule learning is Market Basket Analysis.
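
To make the bread and butter example concrete, the sketch below computes the two standard measures of an association rule, support and confidence, on a small made-up set of transactions. The transaction data and function names are assumptions for illustration only.

```python
# Hypothetical market-basket data: each set is one customer's transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and butter appear together in 3 of 5 transactions, and butter appears in 3 of the 4 transactions that contain bread, so the rule "bread -> butter" has a confidence of about 0.75.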

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
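
As an illustration of the first algorithm in the list, here is a minimal sketch of K-means on one-dimensional data. The data points, the initial centroids, and the fixed number of iterations are assumptions made to keep the example small and repeatable. Real implementations usually pick initial centroids at random and stop when assignments no longer change.

```python
def k_means_1d(points, centroids, iterations=10):
    """K-means (Lloyd's algorithm) on 1-D data, starting from given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = sum(cluster) / len(cluster)
    return centroids, clusters


# Unlabeled data: no class information is given to the algorithm.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

The algorithm discovers the two groups on its own, which is the sense in which the text says unsupervised methods find patterns without labels.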

Advantages of Unsupervised Learning

 Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
 Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled
data.

Disadvantages of Unsupervised Learning

 Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.
 The result of the unsupervised learning algorithm might be less accurate as input data is not
labeled, and algorithms do not know the exact output in advance.

3. Reinforcement Learning

The goal of reinforcement learning is to train an agent to complete a task within an uncertain
environment. The agent receives observations and a reward from the environment and sends actions
to the environment. The reward measures how successful an action is with respect to completing the
task goal.

Below is an example that shows how a machine is trained to identify shapes.

 Examples of reinforcement learning algorithms include Q-learning and Deep Q-Networks (DQN).
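
The agent, environment, action, and reward loop can be sketched with tabular Q-learning on a tiny made-up environment: five positions on a line, where reaching the last position pays a reward of 1. The environment, the constants, and the deterministic sweep over every state and action (used here instead of random exploration so the result is repeatable) are all assumptions of this sketch.

```python
N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor


def step(state, action):
    """Environment: returns (next_state, reward). Reaching the goal pays 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


# Q[state][action_index] is the agent's estimate of long-term reward.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(200):
    for state in range(N_STATES - 1):          # the goal state is terminal
        for a, action in enumerate(ACTIONS):
            next_state, reward = step(state, action)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + GAMMA * max(Q[next_state])
            Q[state][a] += ALPHA * (target - Q[state][a])

# Greedy policy: in each non-terminal state, pick the action with the highest Q.
policy = [ACTIONS[row.index(max(row))] for row in Q[:-1]]
print(policy)
```

After training, the greedy policy moves right from every position, because that is the direction in which the reward is reached soonest.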

Deep Learning

Deep learning is a subset of machine learning that deals with algorithms inspired by the structure and
function of the human brain. Deep learning algorithms can work with an enormous amount of both
structured and unstructured data. Deep learning’s core concept lies in artificial neural networks,
which enable machines to make decisions.

The major difference between deep learning and machine learning is the way data is presented to the
machine. Machine learning algorithms usually require structured data, whereas deep learning
networks work on multiple layers of artificial neural networks. This is what a simple neural network
looks like:

The network has an input layer that accepts inputs from the data. The hidden layer is used to find any
hidden features from the data. The output layer then provides the expected output.
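
The flow from input layer to hidden layer to output layer can be sketched as a forward pass through a very small network. The weights below are hand-picked, hypothetical values. In practice they are learned from data during training, and the sigmoid activation is just one common choice.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]


def forward(inputs):
    hidden = dense(inputs, hidden_weights, hidden_biases)   # hidden layer
    return dense(hidden, output_weights, output_biases)     # output layer


output = forward([1.0, 1.0])
print(output)
```

A deep network stacks many such hidden layers, so that later layers can build on the features found by earlier ones.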

Here is an example of a neural network that uses large sets of retinal images. The
network model is trained on this data to find out whether or not a person has diabetic retinopathy.

2. Explain the process of Machine Learning step by step. [7M] July – 2023 Set 4[Understand]

Major Steps in the Machine Learning Process:

Machine learning is an application of AI that allows systems to learn from past experience
without being explicitly programmed. Machine learning aims to create computer programs that can
access data and use it to acquire knowledge on their own. Machine learning relies on input, such as
training data or knowledge graphs, to comprehend things, domains, and the connections between
them, much like how the human brain acquires information and understanding.

Machine learning professionals follow a standard methodology to complete tasks, regardless of the
model or training technique used. These actions involve iteration. This means that you evaluate how
the process progresses at each stage. Are things going as you anticipated? If not, go back and review
your previous steps or current step to try to figure out where the breakdown occurred.

The task of imparting intelligence to machines can seem daunting, but it becomes manageable when it is
broken down into five major steps:

1. Define the problem

2. Build the dataset

3. Train the model

4. Evaluate the model

5. Inference (implementing the model)

1. Define the problem

The first step in the machine learning process is defining the problem. When approaching any
problem through machine learning, it is always necessary to be specific about the area you will focus
on.

For example, suppose you want to analyze a process in order to increase sales. You can't take the entire
process as your problem. Instead, you ask something specific, such as "Does adding a $1.00 charge for a
special add-on increase the sales of that product?". When you define the problem in such a way, it will be
easy for you to choose the machine learning task and the nature of the data needed to build the model.

2. Build a Dataset

Building a dataset that can be used to address your machine learning problem is the
next step in the machine learning process. Understanding the necessary data enables you to
choose better models and algorithms, resulting in the development of more efficient solutions.
Working with data is perhaps the most overlooked, yet most important, step of the
machine learning process.

The Four Aspects of Working with Data

Data collection

Gathering data for your project can be as simple as running the appropriate SQL queries or as
complex as developing custom web scraper applications. You may even need to run a model on your
data to obtain the required labels.

Data inspection

The quality of your data is the most crucial factor in how well you can expect your model to
perform. As you inspect your data, look for:

 Outliers

 Missing or incomplete values

 Data that needs to be transformed or preprocessed so that it is in the correct format to be used by your model
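
The first two checks in the list above can be sketched with the standard library. The column of readings is made up, `None` is assumed to mark a missing value, and the interquartile-range (IQR) rule is one common convention for flagging outliers rather than the only one.

```python
import statistics

# Hypothetical raw column: None marks a missing value.
readings = [10, 12, 11, 13, None, 12, 95, 11]

# Missing or incomplete values.
missing_count = sum(1 for value in readings if value is None)
present = [value for value in readings if value is not None]

# Outliers, using the IQR rule: flag values far outside the middle 50%.
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [value for value in present if value < lower or value > upper]

print(missing_count, outliers)
```

Whether a flagged value is an error or a genuine extreme case is a judgment that depends on the problem, so flagged values should be reviewed rather than removed automatically.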

Summary statistics

A subset of descriptive statistics called summary statistics gives an overview of the information
about the sample data. It is the goal of summary statistics to summarize statistical data. This shows
that summary statistics can be effectively used to quickly grasp the essence of the data. Statistics
typically deals with the quantitative or visual display of information.
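
As a small sketch, the usual summary statistics can be computed with Python's built-in `statistics` module. The sample of customer ages is hypothetical.

```python
import statistics

# Hypothetical sample: ages of eight customers.
ages = [23, 25, 25, 31, 35, 38, 41, 62]

summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "stdev": statistics.stdev(ages),   # sample standard deviation
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(name, value)
```

Comparing the mean (35) with the median (33) already hints that a few large values, such as 62, pull the average upward.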

Data visualization

To help people comprehend and make sense of massive volumes of data, data visualization is a
technique that uses a variety of static and dynamic visualizations within a given context. The data is
sometimes presented in a story format to visualize patterns, trends, and connections that could
otherwise go missing.

3. Model Training

After we prepare the data, the next step is to train the model on it.
As an initial step, we split the data into two sets.

Splitting your dataset gives you two sets of data:

 Training dataset: The data on which the model will be trained. Most of your data will be here.
A common choice is about 80%.

 Test dataset: The data withheld from the model during training, which is used to test how well your
model will generalize to new data.
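
A minimal sketch of such a split, assuming an 80/20 ratio and a fixed random seed so the result is repeatable. The function name mirrors a common convention but is written here from scratch. Libraries such as scikit-learn provide a ready-made equivalent.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows`, then hold out `test_fraction` of it for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))   # stand-in for 100 labeled examples
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting matters when the data is stored in some order (for example by date or by class), because otherwise the test set would not resemble the training set.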

After splitting, we can use the training dataset to train the model we prefer. Use machine learning
frameworks that provide ready-made model implementations and training algorithms.
Unless you're creating new models or algorithms, you usually won't need to
implement these from scratch.

Pick a model, or several models, using a process known as model selection. Even seasoned machine
learning practitioners may experiment with many different models when using machine learning
to solve problems, because the number of recognized models continually expands.

Hyperparameters are model settings that are left unchanged throughout training but may affect
how quickly or accurately the model learns. An example is the number of clusters a clustering model
should identify.

The end-to-end training process is:

 Feed the training data into the model.
 Compute the loss function on the results.
 Update the model parameters in a direction that reduces loss.

You continue to cycle through these steps until you reach a predefined stop condition. This might
be based on training time, the number of training cycles, or a more intelligent,
application-aware mechanism.
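
The three steps and the stop condition can be sketched as a gradient descent loop for a one-variable linear model. The model form, the mean squared error loss, the learning rate, and the two stop conditions (an epoch budget and a loss threshold) are assumptions chosen to keep the example short.

```python
def train(xs, ys, learning_rate=0.05, max_epochs=5000, tolerance=1e-10):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(max_epochs):                 # stop condition 1: epoch budget
        # 1. Feed the training data into the model.
        predictions = [w * x + b for x in xs]
        # 2. Compute the loss function on the results.
        errors = [p - y for p, y in zip(predictions, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tolerance:                    # stop condition 2: loss small enough
            break
        # 3. Update the parameters in the direction that reduces the loss.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss


# Hypothetical training data generated from y = 2x + 1.
w, b, final_loss = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b, final_loss)
```

The learning rate used here is itself a hyperparameter: it is fixed before training and controls how quickly the parameters move.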

4. Model Evaluation

You can assess how well your model performs once you have gathered your data and trained a
model. The metrics used for evaluation are likely to be specific to
the problem you defined. As your knowledge of machine learning increases, you will be able to
explore a wide range of metrics that can help you evaluate effectively. There are many evaluation
metrics by which we can judge the model's performance, such as:

 Accuracy
 Specificity
 Recall or Sensitivity
 F1 Score
 Precision
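
For a binary classifier, all five metrics in the list above follow from the counts of true/false positives and negatives. The sketch below uses made-up labels (1 = positive, 0 = negative) and assumes each denominator is non-zero.

```python
def classification_metrics(y_true, y_pred):
    """Binary metrics from true and predicted labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,                  # also called sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which metric matters most depends on the problem: for disease detection a missed case (low recall) is usually worse than a false alarm, while for spam filtering a false alarm (low precision) may be the bigger cost.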

5. Model Inference

Once you have trained your model, assessed its performance, and are satisfied with the results, you
are ready to make predictions about real-world problems using data that has not yet been seen.
In machine learning, this procedure is referred to as inference: using a trained model to
draw conclusions from new data. Model inference is simply the processing of unseen data with the
trained model to produce a result, although the results may be monitored for future optimization.
Even after your model has been deployed, you keep an eye on it to ensure it is generating the
outcomes you are looking for. You might need to reexamine the data, adjust a few settings in your
model training procedure, or switch the model type.

Remember that this process is iterative.

Each step is highly iterative and is subject to alteration or re-scoping as a project progresses.
At each step, you may discover that you need to go back and review some assumptions you made in
earlier steps. This uncertainty is expected. When the evaluation is not as expected, we
can go back, make changes, and re-train. We then repeat the process until we achieve the
required precision and accuracy.

3. Compare and contrast Instance-Based and Model-Based Learning. [7M] July – 2023 Set
- 4[Evaluate]

Instance-Based Versus Model-Based Learning

One more way to categorize Machine Learning systems is by how they generalize. Most Machine
Learning tasks are about making predictions. This means that given a number of training examples,
the system needs to be able to make good predictions for (generalize to) examples it has never seen
before. Having a good performance measure on the training data is good, but insufficient; the true
goal is to perform well on new instances.

There are two main approaches to generalization: instance-based learning and model-based learning.

Instance-based learning

Possibly the most trivial form of learning is simply to learn by heart. If you were to create a spam
filter this way, it would just flag all emails that are identical to emails that have already been flagged
by users—not the worst solution, but certainly not the best.

Instead of just flagging emails that are identical to known spam emails, your spam filter could be
programmed to also flag emails that are very similar to known spam emails. This requires a measure
of similarity between two emails. A (very basic) similarity measure between two emails could be to
count the number of words they have in common. The system would flag an email as spam if it has
many words in common with a known spam email.
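
A minimal sketch of this similarity-based spam filter. The example emails and the threshold of three shared words are assumptions for illustration. The text above only says "many words in common".

```python
def shared_words(email_a, email_b):
    """Similarity = number of distinct words the two emails have in common."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))


def is_spam(email, known_spam, threshold=3):
    """Flag the email if it shares at least `threshold` words with any known spam."""
    return any(shared_words(email, spam) >= threshold for spam in known_spam)


# Hypothetical emails already flagged by users.
known_spam = [
    "win a free prize now",
    "cheap loans approved instantly",
]

flagged = is_spam("claim your free prize now", known_spam)
not_flagged = is_spam("meeting moved to friday", known_spam)
print(flagged, not_flagged)
```

Note that nothing is "trained" here: the filter simply stores the known spam emails and compares each new email against them.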

This is called instance-based learning: the system learns the examples by heart, then generalizes to
new cases by using a similarity measure to compare them to the learned examples (or a subset of
them). For example, in the figure below the new instance would be classified as a triangle because
the majority of the most similar instances belong to that class.

Model-based learning

Another way to generalize from a set of examples is to build a model of these examples and then use
that model to make predictions. This is called model-based learning (the figure below).

Instance-based learning is like asking your friends for advice on a problem you’re facing. You think
about which of your friends have faced a situation most similar to your problem.
Once you find the most similar case, you apply the advice that worked there to your problem.

Similarly, in instance-based learning, the algorithm looks at the training dataset and finds the
instances that are most similar to the new input. It then uses the output of those instances to make a
prediction for the new input.

One common example of instance-based learning is the k-nearest neighbor (KNN) algorithm, where
the algorithm finds the k-nearest instances in the training dataset to the new input and uses their
output to make a prediction for the new input.
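
A minimal sketch of KNN classification in plain Python, assuming two numeric features per instance, Euclidean distance as the similarity measure, and k = 3. The training points and their labels are made up.

```python
import math
from collections import Counter


def knn_predict(training_data, new_point, k=3):
    """Classify `new_point` by majority vote among its k nearest training instances."""
    by_distance = sorted(
        training_data,
        key=lambda item: math.dist(item[0], new_point),   # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical stored instances: (features, label).
training_data = [
    ((1.0, 1.0), "square"),
    ((1.5, 2.0), "square"),
    ((5.0, 5.0), "triangle"),
    ((6.0, 5.5), "triangle"),
    ((5.5, 6.5), "triangle"),
]

prediction = knn_predict(training_data, (5.0, 6.0))
print(prediction)
```

All the work happens at prediction time, when the new input is compared with every stored instance. This is why instance-based methods become slow on large datasets.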

Recommender systems that suggest movies or products based on similarity to other items that the
user has previously rated or interacted with are an example of instance-based learning.

In contrast, model-based learning is like learning a set of rules from a teacher’s explanation. You
listen to the teacher explain a concept and learn the rules to apply that concept to different situations.
You can then use those rules to solve new problems that are similar to the original concept.

Similarly, in model-based learning, the algorithm learns a mathematical model from the training data
that captures the underlying relationships between the input variables and the output variable. The
model can then be used to make predictions on new, unseen data.

Model-based learning algorithms include regression models, decision trees, neural networks, and
support vector machines, among others.

Image classification models that learn to identify objects in an image based on the patterns and
features that are present are an example of model-based learning.

So, in summary, instance-based learning finds the most similar instances in the training dataset to
the new input, while model-based learning learns a mathematical model from the training data to
make predictions on new, unseen data.

Differences between Instance-based & Model-based Learning

Instance-based learning and model-based learning are two broad categories of machine learning
algorithms. There are several key differences between these two types of algorithms, including:

1. Generalization: In model-based learning, the goal is to learn a generalizable model that can
be used to make predictions on new data. The training data is summarized into a model, and the
model alone is then used on unseen data. In
contrast, instance-based learning algorithms memorize the training examples and use
a similarity measure to compare new data against them. This means that instance-based learning
algorithms do not build an explicit general model, and their performance on new data depends
heavily on the stored examples and on the choice of similarity measure.
2. Scalability: Because instance-based learning algorithms simply memorize the training
examples, they can be very slow and memory-intensive when working with large datasets.
This is because the model has to store all of the training examples in memory and compare
new data points to each of the stored examples. In contrast, model-based learning algorithms
can be more scalable because they don’t have to store all of the training examples. Instead,
they learn a model that can be used to make predictions without storing the training data.
3. Interpretability: Model-based learning algorithms often produce models that are easier to
interpret than instance-based learning algorithms. This is because the model-based algorithms
learn a set of rules or parameters that can be inspected to understand how the model is
making predictions. In contrast, instance-based learning algorithms simply store the training
examples and use them as a basis for making predictions, which can make it difficult to
understand how the predictions are being made.

Overall, while instance-based learning algorithms can be effective for small or medium-sized
datasets, they are generally not as scalable or interpretable as model-based learning algorithms.
Therefore, model-based learning is often preferred for larger, more complex datasets.

4. What is Batch and online learning system? Explain. [7M] July – 2023 Set -3[Remember]

While designing an ML-based application, one important decision is whether
models in production learn incrementally from the stream of incoming data or are retrained
periodically. There are two main approaches to how ML applications can learn in production: online
and batch. Machine learning architects need to develop a good understanding of when to choose which approach.

Batch Learning

Although typical machine learning is done offline using the batch learning method, online learning
does have its applications.

During batch learning, data is gathered over time. The machine learning model is then periodically
trained using this accumulated data in batches. Because the model is unable to learn progressively
from a stream of real-time data, batch learning is the opposite of online learning. In batch learning,
the machine learning algorithm does not modify its parameters until a batch of fresh data has been
consumed.

Large batches of accumulated data are used to train models, which requires more time and resources
such as CPU, memory, and disk input/output. Additionally, it takes longer to deploy models into
production because this can only be done periodically, depending on how well the model performs
after being trained with fresh data.

A model that was learned using batch learning must be retrained using the fresh dataset if it has to
learn about new data.

Online Learning

In online machine learning, the best predictor for future data is updated continuously and
sequentially as new data arrives. Every time new data arrives, the model parameters are
updated based on that data. Each training step is fast and cheap, and the model is
always up to date because its parameters adjust themselves to the new
data.
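
The idea of updating the parameters one sample at a time can be sketched as follows. The class name, the linear model form, the learning rate, and the simulated data stream are all assumptions for illustration. Each sample is used for a single update and is not stored.

```python
class OnlineLinearModel:
    """y ~ w * x + b, updated by one gradient step per incoming sample."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One small step that reduces the squared error on this single sample.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def sample_stream(n):
    """Hypothetical data stream that yields (x, y) pairs with y = 2x + 1."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in sample_stream(5000):
    model.update(x, y)     # learn from the sample, then let it go

print(model.w, model.b)
```

A batch learner would instead collect all 5000 samples first and retrain from scratch whenever new data had accumulated.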

Online learning can be applied to tasks such as stock market prediction or weather forecasting. Also,
if computational resources are a concern, you can go for online learning. Online machine learning is
also a good choice in scenarios where a model has to learn from feedback. Online learning also saves
storage space, because you can discard the data from which the model has already learned.

Batch vs Online Learning

Python Libraries for Online Learning

Scikit-Multiflow

scikit-multiflow is an open-source machine learning package for streaming data. It extends the
scientific tools available in the Python ecosystem. scikit-multiflow is intended for streaming data
applications where data is continuously generated and must be processed and analyzed on the go.
Data samples are not stored, so learning methods are exposed to new data only once.

The (theoretically) infinite nature of the data stream poses additional challenges. While data is
unbounded, resources such as memory and time are limited, so stream learning methods must
be efficient. Additionally, dynamic environments imply that data can change over time. The change
in the distribution of data is known as concept drift and can lead to model performance degradation if
not handled properly. Drift-aware stream learning methods are specifically designed to be robust
against this phenomenon.

Jubatus

Jubatus is a distributed processing framework and streaming machine learning library. Jubatus
includes these functionalities:

• Online Machine Learning Library: Classification, Regression, Recommendation (Nearest
Neighbor Search), Graph Mining, Anomaly Detection, Clustering
• Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction
• Framework for Distributed Online Machine Learning with Fault Tolerance

River

River is a machine learning library for continuous learning and dynamic data streams. For various
stream learning challenges, it includes many state-of-the-art learning methods, data
generators/transformers, performance metrics, and evaluators. It is the outcome of merging two
of Python's most popular stream learning packages: creme and scikit-multiflow.

In River, machine learning models are classes that extend specialized mixins, which vary with the
learning task, such as classification, regression, clustering, and so on. This keeps the library
consistent and makes it easier to extend or modify existing models as well as create new models
that are compatible with River.

Learning and predicting are the two main functions of all predictive models. The learn_one method is
used for learning (it updates the internal state of the model). The predict_one (classification,
regression, and clustering), predict_proba_one (classification), and score_one (anomaly detection)
methods provide predictions depending on the learning task. River also includes transformers,
which are stateful objects that use the transform_one method to convert an input.
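To make the method pattern concrete without depending on the library itself, the sketch below uses two pure-Python stand-ins. RunningMeanRegressor and MinMaxScaler here are hypothetical classes written for illustration; they only imitate the learn_one / predict_one / transform_one calling convention and are not River's implementations.

```python
# Pure-Python stand-ins that mimic the learn_one / predict_one / transform_one
# pattern. These classes are hypothetical illustrations, not River's own code.

class RunningMeanRegressor:
    """Predicts the mean of all targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x, y):
        # Update the internal state with a single (features, target) pair.
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self

    def predict_one(self, x):
        # The features are ignored by this deliberately trivial model.
        return self.mean


class MinMaxScaler:
    """Stateful transformer: rescales each feature using the range seen so far."""

    def __init__(self):
        self.low, self.high = {}, {}

    def learn_one(self, x):
        for name, value in x.items():
            self.low[name] = min(value, self.low.get(name, value))
            self.high[name] = max(value, self.high.get(name, value))
        return self

    def transform_one(self, x):
        scaled = {}
        for name, value in x.items():
            span = self.high[name] - self.low[name]
            scaled[name] = (value - self.low[name]) / span if span else 0.0
        return scaled


# Hypothetical stream of (features, target) pairs.
stream = [({"temp": 10.0}, 1.0), ({"temp": 20.0}, 3.0), ({"temp": 30.0}, 5.0)]
scaler, model = MinMaxScaler(), RunningMeanRegressor()
abs_errors = []
for x, y in stream:
    scaler.learn_one(x)
    x_scaled = scaler.transform_one(x)
    abs_errors.append(abs(model.predict_one(x_scaled) - y))  # test first...
    model.learn_one(x_scaled, y)                             # ...then train

print(model.predict_one({}), abs_errors)
```

Predicting on each sample before learning from it gives an error estimate on data the model has not yet seen, which is a common way to evaluate stream learners.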

Artificial Intelligence, Machine Learning, Deep learning

5. Write the differences between Artificial Intelligence, Machine Learning and Deep
Learning.

[7M] July – 2023 Set -2 [Apply]

Artificial Intelligence (AI) vs Machine Learning (ML) vs Deep Learning (DL)

1. AI: AI stands for Artificial Intelligence, and is basically the study/process which enables machines to mimic human behaviour through a particular algorithm.
   ML: ML stands for Machine Learning, and is the study that uses statistical methods enabling machines to improve with experience.
   DL: DL stands for Deep Learning, and is the study that makes use of Neural Networks (similar to neurons present in the human brain) to imitate functionality just like a human brain.

2. AI: AI is the broader family consisting of ML and DL as its components.
   ML: ML is the subset of AI.
   DL: DL is the subset of ML.

3. AI: AI is a computer algorithm which exhibits intelligence through decision making.
   ML: ML is an AI algorithm which allows a system to learn from data.
   DL: DL is an ML algorithm that uses deep (more than one layer) neural networks to analyze data and provide output accordingly.

4. AI: Search Trees and much complex math is involved in AI.
   ML: If you have a clear idea about the logic (math) involved behind it and you can visualize the complex functionalities like K-Means, Support Vector Machines, etc., then it defines the ML aspect.
   DL: If you are clear about the math involved in it but don't have an idea about the features, so you break the complex functionalities into linear/lower dimension features by adding more layers, then it defines the DL aspect.

5. AI: The aim is to basically increase chances of success and not accuracy.
   ML: The aim is to increase accuracy, not caring much about the success ratio.
   DL: It attains the highest rank in terms of accuracy when it is trained with a large amount of data.

6. AI: Three broad categories/types of AI are: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI).
   ML: Three broad categories/types of ML are: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
   DL: DL can be considered as neural networks with a large number of parameter layers lying in one of the four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks.

7. AI: The efficiency of AI is basically the efficiency provided by ML and DL respectively.
   ML: Less efficient than DL as it can't work for longer dimensions or higher amounts of data.
   DL: More powerful than ML as it can easily work for larger sets of data.

8. AI: Examples of AI applications include: Google's AI-Powered Predictions, Ridesharing Apps Like Uber and Lyft, Commercial Flights Use an AI Autopilot, etc.
   ML: Examples of ML applications include: Virtual Personal Assistants: Siri, Alexa, Google, etc., Email Spam and Malware Filtering.
   DL: Examples of DL applications include: Sentiment based news aggregation, Image analysis and caption generation, etc.

9. AI: AI refers to the broad field of computer science that focuses on creating intelligent machines that can perform tasks that would normally require human intelligence, such as reasoning, perception, and decision-making.
   ML: ML is a subset of AI that focuses on developing algorithms that can learn from data and improve their performance over time without being explicitly programmed.
   DL: DL is a subset of ML that focuses on developing deep neural networks that can automatically learn and extract features from data.

10. AI: AI can be further broken down into various subfields such as robotics, natural language processing, computer vision, expert systems, and more.
    ML: ML algorithms can be categorized as supervised, unsupervised, or reinforcement learning. In supervised learning, the algorithm is trained on labeled data, where the desired output is known. In unsupervised learning, the algorithm is trained on unlabeled data, where the desired output is unknown.
    DL: DL algorithms are inspired by the structure and function of the human brain, and they are particularly well-suited to tasks such as image and speech recognition.

11. AI: AI systems can be rule-based, knowledge-based, or data-driven.
    ML: In reinforcement learning, the algorithm learns by trial and error, receiving feedback in the form of rewards or punishments.
    DL: DL networks consist of multiple layers of interconnected neurons that process data in a hierarchical manner, allowing them to learn increasingly complex representations of the data.

Main Challenges of Machine Learning

6. Explain four of the main challenges in Machine Learning? [7M] July – 2023 Set -
1[Understand]

MAIN CHALLENGES OF MACHINE LEARNING:

• Not enough training data.
• Poor Quality of data.
• Irrelevant features.
• Nonrepresentative training data.
• Overfitting and Underfitting.

1. Not enough training data :

To teach a child what an apple is, all it takes is to point to an apple and say "apple" a few
times. The child can then recognize all sorts of apples.

Machine learning is not at that level yet; most algorithms need a lot of data to work properly.
Even a simple task typically needs thousands of examples, and advanced tasks such as image or
speech recognition may need millions of examples.

2. Poor Quality of data:

If your training data has lots of errors, outliers, and noise, it becomes much harder for your
machine learning model to detect the underlying pattern, so it will not perform well.

So put real effort into cleaning up your training data. No matter how good you are at selecting
the model and tuning its hyperparameters, data cleaning plays a major role in building an accurate
machine learning model.

“Most Data Scientists spend a significant part of their time in cleaning data”.

Here are a couple of cases in which you would want to clean up the data:

• If some of the instances are clear outliers, discard them or fix them manually.
• If some of the instances are missing a feature (e.g., 2% of users did not specify their age),
you can ignore these instances, fill in the missing values with the median age, or train one
model with the feature and one without it and compare the results.
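The first two options for handling a missing age can be sketched as follows. The user records are made up for illustration, and None is assumed to mark a missing value.

```python
from statistics import median

# Hypothetical user records; None marks a missing age.
users = [
    {"name": "a", "age": 25},
    {"name": "b", "age": None},
    {"name": "c", "age": 31},
    {"name": "d", "age": 40},
    {"name": "e", "age": None},
]

known_ages = [u["age"] for u in users if u["age"] is not None]
median_age = median(known_ages)

# Option 1: ignore the instances with a missing age.
complete_only = [u for u in users if u["age"] is not None]

# Option 2: fill the missing values with the median of the known ages.
imputed = [
    {**u, "age": median_age if u["age"] is None else u["age"]} for u in users
]

print(median_age, len(complete_only), [u["age"] for u in imputed])
```

Dropping rows loses data, while imputing keeps every instance at the cost of inventing values, so the choice depends on how many values are missing.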

3. Irrelevant Features:

“Garbage in, garbage out (GIGO).”


As the above image shows, even if our model is "AWESOME", feeding it garbage data produces
garbage output. Our training data must contain enough relevant features and few or no
irrelevant ones.

The success of a machine learning project depends heavily on coming up with a good set of features
to train on. This process, often referred to as feature engineering, includes feature selection,
feature extraction, and creating new features.

4. Nonrepresentative training data:

For our model to generalize well, the training data must be representative of the new cases that
we want to generalize to.

If we train our model on a nonrepresentative training set, its predictions will not be accurate and
it will be biased against one class or group.

For example, say you are trying to build a model that recognizes the genre of music. One way to
build your training set is to search on YouTube and use the resulting data. Here we assume that
YouTube's search engine provides representative data, but in reality the results will be biased
towards popular artists, and maybe even towards artists that are popular in your location (if you
live in India, you will get the music of Arijit Singh, Sonu Nigam, etc.).

So use representative data during training, so that your model is not biased towards one or two
classes when it works on test data.

5. Overfitting and Underfitting :

Let's start with an example. Say one day you are walking down a street to buy something and a dog
comes out of nowhere. You offer him something to eat, but instead of eating he starts barking and
chasing you, and you only just get away. After this incident, you might conclude that no dog is
worth treating nicely.

We humans overgeneralize like this most of the time, and unfortunately a machine learning model
does the same if we are not careful. In machine learning, we call this overfitting, i.e., the model
performs well on the training data but fails to generalize to new data.

Overfitting happens when the model is too complex relative to the amount and noisiness of the
training data.

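Things we can do to overcome this problem:

1. Simplify the model by selecting one with fewer parameters.
2. Reduce the number of attributes in the training data.
3. Constrain the model (regularization).
4. Gather more training data.

Underfitting is the opposite problem: the model is too simple to learn the underlying structure of
the data, so it performs poorly even on the training data. It can be addressed by choosing a more
powerful model, feeding better features to the algorithm, or reducing the constraints on the model.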
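The gap between training and test performance can be seen on a small hand-made example. This is a sketch under stated assumptions: the data points are invented, a nearest-neighbour memoriser stands in for an overly complex model, and a single learned cut-off stands in for a simpler model with fewer parameters.

```python
# Hand-made 1-D data. True rule: label 1 when x > 5. Two training labels
# (x = 1 and x = 9) are deliberately wrong to play the role of noise.
train = [(1, 1), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 0)]
test = [(0.8, 0), (2.2, 0), (3.9, 0), (4.6, 0), (5.7, 1), (7.1, 1), (8.7, 1), (9.5, 1)]


def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def nearest_neighbour(x):
    """Complex model: memorises every training point, noise included."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def fit_threshold(data):
    """Simple model: one parameter, the cut-off with the best training accuracy."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x > t), data))


cutoff = fit_threshold(train)


def threshold_rule(x):
    return int(x > cutoff)


for name, model in [("memoriser", nearest_neighbour), ("threshold", threshold_rule)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memoriser scores perfectly on the training set because it reproduces the noisy labels, but it does worse than the simple rule on the test set, which is the signature of overfitting.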

7. Differentiate traditional and machine learning approaches with neat sketches. [7M] July
– 2023 Set -1 [Remember]

Traditional Programming vs Machine Learning

Traditional computer programming has been around for more than a century, with the first known
computer program dating back to the mid 1800s. Traditional Programming refers to any manually
created program that uses input data and runs on a computer to produce the output.

But for decades now, an advanced type of programming has revolutionized business, particularly in
the areas of intelligence and embedded analytics. In Machine Learning programming, also known as
augmented analytics, the input data and output are fed to an algorithm to create a program. This
yields powerful insights that can be used to predict future outcomes.

Difference Between Traditional Programming and Machine Learning

The difference between traditional programming and machine learning lies in their approaches to
problem-solving and how they are programmed to handle tasks:

1. Approach to Problem Solving:


o Traditional Programming: In traditional programming, a programmer writes explicit
rules or instructions for the computer to follow. These rules dictate exactly how the
computer should process input data to produce the desired output. It requires a deep
understanding of the problem and a clear way to encode the solution in a
programming language.
o Machine Learning: In machine learning, instead of writing explicit rules, a
programmer trains a model using a large dataset. The model learns patterns and
relationships from the data, enabling it to make predictions or decisions without being
explicitly programmed for each possibility. This approach is particularly useful for
complex problems where defining explicit rules is difficult or impossible.
2. Data Dependency:
o Traditional Programming: Relies less on data. The quality of the output depends
mainly on the logic defined by the programmer.
o Machine Learning: Heavily reliant on data. The quality and quantity of the training
data significantly impact the performance and accuracy of the model.
3. Flexibility and Adaptability:
o Traditional Programming: Has limited flexibility. Changes in the problem domain
require manual updates to the code.
o Machine Learning: Offers higher adaptability to new scenarios, especially if the
model is retrained with updated data.
4. Problem Complexity:
o Traditional Programming: Best suited for problems with clear, deterministic logic.
o Machine Learning: Better for dealing with complex problems where patterns and
relationships are not evident, such as image recognition, natural language processing,
or predictive analytics.
5. Development Process:
o Traditional Programming: The development process is generally linear and
predictable, focusing on implementing and debugging predefined logic.
o Machine Learning: Involves an iterative process where models are trained, evaluated,
and fine-tuned. This process can be less predictable and more experimental.
6. Outcome Predictability:
o Traditional Programming: The outcome is highly predictable if the inputs and the
logic are known.
o Machine Learning: Predictions or decisions made by a machine learning model can
sometimes be less interpretable, especially with complex models like deep neural
networks.

Here’s a closer comparison of traditional programming versus machine learning:

Traditional Programming

Traditional programming is a manual process: a person (the programmer) creates the program by
manually formulating and coding the rules.

In machine learning, on the other hand, the algorithm automatically formulates the rules from the
data.
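The contrast can be illustrated with a toy task. In this sketch the temperature-conversion task, the function names, and the sample data are all assumptions chosen for illustration: the first function is a rule written by hand, while the second approach estimates the same rule from example inputs and outputs using ordinary least squares.

```python
# Traditional programming: the programmer writes the rule by hand.
def celsius_to_fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: the rule (slope and intercept) is estimated from
# example inputs and outputs by ordinary least squares.
def fit_line(inputs, outputs):
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: temperatures recorded in both units.
celsius = [-10, 0, 10, 20, 30]
fahrenheit = [14, 32, 50, 68, 86]

slope, intercept = fit_line(celsius, fahrenheit)


def celsius_to_fahrenheit_learned(c):
    return slope * c + intercept


print(celsius_to_fahrenheit_rule(25), round(celsius_to_fahrenheit_learned(25), 2))
```

In the first case the input and the program produce the output; in the second, the input and the output together produce the program (here, the two fitted numbers).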
Machine Learning Programming

Unlike traditional programming, machine learning is an automated process. It can increase the value
of your embedded analytics in many areas, including data prep, natural language interfaces,
automatic outlier detection, recommendations, and causality and significance detection. All of these
features help speed user insights and reduce decision bias.

For example, if you feed in customer demographics and transactions as input data and use historical
customer churn rates as your output data, the algorithm will formulate a program that can predict if a
customer will churn or not. That program is called a predictive model.

You can use this model to predict business outcomes in any situation where you have input and
historical output data:

Identify the business question you would like to ask.

Identify the historical input.

Identify the historically observed output (i.e., data samples for when the condition is true and for
when it’s false).

For instance, if you want to predict who will pay the bills late, identify the input (customer
demographics, bills) and the output (pay late or not), and let the machine learning use this data to
create your model.

In summary, traditional programming is rule-based and deterministic, relying on human-crafted logic,
whereas machine learning is data-driven and probabilistic, relying on patterns learned from data.

As you can see, machine learning can turn your business data into a financial asset. You can point the
algorithm at your data so it can learn powerful rules that can be used to predict future outcomes. It’s
no wonder predictive analytics is now the number one capability on product roadmaps.

Statistical Learning: Supervised and Unsupervised Learning

8. What is the importance of Probability and Statistics while generating supervised or
unsupervised model? Explain. [7M] July – 2023 Set -2 [Remember]

Probability and statistics are crucial in the field of machine learning when generating supervised or
unsupervised models. They play a fundamental role in various aspects of model development,
evaluation, and interpretation. Here's an explanation of their importance:

1. Data Understanding and Preprocessing:

- Probability and statistics help in understanding the characteristics of the data you are working with.
Descriptive statistics, such as mean, median, standard deviation, and histograms, provide insights
into data distribution and central tendencies.

- Probability distributions help modelers make assumptions about the underlying data generation
process, which can guide preprocessing decisions. For example, normalizing or transforming data to
meet certain assumptions, dealing with outliers, and handling missing values are common data
preprocessing tasks based on statistical analysis.

2. Feature Selection and Engineering:

- Probability and statistics aid in selecting relevant features for model training. Techniques like
correlation analysis and feature importance scores can help identify which features have the most
significant impact on the target variable.

- Feature engineering often involves creating new features based on statistical insights, such as
creating interaction terms, aggregating data, or generating statistical summaries of time-series data.

3. Model Training and Validation:

- Probability theory underlies many machine learning algorithms. For example, linear regression
is commonly formulated by assuming that the target variable equals a linear function of the
features plus random (typically Gaussian) noise.

- In supervised learning, statistics help in splitting the dataset into training, validation, and test sets.
Cross-validation techniques use statistical principles to assess a model's generalization performance.

- Probability-based metrics like log-likelihood and loss functions are used to quantify the model's
performance, making it easier to compare different models.

4. Model Selection and Hyperparameter Tuning:

- Statistical methods, such as hypothesis testing and A/B testing, can be used to compare different
models or algorithms to determine which one performs better on a given dataset.

- Hyperparameter tuning often involves conducting experiments and using statistical techniques to
optimize model hyperparameters, such as grid search, random search, or Bayesian optimization.

5. Uncertainty Estimation:

- Probability and statistics play a crucial role in estimating and quantifying uncertainty in predictions.
In many cases, models not only provide point predictions but also confidence intervals or probability
distributions, allowing users to assess the reliability of predictions.

6. Interpretability and Explainability:

- Statistical methods can be used to explain the importance of each feature in a model's predictions.
Techniques like feature importance scores and permutation importance help interpret the model's
behavior.

- For unsupervised learning, statistical analysis can uncover hidden patterns or clusters in data, aiding
in meaningful insights and interpretation.

In summary, probability and statistics are the foundation of machine learning and are essential for
every stage of model development. They help in data preprocessing, feature selection and
engineering, model training and evaluation, model selection, and interpretation of results, ultimately
leading to better model performance and more meaningful insights.
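Two of the ideas above, descriptive statistics for understanding data and a probability-based metric for comparing models, can be sketched with the Python standard library. The ages, labels, and predicted probabilities below are invented for illustration.

```python
import math
from statistics import mean, median, stdev

# Descriptive statistics for a hypothetical feature column.
ages = [22, 25, 25, 31, 38, 41, 70]
print(mean(ages), median(ages), round(stdev(ages), 2))


def log_loss(y_true, y_prob):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, y_prob)
    )
    return -total / len(y_true)


# Two hypothetical models scored on the same four labelled examples.
labels = [1, 0, 1, 1]
confident_model = [0.9, 0.1, 0.8, 0.7]   # predicted P(label = 1)
hesitant_model = [0.6, 0.4, 0.5, 0.5]

print(log_loss(labels, confident_model) < log_loss(labels, hesitant_model))
```

The mean sits above the median here because of the single large value (70), a hint of skew that could guide preprocessing. The lower log loss identifies the model that assigns higher probability to the correct labels.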

9. Would you frame the problem of spam detection as a supervised learning problem or an
unsupervised learning problem? Explain. [7M] July – 2023 Set -3[Understand]

Spam detection is typically framed as a supervised learning problem rather than an
unsupervised learning problem. Here's an explanation of why:

1. Labeled Data Available: In a supervised learning problem, you have access to a labeled dataset
where each email or message is explicitly marked as "spam" or "not spam" (ham). This labeled data
is essential for training a machine learning model. In contrast, unsupervised learning methods do not
rely on labeled data for training; they discover patterns and structure in unlabeled data.

2. Binary Classification: Spam detection is a binary classification task where the goal is to
categorize incoming emails or messages into one of two classes: "spam" or "not spam." Supervised
learning is well-suited for binary classification tasks as it learns a mapping from input features (e.g.,
email content, sender information) to the target labels (spam or not spam).

3. Performance Evaluation: In supervised learning, you can evaluate the model's performance
using metrics like accuracy, precision, recall, F1-score, and ROC AUC. These metrics require ground
truth labels, which are available in spam detection. Unsupervised learning, on the other hand, lacks
clear labels for performance evaluation.

4. Iterative Improvement: Supervised learning models can be iteratively improved by training on
new labeled data or adjusting the model's parameters. This kind of targeted improvement is much
harder in unsupervised learning, where the algorithm explores data structure without explicit
target labels.

5. Clear Objective: In supervised learning, the objective is well-defined: to build a model that can
accurately classify incoming messages as spam or not spam. Unsupervised learning may not have
such a clear and specific objective, making it less suitable for this problem.

While supervised learning is the standard approach for spam detection, it's worth noting that
unsupervised learning methods can complement supervised approaches. For instance, unsupervised
techniques like clustering can be used to identify potentially spammy patterns or groups within the
data, which can then be further investigated and labeled by human experts. However, the core task of
classifying individual messages as spam or not spam is best addressed with supervised learning due
to the availability of labeled data and the clear binary classification objective.
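As a minimal sketch of the supervised framing, the code below trains a tiny Naive Bayes classifier on labelled messages. Naive Bayes is just one possible choice of classifier, and the six training messages are made up for illustration.

```python
import math
from collections import Counter

# Tiny hand-made labelled dataset (hypothetical messages).
training_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("limited offer win prize", "spam"),
    ("meeting schedule for tomorrow", "ham"),
    ("project report attached", "ham"),
    ("lunch tomorrow with the team", "ham"),
]


def train(data):
    """Count words per label; these counts are the whole model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in data:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocabulary = model
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_data)
print(predict("claim your free prize", model))
print(predict("team meeting tomorrow", model))
```

The labels in training_data are what make this supervised: without the "spam" and "ham" tags, the word counts per class could not be computed.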

STATISTICAL LEARNING:

INTRODUCTION:

The book An Introduction to Statistical Learning provides a broad and less technical treatment of
key topics in statistical learning. Each chapter includes an R lab. The book is appropriate for
anyone who wishes to use contemporary tools for data analysis.

SUPERVISED LEARNING:

As its name suggests, supervised machine learning is based on supervision. In the supervised
learning technique, we train the machine using a "labelled" dataset, and based on this training,
the machine predicts the output. Here, labelled data means that the inputs in the training data
are already mapped to their outputs. More precisely, we first train the machine with the inputs
and corresponding outputs, and then we ask the machine to predict the output for the test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and
dog images. First, we train the machine to understand the images using features such as the shape
and size of the tail, the shape of the eyes, colour, height (dogs are taller, cats are smaller),
etc. After training is complete, we input the picture of a cat and ask the machine to identify the
object and predict the output. Since the machine is now trained, it checks all the features of
the object, such as height, shape, colour, eyes, ears, tail, etc., and finds that it is a cat. So,
it puts the picture in the Cat category. This is how the machine identifies objects in
Supervised Learning.

The main goal of the supervised learning technique is to map the input variable(x) with the
output variable(y). Some real-world applications of supervised learning are Risk Assessment,
Fraud Detection, Spam filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve classification problems, in which the output variable
is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc. The classification
algorithms predict the categories present in the dataset. Some real-world applications of
classification algorithms are Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm

b) Regression

Regression algorithms are used to solve regression problems, in which the output variable is
continuous and is modelled as a function (linear or non-linear) of the input variables. They are
used to predict continuous quantities, such as market trends, weather prediction, etc.

Some popular Regression algorithms are given below:

o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages and Disadvantages of Supervised Learning

Advantages:

o Since supervised learning works with a labelled dataset, we have an exact idea about
the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

o Image Segmentation - Supervised Learning algorithms are used in image segmentation. In this
process, image classification is performed on different image data with pre-defined labels.
o Medical Diagnosis - Supervised algorithms are also used in the medical field for diagnosis
purposes. It is done by using medical images and past data labelled with disease conditions.
With such a process, the machine can identify a disease for new patients.
o Fraud Detection - Supervised Learning classification algorithms are used for identifying
fraud transactions, fraud customers, etc. It is done by using historic data to identify the
patterns that can lead to possible fraud.
o Spam detection - In spam detection & filtering, classification algorithms are used. These
algorithms classify an email as spam or not spam. The spam emails are sent to the spam
folder.
o Speech Recognition - Supervised learning algorithms are also used in speech recognition.
The algorithm is trained with voice data, and various identifications can be done using the
same, such as voice-activated passwords, voice commands, etc

UNSUPERVISED LEARNING:

Unsupervised learning is different from the supervised learning technique; as its name suggests,
there is no need for supervision. In unsupervised machine learning, the machine is trained using
an unlabeled dataset and produces its output without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor labelled,
and the model acts on that data without any supervision.

The main aim of the unsupervised learning algorithm is to group or categorize the unsorted
dataset according to similarities, patterns, and differences. Machines are instructed to find the
hidden patterns in the input dataset.

Let's take an example to understand it more precisely. Suppose there is a basket of fruit images,
and we input it into the machine learning model. The images are totally unknown to the model, and
the task of the machine is to find the patterns and categories of the objects.

The machine then discovers patterns and differences on its own, such as differences in colour and
shape, and predicts the output when it is tested with the test dataset.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data. It is a way
to group the objects into a cluster such that the objects with the most similarities remain in one group
and have fewer or no similarities with the objects of other groups. An example of the clustering
algorithm is grouping the customers by their purchasing behaviour.

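Some of the popular clustering algorithms are given below:

o K-Means Clustering algorithm
o Mean-shift algorithm
o DBSCAN Algorithm

Principal Component Analysis and Independent Component Analysis are often listed alongside these,
but strictly they are dimensionality-reduction (feature extraction) techniques rather than
clustering algorithms; they are commonly applied before clustering.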
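To show how clustering groups customers by purchasing behaviour, here is a minimal sketch of K-Means on one-dimensional data. The spending figures and the starting centroids are assumptions chosen for illustration, and no labels are used anywhere.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend of eight customers (two obvious groups).
spend = [10, 12, 11, 13, 95, 100, 105, 98]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```

The algorithm alternates between assigning points to the nearest centroid and recomputing the centroids, and on this data it settles on one low-spend and one high-spend group.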

2) Association

Association rule learning is an unsupervised learning technique, which finds interesting relations
among variables within a large dataset. The main aim of this learning algorithm is to find the
dependency of one data item on another data item and map those variables accordingly so that it can
generate maximum profit. This algorithm is mainly applied in Market Basket analysis, Web usage
mining, continuous production, etc.

Some popular algorithms of Association rule learning are Apriori Algorithm, Eclat, FP-growth
algorithm.
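The dependency of one item on another is usually measured with support and confidence, the two quantities that algorithms such as Apriori search over. The sketch below computes them directly on a handful of invented market-basket transactions; it is not an implementation of Apriori itself.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "milk"}))        # 3 of the 5 baskets contain both items
print(confidence({"bread"}, {"milk"}))   # about 0.75: 3 of the 4 bread baskets
```

A rule such as bread -> milk is considered interesting when both its support and its confidence exceed thresholds chosen by the analyst.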

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

o These algorithms can be used for more complicated tasks than the supervised ones because
they work on unlabeled datasets.
o Unsupervised algorithms are preferable for various tasks because an unlabeled dataset is
easier to obtain than a labelled one.

Disadvantages:
o The output of an unsupervised algorithm can be less accurate as the dataset is not labelled,
and algorithms are not trained with the exact output in prior.
o Working with Unsupervised learning is more difficult as it works with the unlabelled dataset
that does not map with the output.

Applications of Unsupervised Learning


o Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright in
document network analysis of text data for scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised learning
techniques for building recommendation applications for different web applications and e-
commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised learning,
which can identify unusual data points within the dataset. It is used to discover fraudulent
transactions.
o Singular Value Decomposition: Singular Value Decomposition or SVD is used to extract
particular information from the database. For example, extracting information of each user
located at a particular location.

10. Explain Training and Test Loss while generating the models. [7M] July – 2023 Set
- 1[Understand]

TRAINING AND TESTING:

What is Training Dataset?

The training data is the biggest (in size) subset of the original dataset, which is used to train or fit
the machine learning model. Firstly, the training data is fed to the ML algorithms, which lets them
learn how to make predictions for the given task.

For example, for training a sentiment analysis model, the training data could be as below:


Input                      Output (Labels)
The New UI is Great        Positive
Update is really Slow      Negative

The training data varies depending on whether we are using Supervised Learning or Unsupervised
Learning Algorithms.
For Unsupervised learning, the training data contains unlabeled data points, i.e., inputs are not
tagged with the corresponding outputs. Models are required to find the patterns from the given
training datasets in order to make predictions.

On the other hand, for supervised learning, the training data contains labels in order to train the
model and make predictions.

The type of training data that we provide to the model largely determines the model's accuracy
and prediction ability. It means that the better the quality of the training data, the better will be the
performance of the model. Training data typically makes up 60% or more of the total data
for an ML project.

What is Test Dataset?

Once we train the model with the training dataset, it's time to test the model with the test dataset.
This dataset evaluates the performance of the model and checks that the model can generalize well
to new or unseen data. The test dataset is another subset of the original data, which is
independent of the training dataset. However, it has similar types of features and a similar class
probability distribution, and it is used as a benchmark for model evaluation once model training is
completed. Test data is a well-organized dataset that contains data for each type of scenario that
the model would face for a given problem when used in the real world. Usually, the test dataset
is approximately 20-25% of the total original data for an ML project.
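
The split described above can be sketched in a few lines of Python. This is a minimal illustration only: the helper name train_test_split, the 80/20 ratio, and the toy dataset are assumptions made for the example, not something prescribed by these notes.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # shuffle so both parts look alike
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 100 (input, label) pairs.
data = [(i, i % 2) for i in range(100)]
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting is what keeps the test subset independent of, but similarly distributed to, the training subset.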

Machine Learning algorithms enable machines to make predictions and solve problems on the
basis of past observations or experiences. An algorithm takes these experiences or observations
from the training data that is fed to it. Further, one of the great things about ML algorithms is that
they can learn and improve over time on their own, as they are trained with the relevant training data.

Once the model is trained enough with the relevant training data, it is tested with the test data. We
can understand the whole process of training and testing in three steps, which are as follows:

1. Feed: Firstly, we need to train the model by feeding it with training input data.
2. Define: Now, training data is tagged with the corresponding outputs (in Supervised Learning),
and the model transforms the training data into text vectors or a number of data features.
3. Test: In the last step, we test the model by feeding it with the test data/unseen dataset. This
step ensures that the model is trained efficiently and can generalize well.

The above process is explained using a flowchart given below:

Training and test loss are essential concepts in the context of machine learning model development,
particularly in supervised learning. They are used to assess how well a model is performing during
training and how it is expected to generalize to unseen data. Here's an explanation of training and test
loss:

1. Training Loss:

- Training loss, also known as training error or training cost, is a measure of how well a machine
learning model fits the training data during the training process.

- It is calculated by applying the model to the training dataset and comparing its predictions to the
actual target values (ground truth).

- The loss function, also called the cost function or objective function, quantifies the discrepancy
between the model's predictions and the actual labels.

- The goal during training is to minimize this loss. The training algorithm adjusts the model's
parameters (e.g., weights and biases in a neural network) to find the parameter values that minimize
the loss.

2. Test Loss (Evaluation Loss):

- Test loss, also known as evaluation loss, is a measure of how well a machine learning model
generalizes to new, unseen data. When the held-out data is used to guide training decisions rather
than for the final evaluation, the same quantity is usually called validation loss.

- It is calculated by applying the trained model to a separate held-out dataset (the validation or
test set), which was not used during training.

- Like the training loss, the test loss is also computed using the same loss function and is used to
compare the model's predictions with the true labels.
- The test loss helps assess how well the model has learned to generalize from the training data to
make predictions on data it has never seen before.
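
As a rough sketch of the two quantities, the following Python fits a one-feature line by least squares and evaluates the same loss function (mean squared error) on the training rows and on held-out rows. The data-generating line y = 2x + 1, the noise level, and the 150/50 split are assumptions chosen only for illustration.

```python
import random

def mse(model, rows):
    """Mean squared error of model over (x, y) rows."""
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)

def fit_line(rows):
    """Least-squares fit of y = a + b*x for a single feature."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

rng = random.Random(0)
# Assumed synthetic data: y = 2x + 1 plus unit-variance Gaussian noise.
rows = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1))
        for x in (rng.uniform(0, 10) for _ in range(200))]
train, test = rows[:150], rows[150:]

model = fit_line(train)            # parameters chosen using training rows only
train_loss = mse(model, train)     # loss on data the model was fitted to
test_loss = mse(model, test)       # same loss function on unseen data
print(train_loss, test_loss)
```

For a simple model like this one, the two losses should come out close to each other; a large gap between them is the signal discussed next.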

Here's why training and test loss are important:

- Overfitting Detection: Monitoring the training and test loss allows you to detect overfitting.
Overfitting occurs when a model fits the training data extremely well but fails to generalize to new
data. In such cases, the training loss decreases, but the test loss increases, indicating that the model is
memorizing the training data rather than learning the underlying patterns.

- Model Selection: You can use the held-out loss to compare different models or hyperparameter
settings. The model or configuration with the lowest held-out loss is often selected as the best
choice because it is expected to generalize well to new data.

- Early Stopping: Training loss and held-out loss can be used for early stopping. If the held-out loss
starts to increase (indicating overfitting) while the training loss continues to decrease, you can stop
training to prevent overfitting and save the best-performing model.

- Model Fine-Tuning: The held-out loss is useful for fine-tuning a model's hyperparameters (e.g.,
learning rate, batch size). By monitoring it, you can adjust these hyperparameters to achieve better
generalization. Strictly, these choices should be made on a validation set kept separate from the
final test set, so that the reported test loss remains an unbiased estimate of generalization.
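
The early-stopping rule from the list above can be sketched as follows. The function name, the patience parameter, and the list of held-out losses are all hypothetical; the losses are made-up numbers chosen to show the typical fall-then-rise shape, not measurements.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping once the held-out
    loss has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break              # held-out loss keeps rising: stop training
    return best_epoch

# Made-up held-out losses: they fall, then rise as overfitting sets in.
held_out = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
print(early_stopping_epoch(held_out))  # 3
```

In a real training loop the model parameters from the returned epoch would be the ones saved and used.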

In summary, training loss measures how well a model fits the training data, while test loss assesses
how well it generalizes to new, unseen data. Balancing these two metrics is crucial to building a
machine learning model that performs well on real-world tasks and avoids overfitting.

11. Define and explain Optimal prediction function for Squared Error Loss. [7M] July –
2023 Set -3[Remember]

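In general, the prediction function that minimizes the expected squared error loss is the conditional
expectation of the target given the input, f*(x) = E[Y | X = x]. In regression problems this function
is unknown and must be estimated from data. When the prediction function is restricted to be linear
in the features, the estimate that minimizes the squared error on the training data is referred to as
the "Least Squares Estimator" or "Ordinary Least Squares (OLS)" regression. This method aims to find
the prediction function that minimizes the sum of squared differences between the predicted values
and the actual target values. Let's define and explain it in detail: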

Objective: Given a dataset with input features (independent variables) denoted as X and
corresponding target values (dependent variable) denoted as y, the goal is to find a prediction
function f(X) that minimizes the squared error loss.

Mathematical Formulation:

The prediction function f(X) is typically represented as a linear combination of the input features
with associated coefficients (weights):

f(X) = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ

Here:

- f(X) is the predicted target value.


- X₁, X₂, ..., Xₖ are the input features.

- β₀, β₁, β₂, ..., βₖ are the coefficients to be determined.

The squared error loss for a single data point (xi, yi) is defined as:

L(yi, f(xi)) = (yi - f(xi))²

The overall squared error loss for the entire dataset (also known as the residual sum of squares,
RSS) is given by the sum of squared errors over all data points:

L(β₀, β₁, β₂, ..., βₖ) = Σᵢ(yi - f(xi))² for i = 1 to n (where n is the number of data points)

Optimization Objective:

The objective is to find the values of the coefficients (β₀, β₁, β₂, ..., βₖ) that minimize the total squared
error loss:

minimize L(β₀, β₁, β₂, ..., βₖ)

Solution:

The solution to the optimization problem involves finding the values of β₀, β₁, β₂, ..., βₖ that minimize
the loss function L(β₀, β₁, β₂, ..., βₖ). This can be achieved using various optimization techniques,
with the most common method being the "Least Squares" approach. In this approach, the coefficients
are estimated using the following formula:

β = (XᵀX)⁻¹Xᵀy

Where:

- β is the vector of coefficients (including β₀).

- X is the matrix of input features (each row corresponds to a data point, and each column
corresponds to a feature), with a leading column of ones so that β₀ acts as the intercept.

- y is the vector of target values.

Once you have estimated the coefficients using the least squares method, you can use the resulting
prediction function f(X) to make predictions on new data points.
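
A minimal NumPy sketch of the least squares formula is given below. The synthetic data, the "true" coefficients, and the helper predict are assumptions made for the example; the linear system is solved directly instead of forming the matrix inverse, which is numerically safer but gives the same β.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 2))               # two input features X₁, X₂
true_beta = np.array([1.0, 2.0, -3.0])           # assumed β₀, β₁, β₂
X = np.column_stack([np.ones(n), features])      # leading column of ones for β₀
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# β = (XᵀX)⁻¹Xᵀy, computed by solving (XᵀX)β = Xᵀy.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

def predict(new_features):
    """Apply the fitted linear function f(X) to new rows of features."""
    new_features = np.atleast_2d(new_features)
    ones = np.ones((new_features.shape[0], 1))
    return np.hstack([ones, new_features]) @ beta_hat

print(beta_hat)                 # close to the assumed coefficients
print(predict([0.5, -1.0]))     # prediction for one new data point
```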

The optimal prediction function for squared error loss, obtained through OLS regression, minimizes
the sum of squared differences between the predicted and actual target values and is a widely used
method in linear regression. It provides a linear model that best fits the training data in terms of
minimizing squared errors.

Tradeoffs in Statistical Learning

12. Explain Tradeoffs in Statistical Learning. [7M] July – 2023 Set -2[Understand]

TRADEOFFS IN STATISTICAL LEARNING:

It is important to understand prediction errors (bias and variance) when it comes to accuracy in any
machine learning algorithm. There is a tradeoff between a model's ability to minimize bias and its
ability to minimize variance, and understanding it helps in choosing a good value of the
regularization constant. A proper understanding of these errors also helps to avoid overfitting and
underfitting a data set while training the algorithm.
Bias
Bias is the difference between the model's average prediction and the correct value. High bias
gives a large error on training as well as testing data. It is recommended that an algorithm be
low-biased to avoid the problem of underfitting. With high bias, the predictions follow a straight
line and so do not fit the data in the data set accurately. Such fitting is known as Underfitting of
Data. This happens when the hypothesis is too simple or linear in nature. Refer to the graph given
below for an example of such a situation.

High Bias

In such a problem, a hypothesis looks as follows.
Variance
The variance of the model is the variability of its prediction for a given data point, that is, how
much the prediction changes when the model is trained on different training sets. A model with high
variance has a very complex fit to the training data and thus is not able to fit accurately on data
which it hasn't seen before. As a result, such models perform very well on training data but have
high error rates on test data.
When a model has high variance, this is referred to as Overfitting of Data. Overfitting means fitting
the training set accurately via a complex curve and a high-order hypothesis, but it is not a solution
because the error on unseen data is high.
While training a model, variance should be kept low.
High-variance data looks as follows.

High Variance
In such a problem, a hypothesis looks as follows.

Bias Variance Tradeoff


If the algorithm is too simple (a hypothesis with a linear equation), it may have high bias and low
variance and thus be error-prone. If the algorithm is too complex (a hypothesis with a high-degree
equation), it may have high variance and low bias. In the latter condition, the model will not
perform well on new entries. There is a middle ground between these two conditions, known as the
Trade-off or Bias Variance Trade-off.
This tradeoff in complexity is why there is a tradeoff between bias and variance. An algorithm
can't be more complex and less complex at the same time. For the graph, the perfect tradeoff will
be like.

The best fit will be given by hypothesis on the tradeoff point.


The error to complexity graph to show trade-off is given as –

This is referred to as the best point chosen for the training of the algorithm which gives low error
in training as well as testing data.
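
The tradeoff can be seen numerically with a small experiment. The sketch below is illustrative only: the true curve sin(2πx), the noise level, the sample sizes, and the polynomial degrees 1, 3, and 9 are all assumptions. It fits polynomials of increasing complexity to a small training set and reports training and test error for each.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n noisy points from an assumed true curve y = sin(2πx)."""
    x = rng.uniform(0.0, 1.0, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = sample(20)     # small training set
x_test, y_test = sample(500)      # large unseen set

def errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 9):
    train_mse, test_mse = errors(degree)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The straight line (degree 1) underfits and has high error on both sets, which is the high-bias case. Training error keeps falling as the degree grows, while the high-degree fit will typically show a larger gap between training and test error, which is the high-variance case. Exact numbers depend on the random seed.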

Estimating Risk Statistics


13. List and explain Risk statistics. [7M] July – 2023 Set -1[Remember]

ESTIMATING RISK STATISTICS:

Unraveling the genetic background of human diseases serves a number of goals. One aim is to
identify genes that modify the susceptibility to disease. In this context, we ask questions like: “Is this
genetic variant more frequent in patients with the disease of interest than in unaffected controls?” or
“Is the mean phenotype higher in carriers of this genetic variant than in non-carriers?” From the
answers, we possibly learn about the pathogenesis of the disease, and we can identify possible targets
for therapeutic interventions. Looking back at the past decade, it can be summarized that genome-
wide association (GWA) studies have been useful in this endeavor (Hindorff et al. 2012).

When we consider classical measures for strength of association on the one hand, such as the odds
ratio (OR), and for classification on the other hand, such as sensitivity (sens) and specificity (spec),
there is a simple relationship between them with

OR = [sens / (1 - sens)] × [spec / (1 - spec)]

The overall process of rule construction and evaluation is shown in Fig. 1.

Fig. 1
Path to construct, evaluate and validate a rule of classification or probability estimation

Risk statistics in machine learning are crucial metrics that help in understanding, measuring, and
managing the uncertainty and potential errors in predictive models. Here are some key risk
statistics often used in machine learning:

Mean Squared Error (MSE): This measures the average of the squares of the errors—that is, the
average squared difference between the estimated values and the actual value. It's a measure of the
quality of an estimator; lower values indicate better accuracy.

Root Mean Squared Error (RMSE): RMSE is the square root of the mean squared error. It
measures the standard deviation of the residuals (prediction errors). Residuals are a measure of
how far from the regression line data points are.

Mean Absolute Error (MAE): This is the mean of the absolute values of the errors. It measures
how close forecasts or predictions are to the eventual outcomes. Unlike the MSE, the MAE does
not square the errors, making it less sensitive to outliers.

R-squared (R²): This statistic measures the proportion of the variance in the dependent variable
that is predictable from the independent variable(s). It's a statistical measure of how close the data
are to the fitted regression line. For a least-squares fit with an intercept, evaluated on the training
data, R² takes a value between 0 and 1, where 1 indicates that the model explains all the variability
of the response data around its mean. On new data, R² can be negative when the model predicts
worse than simply using the mean.

Adjusted R-squared: This adjusts the R² statistic based on the number of predictors in the model.
It accounts for the phenomenon where R² tends to artificially increase as more predictors are added
to the model, regardless of their significance.

Confusion Matrix: In classification problems, a confusion matrix is a table used to describe the
performance of a classification model. It presents the actual values in comparison to the model's
predictions.

Precision, Recall, and F1 Score: Precision is the ratio of correctly predicted positive observations
to the total predicted positives. Recall (Sensitivity) is the ratio of correctly predicted positive
observations to all observations in the actual positive class. The F1 Score is the harmonic mean of
Precision and Recall, and is useful when the class distribution is uneven.

Area Under the ROC Curve (AUC-ROC): This statistic is used in binary classification to
measure the model's performance. The ROC curve is a plot of true positive rate against false
positive rate at various thresholds and AUC represents the degree or measure of separability
achieved by the model.

Logarithmic Loss (Log Loss): This measures the performance of a classification model where the
prediction is a probability value between 0 and 1. Log loss increases as the predicted probability
diverges from the actual label.

Each of these statistics provides a different perspective on the model's performance and potential
areas of improvement, helping data scientists and machine learning engineers to refine their models
effectively.
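
Several of the statistics listed above can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function names and the small example inputs are assumptions for illustration, and edge cases such as division by zero (for instance, no predicted positives) are deliberately not handled.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                         # 1 - SS_res / SS_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

def classification_metrics(y_true, y_pred):
    """Return precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return {"precision": precision, "recall": recall, "f1": f1}

def log_loss(y_true, probs):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, probs)) / len(y_true)

reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 8.0, 9.5])
clf = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(reg)   # mse 0.375, mae 0.5, r2 0.925
print(clf)   # precision, recall and f1 all equal 2/3
```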

Sampling distribution of an estimator

14. Write about Sampling distribution of an estimator. [7M] July – 2023 Set -4[Apply]

SAMPLING DISTRIBUTION OF AN ESTIMATOR:


One of the most important concepts discussed in the context of inferential data analysis is the idea of
sampling distributions. Understanding sampling distributions helps us better comprehend and interpret
results from our descriptive as well as predictive data analysis investigations. Sampling distributions
are also frequently used in decision making under uncertainty and hypothesis testing.
What are sampling distributions?

You may already be familiar with the idea of probability distributions. A probability distribution gives
us an understanding of the probability and likelihood associated with values (or range of values) that a
random variable may assume. A random variable is a quantity whose value (outcome) is determined
randomly. Some examples of a random variable include the monthly revenue of a retail store, the
number of customers arriving at a car wash location on any given day, the number of accidents on a
certain highway on any given day, weekly sales volume at a retail store, etc. A sampling distribution
is the probability distribution of a statistic (an estimator), such as the sample mean, computed from
random samples of a given size: because each sample is random, the statistic is itself a random variable.

Sampling distribution of the sample mean

Assuming that X represents the data (population), if X has a distribution with average μ and standard
deviation σ, and if X is approximately normally distributed or if the sample size n is large, then the
sample mean X̅ is (approximately) normally distributed with mean μ and standard deviation σ/√n:

X̅ ~ N(μ, σ/√n)

The above distribution is only valid if:

- X is approximately normal or the sample size n is large, and

- the data (population) standard deviation σ is known.

If X is normal, then X̅ is also normally distributed regardless of the sample size n. The Central Limit
Theorem tells us that even if X is not normal, if the sample size is large enough (usually greater than
30), then X̅’s distribution is approximately normal (Sharpe, De Veaux, Velleman and Wright, 2020,
pp. 318–320). If X̅ is normal, we can easily standardize and convert it to the standard normal
distribution Z, using Z = (X̅ − μ) / (σ/√n).
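
The sampling distribution of the sample mean can be approximated by simulation. In the sketch below, the exponential population (mean 1, standard deviation 1), the sample size of 40, and the 5000 repetitions are assumptions picked for illustration; the population is deliberately not normal, so any normal-looking behaviour of the sample means comes from the Central Limit Theorem.

```python
import random
import statistics

rng = random.Random(0)
n, repeats = 40, 5000

# Assumed population: exponential with mean μ = 1 and standard deviation σ = 1.
# Each entry of sample_means is the mean of one random sample of size n.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to μ = 1
sd_of_means = statistics.stdev(sample_means)     # should be close to σ/√n ≈ 0.158
print(mean_of_means, sd_of_means)
```

The spread of the simulated sample means shrinks like 1/√n, so quadrupling the sample size roughly halves the standard deviation of the estimator.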

If the population standard deviation σ is not known, we cannot standardize the sample mean X̅ with σ
and use the normal distribution. If certain conditions are satisfied (explained below), then we can
transform X̅ to another random variable t, using the sample standard deviation s in place of σ, such that,

t = (X̅ − μ) / (s/√n)

The random variable t is said to follow the t-distribution with n-1 degrees of freedom, where n is the
sample size. The t-distribution is bell-shaped and symmetric (just like the normal distribution) but has
fatter tails compared to the normal distribution. This means values further away from the mean have a
higher likelihood of occurring compared to that in the normal distribution.

The conditions to use the t-distribution for the random variable t are as follows (Sharpe et al., 2020, pp.
415–420):

If X is normally distributed, even for small sample sizes (n<15), the t-distribution can be used.

If the sample size is between 15 and 40, the t-distribution can be used as long as X is unimodal and
reasonably symmetric.

For sample sizes greater than 40, the t-distribution can be used unless X’s distribution is heavily
skewed.
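
The fatter tails of the t-distribution can be checked with a short simulation. This is a sketch under stated assumptions: a normal population with mean 10 and standard deviation 2, a small sample size of n = 5 (so 4 degrees of freedom), and 20000 repetitions. The threshold 1.96 is the value beyond which a standard normal variable falls about 5% of the time.

```python
import math
import random
import statistics

rng = random.Random(0)
n, repeats, mu = 5, 20000, 10.0

def t_statistic(sample, mu):
    """t = (x̄ - μ) / (s / √n), with s the sample standard deviation."""
    s = statistics.stdev(sample)
    return (statistics.fmean(sample) - mu) / (s / math.sqrt(len(sample)))

t_values = [
    t_statistic([rng.gauss(mu, 2.0) for _ in range(n)], mu)
    for _ in range(repeats)
]

# Share of t values beyond ±1.96; a standard normal would give about 0.05.
tail_share = sum(abs(t) > 1.96 for t in t_values) / repeats
print(tail_share)
```

With so few degrees of freedom the share comes out well above 5%, which is why the wider t-based critical values are used when σ has to be estimated from a small sample.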

Empirical Risk Minimization

15. What is Empirical Risk Minimization? Explain Estimating the risk using cross
validation. [7M] July – 2023 Set -3[Remember]

Empirical risk minimization (ERM):

Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a
family of learning algorithms and is used to give theoretical bounds on their performance.

The idea is that we don’t know exactly how well an algorithm will work in practice (the true "risk")
because we don't know the true distribution of data that the algorithm will work on but as an
alternative we can measure its performance on a known set of training data.

We assume that our samples come from this distribution and use our dataset as an approximation.
If we compute the loss using the data points in our dataset, it’s called empirical risk.

It is “empirical” and not “true” because we are using a dataset that’s a subset of the whole population.

When our learning model is built, we have to pick a function that minimizes the empirical risk, that is,
the average loss between the predicted output and the actual output for the data points in the dataset.

The process of finding this function is called empirical risk minimization (ERM). What we really
want is to minimize the true risk.
We don’t have the information that would allow us to do that, so we hope that the empirical risk will
be almost the same as the true risk.

In the equation below, we can define the true error, which is based on the whole domain X:

Since we only have access to S, a subset of the input domain, we learn based on that sample of
training examples. We don’t have access to the true error, but to the empirical error:
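
In one common notation for classification, the two errors can be written as below. The symbols are assumptions introduced for this sketch rather than taken from these notes: D is the data distribution over the domain X, f is the true labelling function, h is the learned hypothesis, and S = ((x_1, y_1), ..., (x_m, y_m)) is the training sample of size m.

```latex
% True error: probability that h disagrees with f on a point drawn from D
L_{\mathcal{D}, f}(h) = \Pr_{x \sim \mathcal{D}}\bigl[\, h(x) \neq f(x) \,\bigr]

% Empirical error: fraction of training examples in S that h gets wrong
L_S(h) = \frac{1}{m}\,\bigl|\{\, i \in \{1, \dots, m\} : h(x_i) \neq y_i \,\}\bigr|
```

ERM chooses the hypothesis h that minimizes L_S(h), in the hope that L_S(h) is close to the true error.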

Let’s get a better understanding by Example.

We would want to build a model that can differentiate between a male and a female based on specific
features.

If we select 150 random people where women are really short, and men are really tall, then the model
might incorrectly assume that height is the differentiating feature.

For building a truly accurate model, we have to gather all the women and men in the world to extract
differentiating features.

Unfortunately, that is not possible! So we select a small number of people and hope that this sample
is representative of the whole population.
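
Because the empirical risk measured on the training sample tends to be optimistic, one common way to estimate the true risk from a single sample is k-fold cross-validation: the data are split into k folds, the model is fitted on k-1 folds and its loss is measured on the held-out fold, and the k held-out losses are averaged. The sketch below is a minimal illustration; the function names, the deliberately simple "predict the training mean" learner, and the synthetic data are all assumptions.

```python
import random

def fit_mean(train):
    """A deliberately simple learner: always predict the training mean of y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def squared_loss(model, rows):
    """Average squared error of model over (x, y) rows."""
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)

def cross_validated_risk(rows, fit, loss, k=5, seed=0):
    """Estimate risk as the average held-out loss over k folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]          # k disjoint folds
    fold_losses = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        fold_losses.append(loss(fit(train), held_out))
    return sum(fold_losses) / k

rng = random.Random(1)
# Assumed synthetic data: y = 3 plus unit-variance Gaussian noise.
rows = [(x, 3.0 + rng.gauss(0, 1)) for x in range(100)]
risk_estimate = cross_validated_risk(rows, fit_mean, squared_loss, k=5)
print(risk_estimate)   # should land near the noise variance of 1
```

Every data point is used for evaluation exactly once and never by a model that saw it during fitting, which is what makes the averaged loss a fairer estimate of the risk than the training loss.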

16. What is Empirical Risk Minimization? Explain Regularized and Structural risk
minimizations? [7M] July – 2023 Set -4[Remember]

Empirical Risk Minimization (ERM) is a fundamental concept in the field of statistical learning
theory and machine learning. It focuses on minimizing the risk over a training dataset to develop
predictive models. Here's a detailed explanation:

Empirical Risk Minimization (ERM)


Concept: ERM involves selecting a model or hypothesis from a given class of functions that
minimizes the empirical risk, which is the average loss over the training data. This approach is
based on the idea that minimizing the error on the training data will lead to a model that performs
well on unseen data.

Calculation: Empirical risk is typically calculated as the average loss over the training samples. For
instance, in a regression problem, this might be the mean squared error between the model's
predictions and the actual values.

Limitation: A key limitation of ERM is overfitting, where the model performs well on training data
but poorly on unseen data. This happens because the model becomes too complex and starts
capturing noise in the training data as patterns.

Regularized Risk Minimization


To address the limitations of ERM, Regularized Risk Minimization (RRM) is often used.

Concept: RRM involves adding a regularization term to the empirical risk. This term penalizes
overly complex models, thus helping to prevent overfitting. The objective is to find a balance
between fitting the training data well (minimizing empirical risk) and keeping the model simple
(through regularization).

Types of Regularization:

L1 Regularization (Lasso): Adds the sum of the absolute values of the coefficients as the penalty
term. It can lead to sparse models where some coefficients can become zero.
L2 Regularization (Ridge): Adds the sum of the squares of the coefficients as the penalty term.
This tends to shrink the coefficients but doesn't set them to zero.
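
The shrinking effect of L2 regularization can be sketched with the closed-form ridge solution β = (XᵀX + λI)⁻¹Xᵀy. The synthetic data, the coefficient values, and the λ values below are assumptions for illustration, and the intercept is omitted to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Assumed true coefficients; two of the five features are irrelevant.
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=50)

def ridge(X, y, lam):
    """Minimize ||y - Xβ||² + lam * ||β||² by solving (XᵀX + lam·I)β = Xᵀy."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# lam = 0 is plain least squares; larger lam means a stronger penalty.
norms = [float(np.linalg.norm(ridge(X, y, lam))) for lam in (0.0, 10.0, 100.0)]
print(norms)
```

The coefficient norm decreases as λ grows, but the individual coefficients do not become exactly zero, in line with the description of Ridge above.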
Structural Risk Minimization (SRM)

Structural Risk Minimization (SRM) is another strategy developed to overcome the limitations of
ERM.

Concept: SRM is based on the idea of balancing the complexity of the model with its performance
on the training data. It involves choosing a model from a series of model classes with increasing
complexity.

Approach: In SRM, the risk is composed of two main parts: the empirical risk and a term that
grows with the complexity of the model class. As the model complexity increases, the empirical
risk might decrease, but the penalty for complexity increases. The goal is to find the model in the
series that minimizes this overall risk.

Advantage: SRM provides a more systematic approach than simple regularization. It considers the
capacity of the model class (i.e., its ability to fit a variety of functions) when minimizing risk,
leading to potentially better generalization on unseen data.
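
The SRM idea can be sketched with a nested series of model classes: polynomials of degree 0, 1, 2, and so on. The sketch below is an illustration under loud assumptions: the quadratic data-generating curve, the noise level, and especially the penalty (a fixed weight times the number of coefficients) are made up for the example. In the theory the complexity term is derived from a capacity measure of the class rather than chosen by hand.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=30)
# Assumed quadratic truth plus Gaussian noise.
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.3, size=30)

def empirical_risk(degree):
    """Training MSE of the least-squares polynomial of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def penalized_risk(degree, weight=0.05):
    """Empirical risk plus an illustrative penalty that grows with class size."""
    return empirical_risk(degree) + weight * (degree + 1)

degrees = range(0, 9)                       # nested classes of rising complexity
best_degree = min(degrees, key=penalized_risk)
print(best_degree)
```

The empirical risk alone always favours the largest class, since it can only go down as the degree rises. Adding the complexity term makes the minimum fall at a moderate degree, which for data like this is expected to be the quadratic.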

In summary, while ERM focuses on minimizing the error on training data, RRM and SRM
introduce additional components to control model complexity, thereby aiming to improve the
model's performance on unseen data and prevent overfitting. RRM does this through a penalty term
in the loss function, and SRM does it by selecting from a series of model classes of increasing
complexity.
