ML Unit-1

The document provides an overview of machine learning, including definitions of machine learning, artificial intelligence, and deep learning. It discusses why machine learning is used and some common applications like image recognition, speech recognition, product recommendations, self-driving cars, email filtering, and more.
Unit-1

Introduction: Artificial Intelligence, Machine Learning, Deep Learning, Types of
Machine Learning Systems, Main Challenges of Machine Learning.
Statistical Learning: Introduction, Supervised and Unsupervised Learning,
Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk
Statistics, Sampling distribution of an estimator, Empirical Risk Minimization.

What Is Machine Learning?


Machine Learning is the science (and art) of programming computers so
they can learn from data.
A general definition:
[Machine Learning is the] field of study that gives computers the ability to learn
without being explicitly programmed.
—Arthur Samuel, 1959
A more engineering-oriented one:
A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P,
improves with experience E.
—Tom Mitchell, 1997

Introduction-
• Machine Learning is a branch of artificial intelligence (AI) that focuses on
creating systems capable of learning and making decisions without being
explicitly programmed.
• The idea is to enable computers to learn from experience, analyze data, and
improve their performance over time.
• Machine Learning is like teaching computers to learn from examples.
Instead of giving them strict instructions, we show them lots of examples,
and they figure out patterns by themselves.
• The core idea is to allow machines to automatically improve and adapt their
performance based on experience, uncovering patterns, and making
intelligent choices in diverse applications.

Machine Learning is defined as the study of computer algorithms for
automatically constructing computer software through past experience and
training data.

AI vs ML vs DL

Artificial Intelligence (AI):

• Definition: AI involves creating systems or machines that can perform tasks
that typically require human intelligence.
• Scope: Encompasses a wide range of applications, including problem-
solving, speech recognition, learning, planning, perception, and natural
language understanding.
• Examples: Virtual assistants (like Siri or Alexa), game-playing algorithms,
and autonomous vehicles.

Machine Learning (ML):

• Definition: ML is a subset of AI that focuses on developing algorithms and
models allowing computers to learn from data and improve their
performance on specific tasks.
• Approach: ML algorithms enable machines to identify patterns, make
predictions, or optimize their behavior based on data without being explicitly
programmed.
• Types: Includes supervised learning (learning from labeled data),
unsupervised learning (finding patterns in unlabeled data), and
reinforcement learning (learning through trial and error).
• Examples: Spam filters, recommendation systems (like those used by
Netflix or Amazon), and image recognition.

Deep Learning (DL):

• Definition: DL is a specialized form of ML that involves neural networks
with multiple layers (deep neural networks).
• Architecture: DL models consist of interconnected layers of nodes
(neurons) that can automatically learn hierarchical representations of data.
• Applications: Particularly successful in complex tasks such as image and
speech recognition, natural language processing, and autonomous vehicles.
• Example: Deep neural networks used in facial recognition technology.

In summary, AI is the broader concept of creating intelligent machines, ML is a
subset focused on learning from data, and DL is an advanced form of ML that
specifically uses deep neural networks for sophisticated tasks. AI encompasses
both ML and DL, and they work together to enable machines to perform tasks that
traditionally required human intelligence.

Why Use Machine Learning?

Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and statistical models that enable computers to perform
tasks without explicit programming. ML is widely used across various industries for
its ability to learn from data, recognize patterns, and make predictions or decisions.
Traditional Programming Techniques vs Machine Learning

For example, consider how you would write a spam filter using traditional
programming techniques:

1. First you would consider what spam typically looks like. You might notice that
some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to
come up a lot in the subject line.

2. You would write a detection algorithm for each of the patterns that you noticed,
and your program would flag emails as spam if a number of these patterns were
detected.

3. You would test your program and repeat steps 1 and 2 until it was good enough
to launch.

Fig: Traditional approach

In contrast, a spam filter based on Machine Learning techniques automatically learns
which words and phrases are good predictors of spam by detecting unusually
frequent patterns of words in the spam examples. The program is much shorter,
easier to maintain, and most likely more accurate.
Fig: Machine Learning approach
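
The Machine Learning approach can be sketched in a few lines of Python. This is only a minimal illustration, not a production filter: the tiny training set, the function names (learn_spam_words, is_spam) and the thresholds (min_ratio, min_hits) are all assumptions made for this example. Instead of hand-writing one rule per pattern, the program counts which words appear unusually often in the spam examples.

```python
from collections import Counter

# Hypothetical toy training set: (subject line, is_spam) pairs.
TRAINING_EMAILS = [
    ("free credit card offer 4u", True),
    ("amazing free prize 4u", True),
    ("free amazing credit deal", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
    ("project agenda and notes", False),
]


def learn_spam_words(emails, min_ratio=2.0):
    """Return the words that are unusually frequent in the spam examples."""
    spam_counts, ham_counts = Counter(), Counter()
    for subject, is_spam_label in emails:
        target = spam_counts if is_spam_label else ham_counts
        target.update(subject.lower().split())
    # Add one to the normal-mail count so unseen words do not divide by zero.
    return {
        word
        for word, count in spam_counts.items()
        if count / (ham_counts[word] + 1) >= min_ratio
    }


def is_spam(subject, spam_words, min_hits=2):
    """Flag a subject line when it contains enough learned spam words."""
    hits = sum(1 for word in subject.lower().split() if word in spam_words)
    return hits >= min_hits


spam_words = learn_spam_words(TRAINING_EMAILS)
print(sorted(spam_words))
print(is_spam("free credit card", spam_words))
```

If spammers start using new words, the same code can simply be re-run on fresh examples; no rule has to be rewritten by hand.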

Another area where Machine Learning shines is for problems that either are too
complex for traditional approaches or have no known algorithm.

For example, consider a speech recognition algorithm that must differentiate between
the words “one” and “two”. Instead of using fixed rules, we teach the computer by giving
it many examples of people saying these words. This way, the computer can learn and adapt
to different voices and situations. Modern systems often use advanced techniques
like deep learning to make this learning process more

1. Machine Learning assists human understanding by allowing inspection of
learned patterns.
2. For instance, a trained spam filter can be examined to reveal the key words
predicting spam.
3. ML techniques, including data mining, uncover hidden patterns in large
datasets, aiding in problem understanding and trend discovery.

Applications of Machine learning

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It
is used to identify objects, persons, places, and so on in digital images. A popular use
case of image recognition and face detection is automatic friend tagging suggestion.

It is based on the Facebook project named "DeepFace," which is responsible for
face recognition and person identification in the picture.

2. Speech Recognition

When using Google, we get the option to "Search by voice." This falls under speech
recognition, a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text; it is
also known as "speech to text" or "computer speech recognition." At present,
machine learning algorithms are widely used in speech recognition
applications. Google Assistant, Siri, Cortana, and Alexa use speech
recognition technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we can use Google Maps, which shows us the
shortest route and predicts the traffic conditions. It predicts whether
traffic is clear, slow-moving, or heavily congested using two sources:
o The real-time location of vehicles from the Google Maps app and sensors
o The average time taken on past days at the same time of day

4. Product recommendations:

Machine learning is widely used by e-commerce and entertainment
companies such as Amazon and Netflix to recommend products to users.
Whenever we search for a product on Amazon, we start seeing
advertisements for the same product while browsing the internet in the same
browser; this is because of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars.
Machine learning plays a significant role in self-driving cars. Tesla, a well-known
car manufacturer, is working on self-driving cars. It uses machine learning
methods to train models that detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is filtered automatically as important, normal,
or spam. Important mail arrives in our inbox marked with the important
symbol and spam goes to our spam box; the technology behind this is machine
learning. Below are some spam filters used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Machine learning algorithms such as the Multi-Layer Perceptron, decision
tree, and Naïve Bayes classifier are used for email spam filtering and malware
detection.

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google
Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find
information using voice instructions. These assistants can help us in various ways
just through voice instructions, such as playing music, calling someone, opening an
email, or scheduling an appointment.

8. Online Fraud Detection:

Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various
ways that fraud can take place, such as fake accounts, fake IDs,
and theft of money in the middle of a transaction. To detect this, a feed-forward
neural network can check whether a transaction is genuine or fraudulent.

9. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market, there
is always a risk of ups and downs in share prices, so a long short-term memory
(LSTM) neural network is used for the prediction of stock market trends.

10. Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With this,
medical technology is growing very fast and is able to build 3D models that can predict
the exact position of lesions in the brain.

Machine learning Life cycle

Machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

1. Gathering Data:
Data can be collected from various sources such as files, databases, the internet,
or mobile devices. The quantity and quality of the collected data determine the
quality of the output. In general, the more data we have, the more accurate the
prediction will be. This step includes the following tasks:
o Identify various data sources
o Collect data
o Integrate the data obtained from different sources
By performing these tasks, we get a coherent set of data, also called a dataset.
2. Data preparation

This step can be further divided into two processes:

o Data exploration:
It is used to understand the nature of the data that we have to work with. We
need to understand the characteristics, format, and quality of the data. A better
understanding of the data leads to a more effective outcome. In this step, we find
correlations, general trends, and outliers.
o Data pre-processing:
The next step is to preprocess the data for analysis.

3. Data Wrangling

1. Data wrangling is the process of cleaning and converting raw data into a
usable format. Cleaning the data is required to address quality issues.
2. Not all of the data we have collected is necessarily useful. In real-world
applications, collected data may have various issues, including:

o Missing values
o Duplicate data
o Invalid data
o Noise
We use various filtering techniques to clean the data. It is important to detect and
remove these issues because they can negatively affect the quality of the outcome.
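
As a rough illustration of the cleaning described above, the following Python sketch handles three of the listed issues (duplicates, invalid values, and missing values) on a handful of made-up records. The records, the valid age range, and the choice to fill missing incomes with the mean are assumptions for this example only; real projects choose the cleaning strategy per dataset, and noise handling is not shown.

```python
# Hypothetical raw records: (employee_id, age, monthly_income)
RAW_ROWS = [
    ("001", 25, 5000.0),
    ("002", 30, None),      # missing value
    ("001", 25, 5000.0),    # duplicate of the first row
    ("003", -4, 4000.0),    # invalid age
    ("004", 35, 7200.0),
    ("005", 28, 5800.0),
]


def clean_rows(rows):
    """Drop duplicate and invalid rows, then fill missing incomes with the mean."""
    seen, valid = set(), []
    for row in rows:
        _, age, _ = row
        if row in seen:                        # duplicate data
            continue
        seen.add(row)
        if age is None or not 0 < age < 120:   # invalid data
            continue
        valid.append(row)

    known = [income for _, _, income in valid if income is not None]
    mean_income = sum(known) / len(known)
    # Missing values: impute with the mean of the known incomes.
    return [
        (emp_id, age, mean_income if income is None else income)
        for emp_id, age, income in valid
    ]


clean = clean_rows(RAW_ROWS)
for row in clean:
    print(row)
```

Mean imputation keeps the row for employee 002 instead of discarding it; dropping rows with missing values would be an equally valid choice when data is plentiful.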

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step
involves:
o Selecting analytical techniques
o Building models
o Reviewing the results
In this step, we take the data and use machine learning algorithms to build the model.
5. Train Model
In this step we train our model to improve its performance and obtain a better
outcome for the problem.

• We use datasets to train the model using various machine learning algorithms.
Training is required so that the model can learn the various patterns,
rules, and features.

6. Test Model
• Once our machine learning model has been trained on a given dataset,
we test it. In this step, we check the accuracy of our model by
providing a test dataset to it.
• Testing determines the percentage accuracy of the model with respect to the
requirements of the project or problem.

7. Deployment
• The last step of the machine learning life cycle is deployment, where we deploy
the model in a real-world system.
• If the prepared model produces accurate results as per our
requirements with acceptable speed, then we deploy the model in the real
system.
• Before deploying the project, we check whether it is improving its
performance using the available data.
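
The train, test, and deploy steps can be sketched as follows. This is a deliberately tiny example under stated assumptions: the data (hours studied versus pass/fail), the one-parameter threshold model, and the REQUIRED_ACCURACY value are all invented for illustration and stand in for whatever algorithm and requirement a real project would use.

```python
# Hypothetical labeled data: (hours_studied, passed) pairs.
TRAIN = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
TEST = [(2.5, 0), (4, 0), (5, 1), (9, 1)]
REQUIRED_ACCURACY = 0.75  # assumed project requirement


def count_correct(threshold, data):
    """Number of examples for which 'x >= threshold' matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in data)


def train_threshold(data):
    """Train: pick the threshold that classifies the training set best."""
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: count_correct(t, data))


def accuracy(threshold, data):
    """Test: fraction of examples the trained rule labels correctly."""
    return count_correct(threshold, data) / len(data)


threshold = train_threshold(TRAIN)          # step 5: train the model
test_accuracy = accuracy(threshold, TEST)   # step 6: test on unseen data
ready_to_deploy = test_accuracy >= REQUIRED_ACCURACY  # step 7: deployment gate

print(threshold, test_accuracy, ready_to_deploy)
```

Note that the accuracy is measured on TEST, which the training step never saw; measuring it on TRAIN would give an overly optimistic estimate.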

Types of Machine Learning Systems

There are so many different types of Machine Learning systems that it is useful to
classify them in broad categories, based on the following criteria:
1. Whether or not they are trained with human supervision

a) Supervised
b) Unsupervised
c) Semisupervised
d) Reinforcement Learning

2. Whether or not they can learn incrementally or continuously on the fly

a) Online learning
b) Batch learning

3. How the system generalizes from the training data to make
predictions on new, unseen data

a) Instance-based learning
b) Model-based learning

Supervised Learning

1. Definition: Supervised learning is a type of machine learning where the
algorithm is trained on a labeled dataset, consisting of input-output pairs.
2. Input-Output Mapping: The algorithm learns a mapping function from
input features to corresponding output labels based on the provided training
examples.
3. Labeled Training Data: The training data includes examples with both
input features and their corresponding correct output labels, allowing the
algorithm to learn the relationship between inputs and outputs.
4. Goal: The primary goal of supervised learning is to make accurate
predictions or classifications on new, unseen instances based on the patterns
learned from the labeled training data.

The working of Supervised learning can be easily understood by the below example
and diagram:

Types of supervised Machine learning Algorithms:


Advantages of Supervised learning:
o With the help of supervised learning, the model can predict the output on the
basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning model helps us to solve various real-world problems such
as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:
o Supervised learning models are not well suited to very complex tasks.
o Supervised learning may not predict the correct output if the test data
differs substantially from the training data.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Unsupervised Machine Learning

1. Definition:
• Unsupervised learning is a type of machine learning where the
algorithm is given unlabeled data and tasked with finding patterns,
structures, or relationships within the data on its own.
2. Objective:
• Discover inherent patterns or structures in the data without explicit
guidance in the form of labeled outputs.
3. Common Techniques:
• Clustering: Group similar data points together based on certain
features.
• Dimensionality Reduction: Reduce the number of input features
while retaining essential information.
• Association: Discover relationships or associations among variables
in the dataset.
4. Clustering:
• Definition: Grouping similar data points into clusters based on
similarities.
• Example Algorithms: K-Means, Hierarchical Clustering, DBSCAN.
• Use Cases: Customer segmentation, anomaly detection, image
segmentation.

5. Dimensionality Reduction:
• Definition: Reducing the number of features while preserving
important information.
• Example Algorithms: Principal Component Analysis (PCA), t-
Distributed Stochastic Neighbor Embedding (t-SNE).
• Use Cases: Visualization, feature engineering, noise reduction.
6. Association:
• Definition: Discovering relationships or associations between
variables in the dataset.
• Example Algorithms: Apriori algorithm for frequent itemset mining.
• Use Cases: Market basket analysis, recommendation systems.
7. Evaluation in Unsupervised Learning:
• Evaluation is often more subjective compared to supervised learning.
• Metrics may depend on the specific task; for example, silhouette score
for clustering.
8. Challenges:
• Lack of clear objectives can make evaluation challenging.
• Interpretability of results may be difficult.
9. Applications:
• Anomaly detection, pattern recognition, exploratory data analysis.
10. Considerations:
• The choice of algorithm depends on the nature of the data and the desired
outcome.
• Preprocessing and scaling may still be necessary for certain unsupervised
learning tasks.

Remember, unsupervised learning is about extracting meaningful information from
unlabeled data, and the choice of algorithm depends on the specific goals and
characteristics of the dataset.

| Employee ID | Monthly Income ($) | Years of Experience | Age | Education Level | Cluster |
|-------------|--------------------|---------------------|-----|-----------------|-----------|
| 001 | 5,000 | 2 | 25 | Bachelor's | Cluster 1 |
| 002 | 6,500 | 4 | 30 | Master's | Cluster 2 |
| 003 | 4,000 | 1 | 22 | High School | Cluster 1 |
| 004 | 7,200 | 5 | 35 | Bachelor's | Cluster 2 |
| 005 | 5,800 | 3 | 28 | Master's | Cluster 1 |
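
To make clustering concrete, here is a minimal k-means sketch in plain Python, run on the income and experience columns of the five employees. Several things are assumptions of this sketch rather than part of the notes: income is expressed in thousands of dollars so both features have similar scale, only two features are used, and the first two employees serve as the initial centroids. K-means results depend on such choices, so the grouping it finds need not match the illustrative cluster labels in the table.

```python
# (monthly income in thousands of dollars, years of experience), employees 001-005
POINTS = [(5.0, 2), (6.5, 4), (4.0, 1), (7.2, 5), (5.8, 3)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, max_iters=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Assumption: start from the first two employees as initial centroids.
labels, centroids = k_means(POINTS, [POINTS[0], POINTS[1]])
print(labels)
print(centroids)
```

With this starting point the algorithm groups employees 001 and 003 together and 002, 004, and 005 together; a different initialization or feature scaling can produce a different split.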

Association rule mining

In association rule learning, we typically deal with data in the form of transactions,
where items are purchased together. The employee example above can also be treated
as a dataset for association rule learning, specifically for identifying
associations between different employee characteristics.

In association rule learning, we are interested in finding associations between
different features in the transactions. For instance, we might discover rules like
"Employees with a Master's degree tend to have higher monthly income." These
associations are inferred from the patterns observed in the dataset.

After applying an association rule learning algorithm, the results might include
rules such as:
• {Monthly Income ($) > 6000} => {Master's}
• This rule suggests that employees with a monthly income greater than
$6000 are likely to have a Master's degree.
• {Years of Experience < 3} => {High School}
• This rule implies that employees with less than 3 years of experience
are likely to have a High School education.

These rules provide insights into associations between different employee
characteristics based on the patterns observed in the data.
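
Rules of the form {A} => {B} are usually scored with two standard measures: support (how often A and B occur together) and confidence (how often B occurs among the transactions that contain A). The sketch below computes both for a market-basket style example. The transactions are hypothetical, and this is only the scoring step, not a full rule-mining algorithm such as Apriori.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )


# Score the rule {bread} => {milk}.
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 3 of 5 transactions, and 3 of the 4 bread transactions also contain milk, so the rule has support 3/5 and confidence 3/4.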

Advantages of Unsupervised Learning


o Unsupervised learning can be used for tasks where labels are unavailable,
which are often more complex than supervised tasks, because it does not
require labeled input data.
o Unsupervised learning is preferable as it is easy to get unlabeled data in
comparison to labeled data.

Disadvantages of Unsupervised Learning


o Unsupervised learning is intrinsically more difficult than supervised learning
as it does not have corresponding output.
o The result of the unsupervised learning algorithm might be less accurate as
input data is not labeled, and algorithms do not know the exact output in
advance.

Instance-based learning: the system learns the examples by heart, then
generalizes to new cases by using a similarity measure to compare them to the
learned examples (or a subset of them).
Model-based learning: another way to generalize from a set of examples is to
build a model of these examples and then use that model to make predictions.
This is called model-based learning.

| Feature | Instance-Based Learning | Model-Based Learning |
|---------|-------------------------|----------------------|
| Learning Approach | Memorizes training instances and uses them directly for predictions. | Constructs a general model from the training data to make predictions on new, unseen instances. |
| Training Phase | No explicit training phase. The model "learns" during the prediction phase by comparing new instances to memorized training instances. | Requires an explicit training phase where the algorithm learns the underlying patterns and relationships in the training data to build a predictive model. |
| Memory Usage | Tends to use more memory as it memorizes the entire training dataset. | Generally uses less memory as it constructs a compact representation of the underlying patterns in the data. |
| Adaptability to New Data | Easily adapts to new data, as it can incorporate new instances during the prediction phase. | May require retraining when faced with new data, especially if the underlying patterns have changed significantly. |
| Prediction Time | Potentially slower during prediction, especially with large datasets, as it compares each new instance to all memorized training instances. | Generally faster during prediction, as it applies the learned model directly to new instances. |
| Sensitivity to Noise | Can be sensitive to noise and outliers in the training data, as it memorizes all instances, including the noisy ones. | May be more robust to noise and outliers, as it aims to capture the overall underlying patterns. |
| Robustness | May perform well in situations where the relationship between input and output is complex and nonlinear. | Often performs well when the underlying patterns in the data can be captured by the chosen model architecture. |
| Examples | k-Nearest Neighbors (KNN), Case-Based Reasoning. | Linear Regression, Decision Trees, Support Vector Machines. |
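
The difference between the two approaches can be seen side by side in a short Python sketch. The training points are made up and lie close to y = 2x; the nearest-neighbour predictor (with a single neighbour) stands in for instance-based learning and a least-squares straight line stands in for model-based learning. Both choices are assumptions made for illustration.

```python
# Hypothetical training data: (x, y) pairs lying close to y = 2x.
TRAIN = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def predict_nearest_neighbor(x_new, train=TRAIN):
    """Instance-based: keep the data and answer with the most similar example."""
    _, y = min(train, key=lambda pair: abs(pair[0] - x_new))
    return y


def fit_line(train):
    """Model-based: compress the data into two parameters (slope, intercept)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
        (x - mean_x) ** 2 for x, _ in train
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(TRAIN)

knn_prediction = predict_nearest_neighbor(2.9)  # looks up the stored example at x = 3.0
line_prediction = slope * 2.9 + intercept       # uses only the two fitted parameters

print(knn_prediction, line_prediction)
```

The nearest-neighbour predictor needs the whole training set at prediction time, whereas the fitted line needs only its slope and intercept, which mirrors the memory and prediction-time rows of the table.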
Batch and Online Learning:
Batch learning
In batch learning, the system is incapable of learning incrementally: it must be
trained using all the available data. This will generally take a lot of time and
computing resources, so it is typically done offline. First the system is trained, and
then it is launched into production and runs without learning anymore; it just
applies what it has learned. This is called offline learning.
| Feature | Online Learning | Batch Learning |
|---------|-----------------|----------------|
| Definition | Learns from new data instances one at a time. | Learns from the entire dataset at once. |
| Data Processing | Sequential, processes data as it arrives. | Batch-wise, processes data in fixed-size chunks. |
| Training Frequency | Continuous, updates model with each new instance. | Periodic, updates model after processing batches. |
| Memory Requirement | Lower memory requirement, processes on the fly. | May require more memory, processes entire batch. |
| Computational Efficiency | Potentially faster with real-time updates. | May be slower, especially with large datasets. |
| Adaptability to Changes | Adapts quickly to concept drift or changing data. | Slower to adapt, requires retraining on new data. |
| Example Algorithms | Online Gradient Descent, Perceptron. | Batch Gradient Descent, Decision Trees. |
| Use Cases | Streaming data, real-time applications. | Offline scenarios, batch processing systems. |
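
A small sketch may help contrast the two modes. It fits the one-parameter model y = w * x in two ways: a batch fit that needs the whole dataset at once, and an online fit that performs one gradient step per arriving example. The data, the learning rate, and the way the stream is simulated (cycling through four examples) are assumptions made for this illustration.

```python
# Hypothetical examples generated by y = 2x.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def batch_fit(data):
    """Batch learning: least-squares weight computed from the whole dataset at once."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def online_update(w, x, y, learning_rate=0.05):
    """Online learning: one gradient step on the squared error of a single
    example (the constant factor is folded into the learning rate)."""
    return w + learning_rate * (y - w * x) * x


w_batch = batch_fit(DATA)

w_online = 0.0
for step in range(40):  # simulate 40 examples arriving one at a time
    x, y = DATA[step % len(DATA)]
    w_online = online_update(w_online, x, y)

print(w_batch, w_online)
```

The online learner never stores past examples, so it can keep adapting if the relationship between x and y drifts over time; the batch learner would have to be retrained on a fresh dataset.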

Major Challenges Faced By Machine Learning Professionals


Machine learning professionals face many challenges in building ML skills and
creating an application from scratch.

1. Poor Quality of Data

Data plays a significant role in the machine learning process. One of the
significant issues that machine learning professionals face is the absence of good
quality data. Unclean and noisy data can make the whole process extremely
exhausting.
Insufficient Quantity of Training Data

A major issue when using machine learning algorithms is the lack of
quality as well as quantity of data. Although data plays a vital role in
machine learning, many data scientists report that inadequate,
noisy, and unclean data severely hamper machine learning
algorithms. For example, even a simple task may require thousands of examples, and an
advanced task such as speech or image recognition may need millions.
Data quality also matters for the algorithms to work well, yet it is often
lacking in Machine Learning applications. Data
quality can be affected by factors such as the following:

o Noisy data: It is responsible for inaccurate predictions, which affect
decisions as well as accuracy in classification tasks.
o Incorrect data: It is responsible for faulty results
in machine learning models and therefore reduces the
accuracy of the results.
o Generalizing of output data: Sometimes generalizing
output data becomes complex, which results in comparatively poor future
actions.

2. Nonrepresentative Training Data

In order to generalize well, it is crucial that
your training data be representative of the new cases you want to generalize
to. This is true whether you use instance-based learning or model-based
learning.

3. Underfitting of Training Data

Underfitting occurs when a model is unable to establish an accurate relationship
between input and output variables. It is like trying to fit into undersized
jeans. It signifies that the model is too simple to capture the structure of the
data. To overcome this issue:
• Increase the training time of the model
• Enhance the complexity of the model
• Add more features to the data
• Reduce regularization
4. Overfitting of Training Data

Overfitting occurs when a model fits its training data too closely, including the
noise and quirks in it, so that it performs well on the training data but poorly on
new data. It is like trying to fit into oversized jeans. Unfortunately, this is one
of the significant issues faced by machine learning professionals. Training on
noisy or biased data makes the problem worse. Let’s understand this with
the help of an example. Consider a model trained to differentiate between a
cat, a rabbit, a dog, and a tiger. The training data contains 1000 cats, 1000 dogs,
1000 tigers, and 4000 rabbits. There is then a considerable probability that it will
identify a cat as a rabbit. In this example, we had a large amount of data, but it
was biased; hence the prediction was negatively affected.
We can tackle this issue by:
• Analyzing the data carefully before training
• Using data augmentation techniques
• Removing outliers from the training set
• Selecting a model with fewer features
Poor-Quality Data
Obviously, if your training data is full of errors, outliers, and
noise (e.g., due to poor-quality measurements), it will make it harder for the system
to detect the underlying patterns, so your system is less likely to perform well.
Irrelevant Features
A system can only learn well if the training data contains enough relevant features
and not too many irrelevant ones. Coming up with a good set of features (feature
engineering) involves:
• Feature selection (selecting the most useful features to train on among existing
features)
• Feature extraction (combining existing features to produce a more useful one;
dimensionality reduction algorithms can help)
• Creating new features by gathering new data.

Data Mismatch
In some cases, it’s easy to get a large amount of data for training,
but this data probably won’t be perfectly representative of the data that will be
used in production. For example, suppose you want to create a mobile app to take
pictures of flowers and automatically determine their species. You can easily
download millions of pictures of flowers on the web, but they won’t be perfectly
representative of the pictures that will actually be taken using the app on a mobile
device.
Statistical Learning
• Statistical learning, also known as statistical machine learning, is a
field of study that focuses on developing and utilizing algorithms and
statistical models to analyze and interpret data.
• The primary goal of statistical learning is to make predictions or
inferences based on data, often with an emphasis on understanding
underlying patterns and relationships.
Training, Testing & Validation Datasets

Training Set
This is the actual dataset from which a model trains, i.e. the model sees and
learns from this data to predict the outcome or to make the right decisions.
Testing Set
This dataset is independent of the training set but has a similar
probability distribution of classes. It is used as a benchmark to evaluate
the model, and only after the training of the model is complete.
Validation Set
The validation set is used to fine-tune the hyperparameters of the model and is
considered a part of the training process. The model sees this data only
for evaluation; it does not learn from it.
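
One common way to obtain the three sets is to shuffle the data once and cut it into disjoint pieces. The sketch below does this with Python's standard library. The 60/20/20 proportions, the fixed seed, and the function name split_dataset are assumptions for illustration; the notes do not prescribe particular split sizes.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for testing
    return train, validation, test


# Stand-in dataset: 100 example identifiers.
train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Because the three slices come from a single shuffled list, no example can appear in more than one set, which is what keeps the test set independent of training.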
Training Loss and Test Loss
Training Loss
1. Definition:
• The training loss measures how well a machine learning model
performs on the training data. It is calculated using a loss function that
quantifies the difference between the model's predictions and the
actual target values in the training set.
2. Optimization Objective:
• During training, the goal is to minimize the training loss. Optimization
algorithms, such as gradient descent, are used to adjust the model
parameters to achieve this objective.
3. Overfitting:
• A very low training loss might indicate that the model is fitting the
training data too closely, potentially capturing noise and patterns that
do not generalize well to new, unseen data.

Test Loss:
1. Definition:
• The test loss measures how well a
trained model generalizes to new, unseen data. It is calculated using
the same loss function but on a separate dataset that the model has not
seen during training. When the held-out data is a validation set used
while tuning the model, the same quantity is called the validation loss.
2. Generalization:
• The test loss is crucial for assessing the model's ability to generalize.
A low test loss indicates that the model is making accurate predictions
on data it has never encountered before.
3. Overfitting Detection:
• Comparing the training loss and test loss helps in detecting
overfitting. If the training loss is significantly lower than the test loss,
it suggests overfitting, and adjustments to the model complexity may
be needed.
Trade-off:
• There is often a trade-off between minimizing training loss and
achieving good generalization. Striking the right balance is essential
to prevent overfitting and ensure the model performs well on new
data.
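
The gap between training loss and test loss can be demonstrated with a small experiment. In this sketch the loss function is mean squared error, the data is made up (noisy training targets around y = 2x and noise-free test targets), and two models are compared: a "memorizer" that returns the target of the nearest training example, and a least-squares straight line. All of these are assumptions chosen to make the effect visible, not a general benchmark.

```python
# Hypothetical noisy training data around y = 2x, and noise-free test points.
TRAIN = [(0.0, 0.5), (1.0, 1.5), (2.0, 4.8), (3.0, 5.4), (4.0, 8.6), (5.0, 9.6)]
TEST = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]


def mse(predict, data):
    """Mean squared error loss of `predict` on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def memorizer(x):
    """Very flexible model: returns the target of the nearest training example."""
    return min(TRAIN, key=lambda pair: abs(pair[0] - x))[1]


def fit_line(data):
    """Simple model: least-squares straight line."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x, _ in data
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


line = fit_line(TRAIN)

memorizer_train_loss, memorizer_test_loss = mse(memorizer, TRAIN), mse(memorizer, TEST)
line_train_loss, line_test_loss = mse(line, TRAIN), mse(line, TEST)

print("memorizer:", memorizer_train_loss, memorizer_test_loss)
print("line:     ", line_train_loss, line_test_loss)
```

The memorizer reaches zero training loss because it reproduces every noisy training target, yet its test loss is much higher than the line's. That pattern (training loss far below test loss) is the overfitting signal described above.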

In statistical learning, which encompasses machine learning and statistical
modeling, various tradeoffs need to be considered when designing and selecting
models. Here are some key tradeoffs in statistical learning:
1. Bias-Variance Tradeoff:
• Bias: It represents the error introduced by approximating a real-world
problem, which may be extremely complex, by a much simpler
model. High bias can lead to underfitting.
• Variance: The variability of the model's prediction for a given data
point when the model is trained on different training sets. A model
with high variance fits the training data very closely and therefore
does not predict accurately on data it has not seen before. As a
result, such models perform very well on training data but have high
error rates on test data. A high-variance model is said to overfit the
data.
• Tradeoff: There is a tradeoff between bias and variance. Increasing
model complexity (e.g., using a more flexible algorithm or adding
more features) tends to decrease bias but increase variance, and vice
versa.

2. Model Complexity vs. Interpretability:
• Model Complexity: More complex models can capture intricate
patterns in the data but may be harder to interpret.
• Interpretability: Simpler models are usually more interpretable but
might not capture complex relationships in the data.
• Tradeoff: There is a tradeoff between model complexity and
interpretability. Choosing the right level of complexity depends on the
specific goals of the analysis and the need for interpretability.
3. Computational Efficiency vs. Model Complexity:
• Computational Efficiency: Simple models often require less
computational resources.
• Model Complexity: More complex models, especially in deep
learning, may require significant computational power and time.
• Tradeoff: The choice of model complexity should consider
computational efficiency, especially in real-time or resource-
constrained environments.
4. Number of Features vs. Sample Size:
• Number of Features: Increasing the number of features can provide
more information to the model but may lead to overfitting if the
sample size is small.
• Sample Size: Larger samples generally allow for more reliable model
estimation.
• Tradeoff: There is a tradeoff between the number of features and
sample size. In situations with limited data, it's crucial to carefully
select features to avoid overfitting.
5. Supervised vs. Unsupervised Learning:
• Supervised Learning: Requires labeled data for training and
evaluation but can achieve high predictive accuracy.
• Unsupervised Learning: Works with unlabeled data and may
uncover hidden patterns but might lack clear evaluation metrics.
• Tradeoff: The choice between supervised and unsupervised learning
depends on the availability and quality of labeled data and the specific
goals of the analysis.
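The bias-variance tradeoff from item 1 can be estimated numerically by training the same model on many resampled training sets and looking at its predictions at one point. The following is a minimal sketch under stated assumptions: the true relationship true_f, the noise level, the fixed inputs XS, the query point X0 and both toy models are hypothetical choices, not part of the notes.

```python
import random

def true_f(x):
    """Hypothetical 'real-world' relationship the models try to approximate."""
    return 4.0 * x

XS = [0.0, 0.25, 0.5, 0.75, 1.0]   # fixed training inputs
X0 = 1.0                            # point at which predictions are studied

def sample_training_targets(rng, noise_std=1.0):
    """Draw one noisy training set of targets for the inputs XS."""
    return [true_f(x) + rng.gauss(0.0, noise_std) for x in XS]

def constant_model(ys):
    """Very simple model: always predict the mean target (ignores x)."""
    return sum(ys) / len(ys)

def nearest_neighbor_model(ys):
    """Very flexible model: predict the target of the training point nearest X0."""
    nearest = min(range(len(XS)), key=lambda j: abs(XS[j] - X0))
    return ys[nearest]

def bias_sq_and_variance(model, n_repeats=2000, seed=0):
    """Estimate squared bias and variance of the prediction at X0."""
    rng = random.Random(seed)
    preds = [model(sample_training_targets(rng)) for _ in range(n_repeats)]
    mean_pred = sum(preds) / n_repeats
    bias_sq = (mean_pred - true_f(X0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_repeats
    return bias_sq, variance

simple_bias_sq, simple_var = bias_sq_and_variance(constant_model)
flexible_bias_sq, flexible_var = bias_sq_and_variance(nearest_neighbor_model)

print(f"constant model:   bias^2={simple_bias_sq:.3f}  variance={simple_var:.3f}")
print(f"nearest neighbor: bias^2={flexible_bias_sq:.3f}  variance={flexible_var:.3f}")
```

In this setup the constant model should show a large squared bias and a small variance, while the nearest-neighbor model should show almost no bias but a variance close to the noise variance, matching the tradeoff described above.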
Risk Statistics
In machine learning, risk statistics are metrics and measures that help assess the
performance and generalization ability of a model:
1. Accuracy:
• Definition: The proportion of correctly classified instances out of the
total instances.
• Formula:
Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)
2. Precision:
• Definition: The ratio of true positives to the sum of true positives and
false positives.
• Formula:
Precision = True Positives / (True Positives + False Positives)

3. Recall (Sensitivity or True Positive Rate):
• Definition: The ratio of true positives to the sum of true positives and
false negatives.
• Formula:
Recall = True Positives / (True Positives + False Negatives)
4. F1 Score:
• Definition: The harmonic mean of precision and recall, providing a
balance between the two metrics.
• Formula:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
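The four classification metrics above can be computed directly from the counts of true/false positives and negatives. The sketch below assumes binary labels with 1 as the positive class; the function name classification_metrics, the example labels, and the convention of returning 0.0 when a denominator is zero are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 when a ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 4 positives and 6 negatives, with one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 8/10 and precision, recall and F1 are each 3/4.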

Mean Absolute Error (MAE):

• Definition: MAE measures the average absolute difference between the
predicted values and the actual values. It provides a straightforward measure of
prediction accuracy.
• Interpretation: A lower MAE indicates better model performance.
• Formula: MAE = (1/n) * ∑i |yi − xi|, where

o MAE = mean absolute error
o yi = prediction
o xi = true value
o n = total number of data points

Mean Squared Error (MSE):

• Definition: MSE measures the average squared difference between the predicted
values and the actual values. Squaring the differences penalizes larger errors
more heavily than smaller ones.
• Interpretation: MSE is sensitive to outliers and tends to amplify their impact.
• Formula: MSE = (1/n) * ∑i (yi − ŷi)^2, where

o yi is the ith observed value.
o ŷi is the corresponding predicted value.
o n is the number of observations.

Root Mean Squared Error (RMSE):

• Definition: RMSE is the square root of the MSE. It provides a measure of the
average magnitude of the errors in the predicted values, in the same units as the
target variable.
• Formula: RMSE = √MSE
• Interpretation: RMSE is more interpretable than MSE as it is in the same units as
the target variable.
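The three regression metrics can be computed side by side to see how they relate. This is a small sketch with made-up values; the function names mae, mse and rmse and the example numbers are assumptions for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical true values and predictions (errors: 1, 0, 2, 1).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae_value = mae(y_true, y_pred)
mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(f"MAE={mae_value:.3f}  MSE={mse_value:.3f}  RMSE={rmse_value:.3f}")
```

The single error of size 2 contributes 4 of the 6 units of total squared error, which is why the RMSE comes out above the MAE on the same data.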

Cross-Validation:

• Definition: Cross-validation is a technique used to assess the performance of a
machine learning model by splitting the dataset into multiple subsets (folds). The
model is trained on some folds and evaluated on others, and this process is
repeated multiple times.
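One common form of this procedure is k-fold cross-validation, where each fold is held out once for evaluation. The sketch below is a minimal version under stated assumptions: the helper names (k_fold_indices, cross_validate), the mean-predictor model and the data are hypothetical, and the folds are contiguous; in practice the data is usually shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mse(y_true, y_pred):
    """Mean squared error, used as the evaluation loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_validate(xs, ys, k, fit, predict, loss):
    """Train on k-1 folds, evaluate on the held-out fold, repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(ys)) if i not in held]
        model = fit(train_x, train_y)
        preds = [predict(model, xs[i]) for i in held_out]
        scores.append(loss([ys[i] for i in held_out], preds))
    return scores

# Hypothetical model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 2.0, 4.0, 2.0, 4.0]
folds = k_fold_indices(len(xs), 3)
scores = cross_validate(xs, ys, 3, fit_mean, predict_mean, mse)
cv_score = sum(scores) / len(scores)
print(folds, scores, cv_score)
```

The average of the per-fold losses (cv_score) is the cross-validation estimate of how the model would perform on unseen data.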

Empirical risk minimization

Empirical risk minimization (ERM) is a concept in machine learning where the goal is to find a
model that performs well on the training data. In simple terms, it's about minimizing the error
on the data you have.

1. Empirical Risk:
• Definition: The average error of a model on the training data.
• Example: If you have a dataset of student exam scores and you're
trying to predict grades, the empirical risk would be how well your
model predicts the actual grades for the students in your training set.
2. Minimization:
• Definition: The process of finding the model that minimizes the
empirical risk.
• Example: Adjusting the parameters of your grade prediction model
(like changing weights in a linear regression) to make sure it predicts
the training data grades as accurately as possible.
3. Task:
• Definition: The overall objective you want your model to achieve,
often stated as a minimization problem.
• Example: Your task is to build a model that predicts student grades
accurately. In ERM, you're adjusting the model to minimize the
difference between predicted and actual grades on the training data.
4. Formally, if we have a dataset consisting of n input-output pairs (xi, yi), the
empirical risk of a model f is given by:
R_emp(f) = (1/n) * ∑i L(yi, f(xi))
where n is the size of the dataset and
L(yi, f(xi)) is the loss function that measures the discrepancy between the
model's prediction f(xi) and the true output yi for each example i.
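The definition of R_emp(f) and its minimization can be sketched for a linear model with squared loss. Everything specific here is an assumption for illustration: the model f(x) = w * x + b, the squared loss as L, plain gradient descent as the optimizer, the learning rate and step count, and the toy data (which lies exactly on y = 2x + 1).

```python
def empirical_risk(w, b, xs, ys):
    """R_emp(f) = (1/n) * sum_i L(yi, f(xi)) with squared loss and f(x) = w*x + b."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

def minimize_empirical_risk(xs, ys, learning_rate=0.05, steps=2000):
    """Gradient descent on the empirical risk, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical training data generated by y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

initial_risk = empirical_risk(0.0, 0.0, xs, ys)
w, b = minimize_empirical_risk(xs, ys)
final_risk = empirical_risk(w, b, xs, ys)
print(f"w={w:.4f}  b={b:.4f}  risk: {initial_risk:.2f} -> {final_risk:.2e}")
```

Because the data is noise-free, the minimizer recovers the generating line and the empirical risk drops to essentially zero; with noisy data a very low empirical risk would not by itself guarantee good performance on new data.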

Structural Risk Minimization:


Structural risk minimization (SRM) in machine learning is about finding a balance
between fitting the training data well and avoiding overly complex models that might
not generalize well to new, unseen data. Let's break it down:

1. Structural Risk:
• Definition: The risk associated with the complexity of a model.
• Example: If you have a model that can perfectly memorize every student's
grade in your training data (high complexity), it might not generalize well
to new students. The structural risk is the risk of overfitting.
2. Minimization:
• Definition: The process of finding a model that minimizes both training
error and model complexity.
• Example: Adjusting the model parameters to balance accurate predictions
on the training data while avoiding unnecessary complexity that might
lead to overfitting.
3. Task:
• Definition: The overall objective of finding a model that generalizes well
to new, unseen data.
• Example: Your task is not only to predict grades accurately on the training
data but also to ensure that your model's predictions generalize well to
new students, maintaining a balance between fitting the training data and
avoiding overly complex models.

To sum up, structural risk minimization involves finding a model that not only fits the
training data well but also avoids being too complex. This helps prevent overfitting and
ensures that the model performs well on new, unseen data by striking a balance
between accuracy and simplicity.

Regularized risk minimization


Regularized Risk Minimization (RRM) is a principle in machine learning that
involves finding a model that minimizes a regularized version of the expected risk
on unseen data. It is similar to Empirical Risk Minimization (ERM), but with the
addition of a regularization term that penalizes complex models and encourages
simpler ones.
The regularized risk of a model is defined as:
R_risk(f) = E[L(Y, f(X))] + λΩ(f)
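where E[L(Y, f(X))] is the expected loss of the model on unseen data (in practice
estimated by the empirical risk on the training data, because the true data
distribution is unknown),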
Ω(f) is a complexity measure of the model, and
λ is a regularization parameter that controls the trade-off between the loss and
complexity terms.
The regularized risk can be minimized by finding the model that achieves the
optimal trade-off between the loss and complexity terms.
RRM is a powerful approach for training machine learning models that can help
prevent overfitting and improve generalization performance on unseen data.
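The effect of the regularization parameter λ can be seen in a one-parameter example. This sketch makes several assumptions that are not in the notes: the model is f(x) = w * x, the expected loss is replaced by the empirical squared loss on the training data, and the complexity measure is Ω(f) = w^2 (ridge-style). Under these assumptions, setting the derivative of the regularized risk to zero gives the closed form used in ridge_weight.

```python
def regularized_risk(w, xs, ys, lam):
    """(1/n) * sum_i (yi - w*xi)^2 + lam * w^2, with Omega(f) = w^2 assumed."""
    n = len(xs)
    data_term = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / n
    return data_term + lam * w ** 2

def ridge_weight(xs, ys, lam):
    """Minimizer of regularized_risk: w = sum(x*y) / (sum(x^2) + n*lam)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)

# Hypothetical training data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_no_penalty = ridge_weight(xs, ys, lam=0.0)      # plain empirical risk minimization
w_mild_penalty = ridge_weight(xs, ys, lam=1.0)
w_strong_penalty = ridge_weight(xs, ys, lam=10.0)

print(f"lam=0:  w={w_no_penalty:.4f}")
print(f"lam=1:  w={w_mild_penalty:.4f}")
print(f"lam=10: w={w_strong_penalty:.4f}")
```

With λ = 0 the result coincides with empirical risk minimization; larger values of λ shrink the weight toward zero, trading some fit on the training data for a simpler model.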