
Start with the Big Picture

Objective: Explain that in machine learning, our goal is to build models that generalize well to new
data—not just perform perfectly on our training data.

• Introduce the idea of a “model” as a tool that learns patterns from data.

• State the challenge: we must balance between a model that is too simple (missing
important patterns) and one that is too complex (capturing noise as if it were a pattern).

ML | Underfitting and Overfitting

Last Updated: 27 Jan, 2025

Machine learning models aim to perform well on both training data and new, unseen data. A model is
considered “good” if:

1. It learns patterns effectively from the training data.

2. It generalizes well to new, unseen data.

3. It avoids memorizing the training data (overfitting) or failing to capture relevant patterns
(underfitting).

To evaluate how well a model learns and generalizes, we monitor its performance on both the
training data and a separate validation or test dataset, typically measured by accuracy or
prediction error. Achieving this balance can be challenging. Two common issues that affect a
model’s performance and generalization ability are overfitting and underfitting; both are major
contributors to poor performance in machine learning models. Let’s understand what they are and
how they affect ML models.
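
To make the train-versus-validation comparison concrete, here is a minimal sketch, assuming
scikit-learn is available; the synthetic dataset and the choice of a decision tree are illustrative
assumptions, not part of the original text:

```python
# Compare accuracy on the training set vs. a held-out validation set.
# A large gap between the two scores is a classic symptom of overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# An unpruned decision tree can essentially memorize the training set.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
val_acc = accuracy_score(y_val, model.predict(X_val))
print(f"train accuracy:      {train_acc:.3f}")  # typically ~1.0 here
print(f"validation accuracy: {val_acc:.3f}")    # noticeably lower
```

If the validation score sits far below the training score, the model is likely memorizing rather
than generalizing.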

Bias and Variance in Machine Learning

Bias and variance are two key sources of error in machine learning models that directly impact
their performance and generalization ability.

Bias is the error that arises when a machine learning model is too simple and doesn’t learn
enough detail from the data. It’s like assuming all birds are small and can fly: the model then
fails to recognize large, flightless birds like ostriches and penguins, and its predictions are
biased accordingly.

• These assumptions make the model easier to train but may prevent it from capturing the
underlying complexities of the data.

• High bias typically leads to underfitting, where the model performs poorly on both training
and testing data because it fails to learn enough from the data.

• Example: A linear regression model applied to a dataset with a non-linear relationship (see the sketch below).
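
A minimal sketch of that example, assuming scikit-learn and NumPy are available; the quadratic
dataset is invented purely for illustration:

```python
# High bias: a straight line fit to data with a quadratic relationship.
# The model underfits, so error stays high on training AND test data alike.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)  # quadratic + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
linear = LinearRegression().fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, linear.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, linear.predict(X_test)))
# Both errors remain high: the model's assumptions are too simple for the data.
```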

Variance is the error that arises when a machine learning model learns too much from the training
data, including random noise.

• A high-variance model learns not only the patterns but also the noise in the training data,
which leads to poor generalization on unseen data.

• High variance typically leads to overfitting, where the model performs well on training data
but poorly on testing data (see the sketch below).
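
A minimal sketch of high variance, again assuming scikit-learn and NumPy; the sine-wave dataset
and the degree-15 polynomial are illustrative choices:

```python
# High variance: a degree-15 polynomial chases the noise in a tiny training
# set, so training error is very small while test error is far larger.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(20, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.3, size=20)
X_test = rng.uniform(-3, 3, size=(200, 1))
y_test = np.sin(X_test[:, 0]) + rng.normal(scale=0.3, size=200)

wiggly = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
wiggly.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, wiggly.predict(X_train)))  # small
print("test MSE: ", mean_squared_error(y_test, wiggly.predict(X_test)))    # much larger
```

The model has learned the noise rather than the signal, which is exactly the overfitting scenario
described next.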

Overfitting and Underfitting: The Core Issues

1. Overfitting in Machine Learning

Overfitting happens when a model learns too much from the training data, including details that
don’t matter (like noise or outliers).

• For example, imagine fitting a very complicated curve to a set of points. The curve will go
through every point, but it won’t represent the actual pattern.

• As a result, the model works great on training data but fails when tested on new data.

Overfitting models are like students who memorize answers instead of understanding the topic.
They do well in practice tests (training) but struggle in real exams (testing).

Reasons for Overfitting:

1. High variance and low bias.

2. The model is too complex.

3. The training dataset is too small.

2. Underfitting in Machine Learning

Underfitting is the opposite of overfitting. It happens when a model is too simple to capture what’s
going on in the data.

• For example, imagine drawing a straight line to fit points that actually follow a curve. The line
misses most of the pattern.

• In this case, the model doesn’t work well on either the training or testing data.

Underfitting models are like students who don’t study enough. They don’t do well in practice tests
or real exams.

Note: An underfitting model has high bias and low variance.

Reasons for Underfitting:

1. The model is too simple, so it is not capable of representing the complexities in the data.

2. The input features used to train the model are not adequate representations of the
underlying factors influencing the target variable.

3. The size of the training dataset used is not enough.

4. Excessive regularization is used to prevent overfitting, which constrains the model too much
to capture the data well (the sketch below sweeps model complexity across both failure modes).
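
To tie the two failure modes together, here is a minimal sketch, assuming scikit-learn and NumPy;
the dataset, the degrees tried, and the small Ridge penalty are all illustrative assumptions:

```python
# Sweep polynomial degree and compare train/test errors.
# Low degree: both errors high (underfitting, high bias).
# High degree: train error tiny, test error rises (overfitting, high variance).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X_tr = rng.uniform(-3, 3, size=(30, 1))
y_tr = np.sin(X_tr[:, 0]) + rng.normal(scale=0.3, size=30)
X_te = rng.uniform(-3, 3, size=(200, 1))
y_te = np.sin(X_te[:, 0]) + rng.normal(scale=0.3, size=200)

for degree in (1, 4, 15):
    # The tiny Ridge penalty keeps the high-degree fit numerically stable;
    # raising alpha adds regularization and pushes back toward underfitting.
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree={degree:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

Picking the complexity (or the regularization strength) where test error bottoms out is the
practical way to balance bias and variance.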
