The Problem of Overfitting


Consider the problem of predicting y from x ∈ R. The leftmost figure below shows
the result of fitting a line y = θ0 + θ1x to a dataset. We see that the data doesn't
really lie on a straight line, and so the fit is not very good.

Instead, if we add an extra feature x^2 and fit y = θ0 + θ1x + θ2x^2, then we
obtain a slightly better fit to the data (see middle figure). Naively, it might seem
that the more features we add, the better. However, there is also a danger in
adding too many features: the rightmost figure is the result of fitting a 5th-order
polynomial y = ∑_{j=0}^{5} θj x^j. We see that even though the fitted curve passes
through the data perfectly, we would not expect this to be a very good predictor
of, say, housing prices (y) for different living areas (x). Without formally defining
what these terms mean, we’ll say the figure on the left shows an instance of
underfitting—in which the data clearly shows structure not captured by the
model—and the figure on the right is an example of overfitting.

Underfitting, or high bias, is when the form of our hypothesis function h maps
poorly to the trend of the data. It is usually caused by a function that is too simple
or uses too few features. At the other extreme, overfitting, or high variance, is
caused by a hypothesis function that fits the available data but does not generalize
well to predict new data. It is usually caused by a complicated function that creates
a lot of unnecessary curves and angles unrelated to the data.
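In practice, one common way to tell the two failure modes apart is to compare training error with error on data the model has not seen. The rough diagnostic below is a sketch under assumed, illustrative thresholds, not a rule from the course:

    def diagnose(train_err: float, val_err: float, target_err: float) -> str:
        """Crude bias/variance diagnostic; the thresholds are illustrative."""
        if train_err > target_err:
            # Cannot even fit the training set: the hypothesis is too simple.
            return "high bias (underfitting)"
        if val_err > 2.0 * train_err:
            # Fits the training data but does not generalize to new data.
            return "high variance (overfitting)"
        return "reasonable fit"

    print(diagnose(train_err=0.50, val_err=0.55, target_err=0.10))  # high bias
    print(diagnose(train_err=0.01, val_err=0.40, target_err=0.10))  # high variance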

This terminology is applied to both linear and logistic regression. There are two
main options to address the issue of overfitting:

1) Reduce the number of features:

Manually select which features to keep.

Use a model selection algorithm (studied later in the course).

2) Regularization:

Keep all the features, but reduce the magnitude of parameters θj (a sketch of a regularized fit follows this list).

Regularization works well when we have a lot of slightly useful features.
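The following minimal sketch illustrates the regularization option in numpy. The ridge_polyfit helper, the L2 penalty λ·∑θj^2, and the λ values are illustrative assumptions, not the course's implementation:

    import numpy as np

    def ridge_polyfit(x, y, degree, lam):
        # Fit y = sum_j theta_j x^j while penalizing large parameters
        # with an L2 term lam * sum_j theta_j^2 (closed-form ridge solution).
        X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
        I = np.eye(degree + 1)
        I[0, 0] = 0.0  # conventionally, do not penalize the intercept theta_0
        return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 3.0, size=6))
    y = np.sqrt(x) + 0.1 * rng.standard_normal(6)

    print("lam = 0.0:", ridge_polyfit(x, y, 5, 0.0))  # large, wild coefficients (overfit)
    print("lam = 1.0:", ridge_polyfit(x, y, 5, 1.0))  # shrunken coefficients, smoother curve

Keeping all five polynomial features but shrinking θ1, ..., θ5 toward zero lets the model stay expressive where the data supports it, without the wild oscillations of the unregularized 5th-order fit.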
