Linear Regression
• When we have more than one input we can use Ordinary Least Squares to
estimate the values of the coefficients
• Given a regression line through the data we calculate the distance from each
data point to the regression line, square it, and sum all of the squared errors
together
• When there are one or more inputs, you can optimize the values of the coefficients by iteratively minimizing the error of the model on the training data
• This operation is called Gradient Descent and works by starting with zero values for
each coefficient
• The sum of the squared errors is calculated for each pair of input and output values
• A learning rate is used as a scale factor and the coefficients are
updated in the direction towards minimizing the error
• We can estimate the coefficient B1 from our dataset as follows:
B1 = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))^2)
• Where mean() is the average value for the variable in our dataset
• We can then calculate B0 using B1 and some statistics from our dataset, as follows:
B0 = mean(y) - B1 * mean(x)
• Find the coefficients for the simple linear regression equation
• Plot these predictions as a line with the given data.
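• A minimal Python sketch of these steps, using a small illustrative (x, y) dataset (the values are hypothetical, not from the slides):

import numpy as np
import matplotlib.pyplot as plt

# Illustrative dataset of (x, y) pairs
x = np.array([1, 2, 4, 3, 5], dtype=float)
y = np.array([1, 3, 3, 2, 5], dtype=float)

# B1 = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))^2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# B0 = mean(y) - B1 * mean(x)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 0.4 and 0.8 for this data

# Plot the predictions as a line with the given data
predictions = b0 + b1 * x
plt.scatter(x, y)
plt.plot(x, predictions, color="red")
plt.show()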
Estimating Error
• Calculate an error score for the predictions called the Root Mean Squared Error or RMSE:
RMSE = sqrt( sum((p_i - y_i)^2) / n )
• Where p is the predicted value, y is the actual value, and i is the index of a specific instance, because we must calculate the error across all n predicted values
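• A minimal RMSE sketch in Python, reusing the illustrative predictions from the regression example above (B0 = 0.4, B1 = 0.8):

import math

def rmse(predicted, actual):
    # Root Mean Squared Error: square root of the average squared error
    sum_sq = sum((p - y) ** 2 for p, y in zip(predicted, actual))
    return math.sqrt(sum_sq / len(actual))

predicted = [1.2, 2.0, 3.6, 2.8, 4.4]
actual = [1, 3, 3, 2, 5]
print(rmse(predicted, actual))  # about 0.693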
Shortcut
• There is a shortcut that you can use to quickly estimate the value for B1:
B1 = corr(x, y) * stdev(y) / stdev(x)
• Where corr(x, y) is the correlation between x and y and stdev() is the standard deviation of a variable
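• A hedged sketch of the shortcut, assuming the same illustrative dataset; it gives the same B1 estimate as the longer calculation:

import numpy as np

x = np.array([1, 2, 4, 3, 5], dtype=float)
y = np.array([1, 3, 3, 2, 5], dtype=float)

# B1 = corr(x, y) * stdev(y) / stdev(x)
b1 = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)
print(b1)  # about 0.8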
Basics of ML
Parametric and Nonparametric Machine Learning Algorithms
• Assumptions can greatly simplify the learning process, but can also
limit what can be learned.
• Algorithms that simplify the function to a known form are called
parametric machine learning algorithms.
• “A learning model that summarizes data with a set of parameters of
fixed size (independent of the number of training examples) is called a
parametric model. No matter how much data you throw at a
parametric model, it won't change its mind about how many
parameters it needs.”
Nonparametric Machine Learning Algorithms
• Nonparametric methods are good when you have a lot of data and no
prior knowledge, and when you don't want to worry too much about
choosing just the right features
• Nonparametric methods seek to best fit the training data in constructing the mapping function, whilst maintaining some ability to generalize to unseen data
• An easy to understand nonparametric model is the k-nearest neighbors
algorithm that makes predictions based on the k most similar training
patterns for a new data instance
• The method does not assume anything about the form of the mapping function other than that patterns that are close are likely to have a similar output variable.
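• A minimal k-nearest neighbors sketch, assuming Euclidean distance, majority voting, and a tiny hypothetical set of training patterns:

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, new_point, k=3):
    # Find the k most similar training patterns and vote on their labels
    neighbors = sorted(train, key=lambda row: euclidean(row[0], new_point))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training patterns: (features, label)
train = [((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
print(knn_predict(train, (1.1, 1.0), k=3))  # "A" - the closest patterns share that label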
Bias-Variance Trade-Off
• In supervised machine learning an algorithm learns a model from
training data
• The goal of any supervised machine learning algorithm is to best
estimate the mapping function (f) for the output variable (Y ) given
the input data (X)
• Mapping function is often called the target function because it is the
function that a given supervised machine learning algorithm aims to
approximate
• The prediction error for any machine learning algorithm can be
broken down into three parts:
• Bias Error
• Variance Error
• Irreducible Error
• Irreducible error cannot be reduced regardless of what algorithm is
used
• It is the error introduced from the chosen framing of the problem and
may be caused by factors like unknown variables that influence the
mapping of the input variables to the output variable
Bias Error
• Bias refers to the simplifying assumptions made by a model to make the target function easier to learn
• Parametric algorithms have a high bias, making them fast to learn and easier to understand, but generally less flexible
• Variance is the amount that the estimate of the target function will change
if different training data was used
• Ideally, should not change too much from one training dataset to the next
• High Variance: Suggests large changes to the estimate of the target function with
changes to the training dataset
• In machine learning we describe the learning of the target function from training
data as inductive learning
• This allows us to make predictions in the future on data the model has
never seen
• Overfitting and underfitting are the two biggest causes for poor
performance of machine learning algorithms
Statistical Fit
• Overfitting refers to a model that models the training data too well
• Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data
• This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model
• The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize
• Overfitting is more likely with nonparametric and nonlinear models
that have more flexibility when learning a target function
• The decision tree is a nonparametric machine learning algorithm that is very flexible and is subject to overfitting the training data
• This problem can be addressed by pruning a tree after it has learned
in order to remove some of the detail it has picked up
Underfitting in Machine Learning
• We should plot both the skill on the training data and the skill on a test dataset that we have held back from the training process
• Over time, as the algorithm learns, the error for the model on the
training data goes down and so does the error on the test dataset
• If we train for too long, the error on the training dataset may continue
to decrease because the model is overfitting and learning the
irrelevant detail and noise in the training dataset
• At the same time the error for the test set starts to rise again as the
model's ability to generalize decreases
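• A hedged sketch of this idea using scikit-learn (the synthetic data, model, and depths are illustrative assumptions): instead of tracking error over training time, it tracks training and test error as a decision tree is allowed to grow deeper, typically showing training error falling while test error eventually rises:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, noisy data for illustration
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 2, 4, 8, 16]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, tree.predict(X_train))
    test_err = mean_squared_error(y_test, tree.predict(X_test))
    print(f"depth={depth:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")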
How To Limit Overfitting
• Use a resampling technique, such as k-fold cross-validation, to estimate model accuracy on unseen data
• Hold back a validation dataset to evaluate the model once it has been tuned
Gradient Descent
• Gradient descent is an optimization algorithm used to find the values of the coefficients of a function (f) that minimize a cost function
• Best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm
Gradient Descent Procedure
• Procedure starts off with initial values for the coefficient or coefficients for the function
coefficient = 0.0
• The cost of the coefficients is evaluated by plugging them into the
function and calculating the cost
cost = f(coefficient)
cost = evaluate(f(coefficient))
• Derivative of the cost is calculated which refers to the slope of the
function at a given point
• We need to know the slope to find the direction (sign) to move the
coefficient values in order to get a lower cost on the next iteration.
delta = derivative(cost)
• We can now update the coefficient values
• A learning rate parameter (alpha) must be specified that controls how
much the coefficients can change on each update
coefficient = coefficient - (alpha x delta)
• Process is repeated until the cost of the coefficients (cost) is 0.0 or no
further improvements in cost can be achieved
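• A minimal sketch of the procedure in Python, using an illustrative one-coefficient cost function f(c) = (c - 4)^2 whose derivative is written out analytically:

def f(coefficient):
    # Illustrative cost function with its minimum at coefficient = 4.0
    return (coefficient - 4.0) ** 2

def derivative(coefficient):
    # Slope of the cost function at the current coefficient value
    return 2.0 * (coefficient - 4.0)

coefficient = 0.0   # start with a zero value for the coefficient
alpha = 0.1         # learning rate: how much the coefficient can change per update

for _ in range(100):
    cost = f(coefficient)
    delta = derivative(coefficient)
    coefficient = coefficient - alpha * delta
    if cost < 1e-9:  # stop once the cost is (near) zero
        break

print(coefficient)  # approaches 4.0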
Batch Gradient Descent
• Goal of all supervised machine learning algorithms is to best estimate a
target function (f) that maps input data (X) onto output variables (Y )
• Different algorithms have different representations and different
coefficients, but many of them require a process of optimization to find
the set of coefficients that result in the best estimate of the target
function
• Common examples of algorithms with coefficients that can be
optimized using gradient descent are Linear Regression and Logistic
Regression
• How closely a machine learning model's estimate fits the target function can be evaluated in a number of different ways, often specific to the machine learning algorithm
• Cost function involves evaluating the coefficients in the machine
learning model by calculating a prediction for each training instance in
the dataset and comparing the predictions to the actual output values
then calculating a sum or average error
• From the cost function a derivative can be calculated for each
coefficient so that it can be updated
• Cost is calculated for a machine learning algorithm over the entire
training dataset for each iteration of the gradient descent algorithm
• One iteration of the algorithm is called one batch and this form of
gradient descent is referred to as batch gradient descent
• Batch gradient descent is the most common form of gradient descent
described in machine learning
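• A hedged sketch of batch gradient descent for simple linear regression, assuming the illustrative dataset from earlier; the gradient is computed over the entire training dataset on every iteration before the coefficients are updated:

import numpy as np

x = np.array([1, 2, 4, 3, 5], dtype=float)
y = np.array([1, 3, 3, 2, 5], dtype=float)

b0, b1 = 0.0, 0.0
alpha = 0.01

for _ in range(2000):
    # One batch = one pass over the whole training dataset
    errors = (b0 + b1 * x) - y
    # Gradients of the mean squared error with respect to each coefficient
    grad_b0 = 2.0 * np.mean(errors)
    grad_b1 = 2.0 * np.mean(errors * x)
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)  # converges towards the least squares estimates (about 0.4 and 0.8)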
Stochastic Gradient Descent
• Gradient descent can be slow to run on very large datasets
• One iteration of the gradient descent algorithm requires a prediction for each instance in the training dataset, so it can take a long time when there are many millions of instances
• For large amounts of data, a variation of gradient descent called stochastic gradient descent can be used
• In this variation, the gradient descent procedure described above is
run but the update to the coefficients is performed for each training
instance, rather than at the end of the batch of instances
• First step of the procedure requires that the order of the training
dataset is randomized
• Because the coefficients are updated after every training instance, the
updates will be noisy, jumping all over the place, and so will the
corresponding cost function
• By mixing up the order for the updates to the coefficients, it
harnesses this random walk and avoids getting stuck
• Update procedure for the coefficients is the same as that above,
except the cost is not summed or averaged over all training patterns,
but instead calculated for one training pattern
• Learning can be much faster with stochastic gradient descent for very large training datasets, and often only a small number of passes through the dataset is needed to reach a good or good enough set of coefficients, e.g. 1-to-10 passes through the dataset
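• A minimal stochastic gradient descent sketch for the same simple linear regression, assuming the same illustrative data: the order is shuffled and the coefficients are updated after every single training instance rather than after a full batch:

import random

data = [(1, 1), (2, 3), (4, 3), (3, 2), (5, 5)]  # illustrative (x, y) pairs
b0, b1 = 0.0, 0.0
alpha = 0.01

for _ in range(20):          # often only a few passes through the data are needed
    random.shuffle(data)     # randomize the order of the training dataset
    for x, y in data:
        error = (b0 + b1 * x) - y   # cost is taken for one training pattern only
        b0 -= alpha * error         # noisy, per-instance coefficient updates
        b1 -= alpha * error * x

print(b0, b1)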
Tips for Gradient Descent
• Learning Rate
• Value is a small real value such as 0.1, 0.001 or 0.0001
• Rescale Inputs
• Algorithm will reach the minimum cost faster if the shape of the cost function
is not skewed and distorted
• This can be achieved by rescaling all of the input variables (X) to the same range, such as between 0 and 1 (a small rescaling sketch follows this list)
• Few Passes
• Stochastic gradient descent often does not need more than 1-to-10 passes
through the training dataset to converge on good or good enough
coefficients.
• Plot Mean Cost
• The updates for each training dataset instance can result in a noisy plot of cost over time when using stochastic gradient descent
• Averaging the cost over a number of updates can give a better picture of the learning trend for the algorithm
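• A minimal sketch of the rescaling tip above, assuming min-max normalization of each input variable into the range 0 to 1; the input values are illustrative:

import numpy as np

X = np.array([[50.0, 0.002],
              [80.0, 0.005],
              [65.0, 0.001]])   # two input variables on very different scales

# Min-max rescaling: map each column (input variable) into [0, 1]
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)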
Performance Measures
Confusion Matrix for Multiple Classes
• Recall for Label A: TP_A / (TP_A + FN_A), i.e. the number of instances correctly predicted as A divided by the total number of instances that are actually A
• Precision for Label A: TP_A / (TP_A + FP_A), i.e. the number of instances correctly predicted as A divided by the total number of instances predicted as A
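• A minimal sketch that computes recall and precision for each label from a multi-class confusion matrix, assuming rows are actual labels and columns are predicted labels (that orientation, and the matrix values, are illustrative assumptions):

import numpy as np

labels = ["A", "B", "C"]
cm = np.array([[25,  3,  2],    # actual A
               [ 4, 30,  1],    # actual B
               [ 1,  2, 32]])   # actual C

for i, label in enumerate(labels):
    tp = cm[i, i]
    recall = tp / cm[i, :].sum()     # TP / (TP + FN): diagonal cell over row total
    precision = tp / cm[:, i].sum()  # TP / (TP + FP): diagonal cell over column total
    print(f"{label}: recall={recall:.2f}, precision={precision:.2f}")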