
APSC 258 Midterm Study Guide

Optimization & Gradient Descent


Readings: Chapter 3 Machine Learning Refined 2nd Edition.

Video: Visually Explained.


Machine learning algorithms use gradient descent to optimize a cost function, seeking its global minimum.
This cost function is created by taking the mean squared error between the model's outputs and the true labeled answers, summed over the sample sets (developed below).

Importance:
Machine learning requires a vast amount of training and test data in order to learn and to predict results consistently and correctly on unknown inputs, for example, an AI mapping and categorizing images. But determining how much data to feed the model so that it predicts correct results relies on the gradient descent learning algorithm, so it is imperative that the algorithm is understood.
Let’s start with an example: an algorithm is in charge of sorting waste at a recycling facility. The
model uses images of bottles and cans as returnable containers, plastic bottles as recyclable materials, and
other waste as waste for landfill.
Let each variable below define a set whose elements correspond to one category of waste (bottles, cans, plastics, other waste).
s_1 is a labeled set containing each image and its respective key, and each subset of s_1 and o_1 represents one of these categories.

Let s_1 denote a set of sample images with subsets: s_1 = { b_1 , c_1 , p_1 , w_1 }

Let o_1 denote a set of machine outputs with subsets: o_1 = { B_1 , C_1 , P_1 , W_1 }

Let a_1 denote a set of scalar (subset-less) answers: a_1 = { b_1 , c_1 , p_1 , w_1 }

Let the function M define the machine answer process: s_1 → M[ s_n ] → o_1

Let each letter define a ratio of machine correctness: γ = { β , ς , ρ , ϖ }

β = |B_n| / |b_1| ,   ς = |C_n| / |c_1| ,   ρ = |P_n| / |p_1| ,   ϖ = |W_n| / |w_1| ,   with β, ς, ρ, ϖ ∈ ℝ
The true correct ratios are defined by the set a_1, which has a specific scalar value for each corresponding
subset. For example: of the sample set of images, only 27% of the images are plastics, thus p_1 = 0.27.
Thus, we can define the correctness of this specific set in terms of the mean squared error, below:
MSE = ( b − β )² + ( c − ς )² + ( p − ρ )² + ( w − ϖ )²

Let the sets { s_n , a_n ∈ U_n } define the cost function: f( |U_n| , n ) = Σ_{i=0}^{n} MSE( γ_i , a_i )
Thus, by taking a sum of the mean squared error of each sample set of labeled data, a cost function is
created. However, because this cost function depends on many variables at once, we use gradients to find
its minimum, which tells us after how many data sets (n) of population |U_n| the model is best trained.
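A minimal sketch of this cost in Python (the names mse, cost, machine_ratios, and true_ratios are illustrative assumptions, not notation from the guide):

# Sketch: each labeled sample set contributes one MSE term to the overall cost.
def mse(machine_ratios, true_ratios):
    # Squared error between the machine's output ratios (gamma) and the true ratios (a)
    return sum((a - g) ** 2 for g, a in zip(machine_ratios, true_ratios))

def cost(all_machine_ratios, all_true_ratios):
    # f(|U_n|, n): the sum of the MSE of every labeled sample set
    return sum(mse(g, a) for g, a in zip(all_machine_ratios, all_true_ratios))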

Let θ_1 = |U_n| , θ_2 = n , η = a scalar (the step size)

The gradient is an operator applied to any scalar function; it yields a vector pointing in the
direction of greatest change.

Let’s say there are 300 images of a cow, pig and chicken distributed as: cow = 133, pig = 129, chicken =
38. Thus, the ideal ratios for each are 0.4433, 0.43, 0.1267.
So if the machine gives the output for one test as 0.4, 0.4, 0.2, yielding 120 cows, 120 pigs and 60
chickens, its correctness can be placed into a cost function: f() = (0.4 − 0.4433)² + (0.4 − 0.43)² + (0.2 − 0.1267)², as sketched below.
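Plugging the numbers from this example into that cost term (a quick illustrative check, not from the guide):

# 300 images split as 133 cows, 129 pigs, 38 chickens
true_ratios = [133 / 300, 129 / 300, 38 / 300]   # ~0.4433, 0.43, 0.1267
machine_output = [0.4, 0.4, 0.2]                 # the machine's ratios for one test
one_set_cost = sum((a - g) ** 2 for g, a in zip(machine_output, true_ratios))
print(one_set_cost)                              # ~0.0082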
Now, by summing these cost functions, we create a multidimensional cost function whose global minimum is
hard to visualize or to find via a simple derivative. So we use the gradient operator, which defines a
vector at each point indicating the direction of steepest increase; stepping opposite that vector guides us
toward the minimum. A small step is taken in that direction and the process repeats iteratively. If at a
certain point the gradient (effectively) vanishes and no further descent direction can be found, the
algorithm is at the minimum, and the number of input data required to train the model has been reached.
This is the gradient descent algorithm (sketched below).
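A minimal gradient descent sketch in Python. The two-parameter quadratic cost, the starting point, and the step size η = 0.1 are illustrative assumptions; the loop itself follows the algorithm described above (step opposite the gradient, stop when the gradient is near zero):

def grad(theta):
    # Gradient of the example cost f(t1, t2) = (t1 - 3)^2 + (t2 + 1)^2
    t1, t2 = theta
    return [2 * (t1 - 3), 2 * (t2 + 1)]

theta = [0.0, 0.0]   # starting guess
eta = 0.1            # scalar step size (learning rate)
for _ in range(1000):
    g = grad(theta)
    if all(abs(gi) < 1e-8 for gi in g):   # gradient ~ 0: we are at the minimum
        break
    theta = [t - eta * gi for t, gi in zip(theta, g)]   # small step against the gradient

print(theta)   # converges toward [3, -1], the minimum of the example cost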
Linear Regression:
Linear regression in machine learning is identical to its statistical equivalent: a line of best fit is
defined through the least squares method, where the linear factor α and shift β are found from:

Y = αX + β

α = ( n Σ_i x_i y_i − Σ_i x_i Σ_i y_i ) / ( n Σ_i x_i² − ( Σ_i x_i )² )

β = ( Σ_i y_i − α Σ_i x_i ) / n

Where i represents the index of each data point and Σ_i x_i represents the sum of all data points over the
index i, while n represents the total number of data points given. Once these values are found, the
linear regression is complete. However, Python does not require you to perform all these sums yourself;
instead, the "sklearn" library provides commands that do the hard work for you, as shown below:

from sklearn.linear_model import LinearRegression

# Regression model
reg = LinearRegression()
X = [...]  # training inputs, shaped (n_samples, n_features), e.g. [[x1], [x2], ...]
Y = [...]  # training targets, one value per sample
# Fit the model to the data, then predict at a new point
reg.fit(X, Y)
reg.predict([[5.5]])
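For a self-contained version, here is a hedged sketch with made-up data points that fits the same model and checks the result against the closed-form α and β sums above (the data values are invented for illustration):

from sklearn.linear_model import LinearRegression

# Illustrative data: sklearn expects X as a 2D array (n_samples, n_features)
X = [[1], [2], [3], [4], [5]]
Y = [2.1, 4.2, 5.9, 8.1, 10.0]

reg = LinearRegression()
reg.fit(X, Y)
print(reg.coef_[0], reg.intercept_)   # alpha (slope) and beta (shift)
print(reg.predict([[5.5]]))           # prediction at x = 5.5

# Closed-form least squares using the sums from the formulas above
n = len(Y)
x = [row[0] for row in X]
sx, sy = sum(x), sum(Y)
sxy = sum(xi * yi for xi, yi in zip(x, Y))
sxx = sum(xi * xi for xi in x)
alpha = (n * sxy - sx * sy) / (n * sxx - sx * sx)
beta = (sy - alpha * sx) / n
print(alpha, beta)                    # matches sklearn's coef_ and intercept_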

Through the use of a hyperplane, where the linear model lies in multiple dimensions, a
linear regression can be performed on multiple variables or on data sets sharing a
common output axis. Each new linear term is added to the others and makes no
difference to them provided it does not interfere (the inputs are independent).

Y = α_1 X_1 + β_1 + α_2 X_2 + β_2 + … + α_n X_n + β_n = Σ_{i=1}^{n} ( α_i X_i + β_i )
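A short hedged sketch of the multivariable case (the two feature columns and target values below are invented for illustration); sklearn's LinearRegression accepts several input columns with a common output in the same way:

from sklearn.linear_model import LinearRegression

# Two input features per sample, one common output axis Y
X = [[1.0, 0.5],
     [2.0, 1.5],
     [3.0, 2.0],
     [4.0, 3.5]]
Y = [3.0, 6.5, 8.5, 12.0]

reg = LinearRegression()
reg.fit(X, Y)
print(reg.coef_)                    # one alpha_i per input variable X_i
print(reg.intercept_)               # the combined shift term
print(reg.predict([[2.5, 1.8]]))    # prediction for a new pair of inputs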

Polynomial Regression
A polynomial regression is a linear prediction model with polynomial terms. This means we can treat
it like a multivariable linear regression by substituting X_n = (X_1)^n and then solving for α_n using the same
equations defined above. The order of the polynomial terms (even or odd, higher or lower) dictates the shape
of the function, so adding more terms of greater order increases how closely the regression fits the data.
If the order is too large, overfitting can occur (see figure below at order 17), where small errors can
easily change the shape of the curve, making the model overly dependent on the labeled data. However, if the
order is too small (see figure below at order 2), underfitting occurs and the regression does not reflect the
trend of the overall data. The Bayes Information Criterion (BIC) is a function defined for n data points and
model order k (up to the maximum model order you think is plausible). If a certain polynomial model scores
the lowest out of all other candidate models on the BIC function, it is the most appropriate to use for
prediction.

BIC_k = n log( Σ_{i=0}^{n} ( y_i − p(x_i) )² ) + k log(n)

Figure 1: The BIC value for each model order of the regression; order 3 gives the minimum.
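A hedged sketch of how this BIC comparison could be carried out in Python; the synthetic cubic data and the use of numpy.polyfit for the polynomial fits are assumptions made for illustration, not part of the guide:

import numpy as np

# Synthetic data for illustration: a noisy cubic trend
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = 0.5 * x**3 - x + rng.normal(0, 0.3, size=x.shape)
n = len(x)

def bic(k):
    # BIC_k = n*log(sum of squared residuals) + k*log(n) for an order-k polynomial fit
    coeffs = np.polyfit(x, y, k)
    residuals = y - np.polyval(coeffs, x)
    return n * np.log(np.sum(residuals ** 2)) + k * np.log(n)

scores = {k: bic(k) for k in range(1, 10)}
best_order = min(scores, key=scores.get)
print(best_order)   # the order with the lowest BIC is the one to use for prediction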
