Machine Learning Shortnote
Supervised Learning
Algorithm learns to map a given input to a desired output, so it can map an unforeseen input to an output.
Examples:
• Spam filters
• Handwritten character recognition
• Credit limit
A single data item with multiple attributes may be called:
• Samples
• Data Points
• Data Vectors
• Data Items
Supervised Learning Problems
• Regression Problems
- Real Valued Output
• Classification Problems
- Categorical / Discrete Output
Classification or Regression ?
• Predicting the salary of a fresh graduate
• Email – Spam or Not
• Tumor – Cancer or Not
• Handwritten character recognition
• Predicting weather – Sunny, Rainy, Cloudy, Windy
Unsupervised Learning
Algorithm learns without having a desired output for a given input. It attempts to find structure within the data set (identify similarities and divisions among the data).
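As an illustration of finding structure without labels, here is a minimal 1-D k-means sketch (this algorithm and the data are assumptions for illustration, not part of the notes):

```python
# Minimal 1-D k-means sketch: group points into clusters with no labels,
# by alternating nearest-center assignment and center updates.

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]      # two obvious groups
print(kmeans_1d(data, centers=[0.0, 5.0]))  # centers converge near 1.0 and 9.5
```

The algorithm discovers the two groups purely from similarities in the data, which is the essence of unsupervised learning described above.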
Lecture 03
Regression
• A statistical method
• Attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of independent variables (usually denoted by x).
Notation
• 𝑥 – input (multivariate data point / vector)
• 𝑑 – number of features (dimensions)
• 𝑥𝑖 – the 𝑖-th data point
• 𝑥𝑖(𝑘) – the 𝑘-th feature of the 𝑖-th data point
Convergence
• Plot the cost function and check that the error has been minimized and has stabilized
• Alternatively, stop when the change in cost falls below a predefined tolerance level
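The tolerance-based stopping rule above can be sketched as follows (the toy cost J(θ) = (θ − 3)², the learning rate, and the tolerance are assumptions for illustration):

```python
# Gradient descent that stops when the change in the cost falls below a
# predefined tolerance, i.e. the error has minimized and stabilized.

def cost(theta):
    return (theta - 3.0) ** 2   # toy cost with its minimum at theta = 3

def grad(theta):
    return 2.0 * (theta - 3.0)  # derivative of the toy cost

theta, alpha, tol = 0.0, 0.1, 1e-9
history = [cost(theta)]
for _ in range(10_000):
    theta -= alpha * grad(theta)
    history.append(cost(theta))
    # Converged: the cost is no longer changing by more than the tolerance.
    if abs(history[-2] - history[-1]) < tol:
        break

print(round(theta, 4))  # close to the minimizer 3.0
```

In practice one would also plot `history` against the iteration number, which is the visual convergence check the notes mention.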
Normal Equations
• Gradient Descent – iterative process.
• Normal Equations – solve a mathematical equation to find the optimal values of the parameters (𝜃𝑘).
• Directly finds the value of 𝜃 without the iterative process of Gradient Descent.
• Effective when the data set has few features.
• Take the partial derivative with respect to each 𝜃𝑘 and set it to zero.
• Solve the equations to find each parameter 𝜃𝑘.
Normal Equation Method
• Construct the matrix 𝑋 from the sample
data
• Construct the vector 𝑦 from the available
labels
• Compute the pseudo-inverse of 𝑋
• Compute 𝜃 by 𝜃 = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑦
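The steps above can be sketched with NumPy (the toy data, generated from y = 1 + 2x, is an assumption for illustration):

```python
# Normal equation method: build X and y, then compute
# theta = (X^T X)^(-1) X^T y via the pseudo-inverse of X.
import numpy as np

X = np.array([[1.0, 0.0],   # first column of ones absorbs the intercept
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# np.linalg.pinv computes the pseudo-inverse, equivalent here to
# inv(X.T @ X) @ X.T but numerically safer.
theta = np.linalg.pinv(X) @ y
print(theta)  # approximately [1. 2.]
```

No learning rate and no iterations are needed, which is the contrast with gradient descent the notes draw.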
Logistic Regression
Linear Regression for Classification
• In our linear regression model, the hypothesis is ℎ(𝑥) = 𝜃ᵀ𝑥
• It gives real valued output for any given
input.
• Linear regression doesn’t suit classification
problems.
Classification
• What we want is a hypothesis that restricts
the output. 0 ≤ ℎ𝜃(𝑥) ≤ 1
• 0 − 𝑁𝑒𝑔𝑎𝑡𝑖𝑣𝑒 𝐶𝑙𝑎𝑠𝑠
• 1 − 𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝐶𝑙𝑎𝑠𝑠
Logistic Regression
• Uses the linear regression hypothesis ℎ(𝑥)
• Applies a restrictive function 𝑔 on ℎ(𝑥)
• The result is the classification hypothesis 𝑘(𝑥):
𝑘(𝑥) = 𝑔(ℎ(𝑥))
• We use 𝑔 = the Logistic Function
Logistic Function
• Always provides output 0 ≤ 𝑠𝜃(𝑥) ≤ 1
• If 𝑠𝜃(𝑥) = 0.82
  • Can use a threshold of 0.5 and declare it class 1
  • Can state the probability of being class 1 is 82%
  • Probability of being class 0 is 18%
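The logistic function and the thresholding rule above can be sketched as (the 0.82 example value is from the notes; the function names are illustrative):

```python
# The logistic (sigmoid) function squashes any real value into (0, 1),
# so its output can be read as a class-1 probability and thresholded.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(prob, threshold=0.5):
    return 1 if prob >= threshold else 0

p = 0.82                     # suppose s_theta(x) = 0.82
print(classify(p))           # class 1 under the 0.5 threshold
print(f"P(class 1) = {p:.0%}, P(class 0) = {1 - p:.0%}")
```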
Lecture 04
Regularization
Underfitting/ Overfitting
• Underfitting – High Bias
• Just Fit – Quadratic function might just fit
• Overfitting – High Variance
Bias – Variance Tradeoff
• Approximation vs Generalization
• Bias : the difference between the average prediction of your hypothesis and the correct value which you try to predict. High bias leads to underfitting.
• Variance : how much your hypothesis changes from the target function if the training data set is changed. High variance leads to overfitting.
Avoiding Overfitting
• Reduce number of features
  • Manual feature reduction
  • Model selection algorithm
    • Algorithms to select features and get rid of some features
• Regularization
  • No feature reduction
Regularization
• Keep all features
• Reduce the magnitude of parameters/ weights
• It has been shown that parameters with reduced magnitude make a function smoother
• Works well with high-dimensional data
• Regularization helps when,
  • Each dimension contributes to the final output
  • We are not sure of the influence of each dimension on the final output
• The smaller the parameter values,
  • the simpler the hypothesis
  • the less prone it is to overfitting
• With high-dimensional data it is difficult to identify which parameters should be penalized.
• Thus, with regularization we attempt to shrink all the parameters.
𝜆 in the Equation
• If 𝜆 is very large
  • Penalizes all the 𝜃 parameters and brings them near zero
  • Hypothesis becomes flat
  • Underfit – High bias
• If 𝜆 is very small
  • Almost no influence over the 𝜃 parameters
  • Hypothesis tends to overfit
  • High Variance
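The effect of 𝜆 can be sketched with ridge regression, whose closed form 𝜃 = (𝑋ᵀ𝑋 + 𝜆𝐼)⁻¹𝑋ᵀ𝑦 shrinks all parameters together (the data below is made up for illustration; the notes do not specify this particular regularizer):

```python
# Ridge regression sketch: a large lambda pulls every theta toward zero
# (flatter hypothesis, higher bias); a small lambda barely changes the fit.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=20)

def ridge(X, y, lam):
    d = X.shape[1]
    # Closed form: theta = (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

small = ridge(X, y, lam=0.01)   # close to the unregularized fit
large = ridge(X, y, lam=1000.0) # all parameters shrunk toward zero
print(np.linalg.norm(small) > np.linalg.norm(large))  # True
```

Note that every component of 𝜃 is penalized equally, matching the point above that we shrink all parameters rather than guessing which one to penalize.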
ANN - Artificial Neural Networks
Why ?
• Difficulty of modelling complex hypotheses with linear models.
• Polynomial curve fitting is difficult with a large number of features.
Activation Function: Thresholding
• Thresholding makes a harsh decision
• A smoother transition is preferred
• Other activation functions:
  • Sigmoid functions
  • ReLU
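The contrast between the harsh threshold and the smoother activations named above can be sketched as (the sample inputs are illustrative):

```python
# Compare a hard threshold with the smoother sigmoid and ReLU activations.
import math

def step(z):                 # hard thresholding: an abrupt 0/1 decision
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):              # smooth transition between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):                 # zero for negatives, identity for positives
    return max(0.0, z)

for z in (-2.0, -0.1, 0.1, 2.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z))
```

Near z = 0 the step function jumps from 0 to 1, while the sigmoid changes gradually, which is why the smoother transition is preferred.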
Note
• Activations of layer N will be the input vector for layer N+1.
• Activations of layer N+1 are a function of the activations of layer N.
• Hence, activations of layer N+1 are a function of the original input vector.
• (N = 1 is a special case: the activations of layer 1 are simply the input vector itself.)
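The layer-by-layer note above can be sketched as a small forward pass (the weights, layer sizes, and sigmoid activation are arbitrary assumptions for illustration):

```python
# Forward pass: activations of layer N become the input vector of layer N+1,
# so the final activations are a function of the original input.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(activations, weights):
    # One dense layer: weighted sums of the previous layer's activations.
    return [sigmoid(sum(w * a for w, a in zip(row, activations)))
            for row in weights]

x = [1.0, 0.5]                       # layer 1 "activations" = the input itself
W2 = [[0.4, -0.6], [0.3, 0.8]]       # weights into layer 2
W3 = [[1.0, -1.0]]                   # weights into layer 3 (the output)

a2 = layer(x, W2)                    # layer 2 activations from layer 1
a3 = layer(a2, W3)                   # layer 3 activations from layer 2
print(a3)                            # a function of the original input x
```

Composing `layer` twice makes a3 depend on x only through a2, exactly the chaining the note describes.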
Lecture 05