Machine Learning Shortnote

This document provides an overview of machine learning and cognitive science. It discusses how cognitive science examines the mind and its processes through an interdisciplinary approach. Machine learning is introduced as the study of algorithms that allow computer systems to improve at tasks through experience. Common machine learning algorithms discussed include supervised learning methods such as classification and regression, as well as unsupervised learning. Linear regression is explained as a technique for fitting linear models to data. The gradient descent algorithm and normal equation methods for learning the parameters of a linear regression model are also introduced.

Lecture 01

Cognitive Science ~ Wikipedia


• Cognitive science is the interdisciplinary,
scientific study of the mind and its
processes. It examines the nature, the
tasks, and the functions of cognition (in a
broad sense).
• Cognitive scientists study intelligence and behavior, with a focus on how nervous systems represent, process, and transform information. Mental faculties of concern to cognitive scientists include language, perception, memory, attention, reasoning, and emotion; to understand these faculties, cognitive scientists borrow from fields such as linguistics, psychology, artificial intelligence, philosophy, neuroscience, and anthropology.

Machine Learning
"Machine learning (ML) is the study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task." ~ Wikipedia
• Learning in an artificial context
• Data represents the experience
• Computational models to represent knowledge
Tom Mitchell (1998)
A computer program is said to learn from
experience E with respect to some task T
and some performance measure P, if its
performance on T, as measured by P,
improves with experience E.
Ex.
• E – Emails
• T – Recognizing spam emails
• P – Accuracy of spam email recognition
"The brain uses vast amounts of memory to create a model of the world. Everything you know and have learned is stored in this model. The brain uses this memory-based model to make continuous predictions of future events. It is the ability to make predictions about the future that is the crux of intelligence." - Jeff Hawkins

Machine Learning Applications
• Search Engines – Appropriate results
• Mail services – Spam filters
• Handwritten character recognition
• Recommendations
• Word suggestions in keyboards
• Robots / Autonomous Vehicles
• Natural Language Processing

Intelligence and Learning


• Intelligence is the ability to learn from
experience, solve problems, and use our
knowledge to adapt to new situations.
• Pattern recognition
Machine Learning Algorithms
Two main broad classes, plus two further paradigms:
1. Supervised Learning
2. Unsupervised Learning
3. Semi-Supervised Learning
4. Reinforcement Learning

Supervised Learning
The algorithm learns to map a given input to a desired output, so it can map an unforeseen input to an output.
• Spam filters
• Handwritten character recognition
• Credit limit

A single data item with multiple attributes is called:
• A sample
• A data point
• A data vector
• A data item
Supervised Learning Problems
• Regression Problems
- Real Valued Output
• Classification Problems
- Categorical / Discrete Output

Classification or Regression ?
• Predicting the salary of a fresh graduate
• Email – Spam or Not
• Tumor – Cancer or Not
• Handwritten character recognition
• Predicting weather – Sunny, Rainy,
Cloudy, Windy

Unsupervised Learning
The algorithm learns without having a desired output for a given input. It attempts to find structure within the data set, identifying similarities and divisions among the data.
• Predictions / Forecasts
• Anomaly detection
• Clustering (a sketch follows this section)

Semi-supervised Learning
• Part of the data comes with the desired output and the rest is without it.
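To make the "identify similarities and divisions" idea concrete, here is a minimal clustering sketch. It assumes scikit-learn is available (the notes do not name a library), and the two synthetic point clouds and the cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two loose groups of 2-D points with no labels attached.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(3.0, 0.5, (20, 2))])

# KMeans only sees the inputs; it discovers the division itself.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)
```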
Lecture 02
Linear Regression
- Univariate Linear Regression

Understanding the Problem
Learning approach? (Supervised or Unsupervised)
• Supervised
Regression or Classification?
• Regression
~ Fitting a Straight Line to the Data ~

Supervised Learning - Training Data
Input pairs of data (x, y)
• x – GPA
• y – Salary

Linear Regression
• How to decide the parameters θ₀ and θ₁?
• g(x) is the final hypothesis, where h(x) can be any hypothesis from the hypothesis set
• Intuition – h(x) is close to y for the training data (x, y)
• Find the parameters which give the minimum error: Error = (h(x) − y)²
• Select θ₀ and θ₁ so that the sum of squared errors over all data points of a particular hypothesis is minimized.

Hypothesis and Cost Function
• Our hypothesis is a function of x: h(x)
• Our cost function is a function of θ₀ and θ₁: J(θ₀, θ₁)
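A minimal sketch of the univariate hypothesis and its cost. The 1/(2m) scaling of the squared error is an assumed convention (the notes only ask that the sum of squared errors be minimized), and the GPA/salary numbers are made up for illustration.

```python
import numpy as np

def h(x, theta0, theta1):
    """Univariate linear hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def J(x, y, theta0, theta1):
    """Squared-error cost over all training pairs (x, y).

    The 1/(2m) scaling is a common convention assumed here.
    """
    m = len(x)
    errors = h(x, theta0, theta1) - y
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical GPA -> salary training pairs, for illustration only.
x = np.array([2.8, 3.2, 3.6, 3.9])
y = np.array([45000.0, 52000.0, 60000.0, 68000.0])
print(J(x, y, 0.0, 16000.0))
```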
Linear Functions
• Linear relationship between input and
output
Gradient Descent Algorithm
Learning Process
• What we want is an algorithmic approach to find the θ₀ and θ₁ for which the cost is minimized.
• This is what we call learning in this problem.

The algorithm (a code sketch follows this section):
1. repeat
2.   θⱼ := θⱼ − α (∂/∂θⱼ) J(θ₀, θ₁)
3. until convergence
• Applicable for any number of θⱼ
• α – learning rate (a fractional number that defines the step size)
• Start with any θ₀ and θ₁ values
• Repeat until a minimum is found
  - Change θ₀ and θ₁ in a way that decreases J(θ₀, θ₁)
• Initialization of θ₀ and θ₁:
  - Random values
  - Zeros
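A sketch of the update rule above for the univariate case. The learning rate, tolerance, iteration cap, zero initialization, and toy data are illustrative choices; convergence is detected by checking that the cost has stabilized, as described under Convergence in Lecture 03.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, tol=1e-9, max_iters=100_000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0               # initialization with zeros
    prev_cost = np.inf
    for _ in range(max_iters):
        errors = (theta0 + theta1 * x) - y
        # Partial derivatives of J(theta0, theta1) = (1/2m) sum(errors^2)
        grad0 = np.sum(errors) / m
        grad1 = np.sum(errors * x) / m
        # Simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        cost = np.sum(errors ** 2) / (2 * m)
        if abs(prev_cost - cost) < tol:     # cost has stabilized: converged
            break
        prev_cost = cost
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])          # underlying line: y = 2x + 1
print(gradient_descent(x, y))               # approx (1.0, 2.0)
```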
Linear Regression
This is a broader class:
• Fit a line
• Fit a curve
• Fit a periodic function
It is not only about straight lines.
• The model is linear in the θ parameters; that is why it is called linear regression (see the sketch below).
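To illustrate "linear in the θ parameters", the sketch below fits a curve by expanding x into polynomial features and then solving an ordinary linear least-squares problem. The data values are made up for illustration.

```python
import numpy as np

# Fit a curve while staying linear in the theta parameters:
# expand x into [1, x, x^2] and reuse ordinary linear regression.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])            # roughly y = x^2 + 1
X = np.column_stack([np.ones_like(x), x, x ** 2])
theta = np.linalg.pinv(X) @ y                  # least-squares solve
print(theta)                                   # approx [1, 0, 1]
```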

Lecture 03
Regression
• A statistical method
• Attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of independent variables (usually X).

Notation
• x – input (a multivariate data point / vector)
• d – number of features (dimensions)
• xᵢ – the i-th data point
• xᵢᵏ – the k-th feature of the i-th data point

Convergence
• Plot the cost function and identify when the error has been minimized and has stabilized.
• Use a predefined tolerance level for the change in cost.
Normal Equations
• Gradient Descent – an iterative process.
• Normal Equations – solve a mathematical equation and find the optimal values for the parameters (θₖ).
• Directly finds the value of θ without the iterative Gradient Descent process.
• Effective when the data set has few features.
• Take the partial derivative with respect to each θₖ and set it equal to zero.
• Solve the equations and find θₖ for each parameter.

Normal Equation Method
• Construct the matrix X from the sample data
• Construct the vector y from the available labels
• Compute the pseudo-inverse of X
• Compute θ by
θ = (XᵀX)⁻¹ Xᵀ y
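A minimal sketch of the recipe above. np.linalg.pinv computes the pseudo-inverse, which coincides with (XᵀX)⁻¹Xᵀ when X has full column rank; the toy data are illustrative.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^(-1) X^T y."""
    # np.linalg.pinv gives the pseudo-inverse, which equals
    # (X^T X)^(-1) X^T when X has full column rank.
    return np.linalg.pinv(X) @ y

# Toy data: a column of ones for the intercept theta_0, then the feature.
X = np.array([[1.0, 2.8],
              [1.0, 3.2],
              [1.0, 3.6],
              [1.0, 3.9]])
y = np.array([45000.0, 52000.0, 60000.0, 68000.0])
print(normal_equation(X, y))   # [theta_0, theta_1]
```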

Gradient Descent Vs Normal Eq.


• A learning rate must be chosen for Gradient Descent
• Gradient Descent requires a number of iterations
• Matrix inversion is a computationally intensive task, so the NE method may be computationally inefficient if the number of features is relatively large.
• The NE method can be used to initialize parameters for the classification counterpart of linear problems.

Logistic Regression
Linear Regression for Classification
• In our linear regression model, the hypothesis is h(x) = θᵀx
• It gives a real-valued output for any given input.
• Linear regression does not suit classification problems.

Classification
• What we want is a hypothesis that restricts the output: 0 ≤ hθ(x) ≤ 1
• 0 − Negative Class
• 1 − Positive Class

Logistic Regression
• Uses the linear regression hypothesis h(x)
• Applies a restrictive function g on h(x)
• The result is the classification hypothesis k(x):
k(x) = g(h(x))
• We use g − the Logistic Function

Logistic Function
• Always produces an output 0 ≤ k(x) ≤ 1
• If k(x) = 0.82:
  • Can use a threshold of 0.5 and declare it class 1
  • Can state that the probability of being class 1 is 82%
  • The probability of being class 0 is 18%
Lecture 04
Regularization

Underfitting / Overfitting
• Underfitting – High Bias
• Just Fit – a quadratic function might just fit
• Overfitting – High Variance

Bias – Variance Tradeoff
• Approximation vs Generalization
• Bias: the difference between the average prediction of your hypothesis and the correct value which you try to predict. High bias → underfit.
• Variance: how much your hypothesis changes from the target function if the training data set is changed. High variance → overfit.

Avoiding Overfitting
• Reduce the number of features
  • Manual feature reduction
  • Model selection algorithms
  • Algorithms that select features and get rid of some features
• Regularization
  • No feature reduction

Regularization
• Keep all features
• Reduce the magnitude of the parameters / weights
• It has been shown that parameters with reduced magnitude make a function smoother
• Works well with high-dimensional data
• Regularization helps when:
  • Each dimension contributes to the final output
  • You are not sure of the influence of a dimension on the final output
• The smaller the parameter values:
  • The simpler the hypothesis
  • The less prone it is to overfitting
• With high-dimensional data it is difficult to identify which parameters should be penalized.
• Thus, with regularization we attempt to shrink all the parameters (a sketch of a regularized cost follows below).

λ in the Equation
• If λ is very large:
  • Penalizes all the θ parameters and brings them near zero
  • The hypothesis becomes flat
  • Underfit – High Bias
• If λ is very small:
  • Almost no influence over the θ parameters
  • The hypothesis tends to overfit
  • High Variance
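A sketch of a squared-error cost with an L2 penalty controlled by λ. Leaving θ₀ unpenalized and the exact scaling are conventional assumptions; the notes only say that λ shrinks the parameters. The usage values are made up.

```python
import numpy as np

def regularized_cost(X, theta, y, lam):
    """Squared-error cost plus an L2 penalty lam * sum(theta_k^2).

    A ridge-style sketch: leaving theta_0 unpenalized and the 1/(2m)
    scaling are conventional assumptions.
    """
    m = len(y)
    errors = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # theta_0 left unpenalized
    return (np.sum(errors ** 2) + penalty) / (2 * m)

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, 1.0])
print(regularized_cost(X, theta, y, lam=10.0))  # larger lam, larger penalty
```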
ANN - Artificial Neural Networks
Why?
• Difficulty of modelling complex hypotheses with linear models.
• Polynomial curve fitting is difficult with a large number of features.

Activation Function: Thresholding
• Thresholding makes a harsh decision
• A smoother transition is preferred
• Other activation functions (sketched below):
  • Sigmoid functions
  • ReLU
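The three activation functions side by side, as a minimal sketch: the harsh threshold versus the smoother sigmoid and ReLU alternatives.

```python
import numpy as np

def threshold(z):
    """Harsh step decision: exactly 0 or 1, nothing in between."""
    return (z >= 0).astype(float)

def sigmoid(z):
    """Smooth transition between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: zero for negatives, identity otherwise."""
    return np.maximum(0.0, z)

z = np.linspace(-3.0, 3.0, 7)
print(threshold(z))
print(sigmoid(z))
print(relu(z))
```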
Note
• The activations of layer N will be the input vector for layer N+1.
• The activations of layer N+1 are a function of the activations of layer N.
• The activations of layer N+1 are therefore a function of the original input vector.
• (N = 1 is a special case: the activations of layer 1 are simply regarded as the input vector itself. See the forward-pass sketch below.)
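A sketch of the layer-to-layer rule above: the activations of layer N are fed into layer N+1. The sigmoid activation and the weight shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Feed an input vector through the layers.

    The activations a of layer N become the input of layer N+1:
    a_{N+1} = g(W_{N+1} @ a_N + b_{N+1}).
    """
    a = x                        # layer 1: activations are the input itself
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # layer N+1 as a function of layer N
    return a

# Illustrative shapes: 3 inputs -> 4 hidden units -> 2 outputs.
x = np.array([0.5, -1.2, 0.3])
weights = [np.full((4, 3), 0.1), np.full((2, 4), 0.1)]
biases = [np.zeros(4), np.zeros(2)]
print(forward(x, weights, biases))
```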
Lecture 05

Artificial Neural Networks


• Backpropagation Algorithm

ANN - Training Complexity


• Weight adjustment
  • Different layers
• Error minimization
  • Cost
  • Cost contribution – each layer / each neuron
(A single-step sketch follows.)