
1. Introduction

Statistical Digital Signal Processing and Machine Learning
Content
❖ Basic Concepts

❖ Recognition Systems

❖ What is Machine Learning?

❖ Density Estimation, Regression and Interpolation

❖ Application Examples

1. Basic Concepts
❖ Signal: a varying quantity that carries information about a physical
phenomenon/process under analysis.

❖ Monodimensional signal: a function f(ξ) representing the evolution of the
information with respect to an independent variable ξ, which corresponds to a
physical reality such as time, frequency, pressure, etc. (e.g., a speech signal).

❖ Multidimensional signal: the evolution of the information is simultaneously related
to multiple correlated or uncorrelated physical realities, given as g(ξ, γ, δ, …).

1. Basic Concepts
❖ Signal acquisition: to convert a signal from analog to digital, as depicted
below, we need a means of acquiring the physical signal.

❖ Examples of 1-D Signals

❖ Electrocardiographic (ECG) signal

❖ The ECG is acquired using a biopotential amplifier.

1. Basic Concepts

❖ This is a graphical representation of a sound wave recorded by a
microphone.

❖ Speech Signals (Spectrogram)

❖ A time-frequency representation of the speech signal.

❖ A tool for studying speech sounds (phones).

❖ A visual representation of the frequencies of a signal versus time.
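
❖ As an illustration, such a time-frequency representation can be computed with standard signal-processing tools. A minimal sketch in Python (the tone and sampling rate below are hypothetical, just to produce a signal to analyze):

    import numpy as np
    from scipy.signal import spectrogram

    # Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)

    # Power at each (frequency, time) bin of the signal
    f, frames, Sxx = spectrogram(x, fs=fs, nperseg=256)
    print(Sxx.shape)  # (number of frequency bins, number of time frames)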


1. Basic Concepts
❖ Hidden Markov Models implicitly model spectrograms in speech-to-text
systems.

❖ Spectrograms are also useful for evaluating text-to-speech systems.

❖ A high-quality text-to-speech system should produce synthesized
speech whose spectrogram closely matches that of the natural
sentences.

1. Basic Concepts
❖ Examples of 2-D Signals

❖ For 8-bit grayscale images, pixel values range from 0 to 255.
1. Basic Concepts
❖ Examples of n-D Signals

1. Basic Concepts
❖ Pattern: a form, template, or model (or, more abstractly, a set of rules)
which can be used to make or to generate objects or parts of an object.

❖ Crystal patterns are represented by 2D/3D structures which can be
described through deterministic rules.

1. Basic Concepts
❖ Examples of Crystal Pattern

1. Basic Concepts
❖ Examples of Texture Pattern

1. Basic Concepts
❖ Examples of ECG Pattern

2. Recognition Systems
❖ Pattern recognition aims at classifying data (patterns) based on a priori
knowledge and statistical information extracted from the patterns.

❖ The patterns to be classified are usually groups of measurements, observations
or features, defining points in an appropriate multidimensional space.

❖ Examples of recognition systems: speech systems, optical character
recognition (OCR), biometric systems, biomedical monitoring devices,
change detection in temporal image sequences, etc.

2. Recognition Systems
❖ Example: Geometric Form Recognition

❖ In this case, the pattern is related to the form of the object.

2. Recognition Systems
❖ Example: Radar Detection

❖ The pattern is an intrinsic structure of the received signal, which allows the
system to infer the presence/absence of a target in it.

2. Recognition Systems
❖ Recognition System: Block Scheme

2. Recognition Systems
❖ Recognition System: Design Phases

2. Recognition Systems
❖ Example: Automatic Fish-Packing Plant

2. Recognition Systems
❖ Pre-processing: apply a segmentation operation in order to isolate the fish
from one another and from the background.

❖ Feature extraction: measure some features or properties of the image
which will help in discriminating the two species of fish considered (e.g., fish
length and width).

❖ Classification: evaluate the evidence presented and make a final decision
as to the species.

2. Recognition Systems
❖ Multisensor Recognition System

❖ In numerous applications, a recognition system needs to analyze complex
phenomena.

❖ To increase the probability of success of the system, one often relies on
the acquisition of information from different sensors (sources).

❖ In these scenarios, the system should be capable of conveniently fusing the
multisensor data in order to optimize its performance.

❖ Fusion may happen at two main levels, namely the data/feature and
decision levels, each raising its own methodological issues.

2. Recognition Systems
❖ Feature-Level Fusion

2. Recognition Systems
❖ Decision-Level Fusion

3. What is Machine Learning?
❖ “Learning is any process by which a system improves performance from
experience.” - Herbert Simon

❖ Definition by Tom Mitchell (1998):

❖ Machine Learning is the study of algorithms that

❖ improve their performance P

❖ at some task T

❖ with experience E.

❖ A well-defined learning task is given by <P, T, E>.

3. What is Machine Learning?
❖ Machine learning is concerned with the question of how to construct
computer programs that automatically improve with experience.

Traditional Programming vs Machine Learning

❖ Traditional programming: Data + Program → Computer → Output

❖ Machine learning: Data + Output → Computer → Program
3. What is Machine Learning?
❖ A robot driving learning problem:

❖ T: driving on public four-lane highways using vision sensors

❖ P: average distance traveled before an error

❖ E: a sequence of images and steering commands recorded while
observing a human driver

❖ Hand-written words recognizing learning problem

❖ T: Recognizing hand-written words

❖ P: Percentage of words correctly classified

❖ E: Database of human-labeled images of handwritten words


3. What is Machine Learning?
❖ Example:- Suppose your email program watches which emails you do
or do not mark as spam, and based on that learns how to better filter
spam. What is the task T in this setting?

A. Classifying emails as spam or not spam

B. Watching you label emails as spam or not spam

C. The number of emails correctly classified as spam or not spam

D. None of the above, this is not a machine learning problem.

3.1 When Do We Use Machine Learning?
❖ ML is used when:

❖ Human expertise does not exist (navigating on Mars)

❖ Humans can’t explain their expertise (speech recognition)

❖ Models must be customized (personalized medicine)

❖ Models are based on huge amounts of data (genomics)

❖ Learning isn’t always useful:

❖ There is no need to “learn” to calculate payroll
3.1 When Do We Use Machine Learning?
❖ A classic example of a task that requires machine learning: it is very
hard to say what makes a handwritten digit a “2”.

3.1 When Do We Use Machine Learning?
❖ Some more examples of tasks that are best solved by using a learning
algorithm
❖ Recognizing patterns:
❖ Facial identities or facial expressions
❖ Handwritten or spoken words
❖ Medical images
❖ Generating patterns:
❖ Generating images or motion sequences
❖ Recognizing anomalies:
❖ Unusual credit card transactions
❖ Unusual patterns of sensor readings in a nuclear power plant
❖ Prediction:
❖ Future stock prices or currency exchange rates
3.2 Areas of Applications
❖ Web search
❖ Computational electromagnetics
❖ Wireless Communication: Modulation Recognition, Channel State
Information Prediction for 5G Wireless Communications
❖ Finance
❖ E-commerce
❖ Space exploration
❖ Robotics
❖ Information extraction / signal processing
❖ Social networks
❖ Communication systems
❖ These days, machine learning is involved in almost every area.

3.3 Types of Learning
❖ Supervised (inductive) learning

❖ Given: training data + desired outputs (labels)

❖ Unsupervised learning

❖ Given: training data (without desired outputs)

❖ Semi-supervised learning

❖ Given: training data + a few desired outputs

❖ Reinforcement learning

❖ Given: rewards from a sequence of actions

Supervised Learning: Regression
❖ Supervised learning is the most common type of machine learning

❖ Given (x1, y1), (x2, y2), ..., (xn, yn)

❖ Learn a function f(x) to predict y given x

❖ y is real-valued == regression
[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, 1970–2020]
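
❖ A trend like the one in this figure can be fit by least squares. A minimal sketch (the numbers below are hypothetical, not the actual sea-ice measurements):

    import numpy as np

    # Hypothetical (year, extent) pairs shaped like the figure above
    years = np.array([1980.0, 1990.0, 2000.0, 2010.0, 2020.0])
    extent = np.array([7.8, 7.5, 6.3, 4.9, 4.0])  # million sq km, illustrative

    # Least-squares line: f(x) = intercept + slope * x predicts y from x
    slope, intercept = np.polyfit(years, extent, deg=1)
    print(slope, intercept)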
Supervised Learning: Classification
❖ Given (x1, y1), (x2, y2), ..., (xn, yn)

❖ Learn a function f(x) to predict y given x


❖ y is categorical == classification
❖ Example: Breast Cancer (Malignant / Benign)

[Figure: tumor size (x) vs. label (y): 0 = benign, 1 = malignant, with a threshold on tumor size separating “predict benign” from “predict malignant”]
Supervised Learning: Classification
❖ x can be multi-dimensional

❖ Each dimension corresponds to an attribute

[Figure: patients plotted by age vs. tumor size; further possible attributes include clump thickness, uniformity of cell size, and uniformity of cell shape]
Unsupervised Learning
❖ Given x1, x2, ..., xn (without labels)

❖ Output: hidden structure behind the x’s

❖ Example: clustering

Unsupervised Learning

[Figures: organizing computing clusters, social network analysis, market segmentation, astronomical data analysis]
Unsupervised Learning
❖ Independent component analysis (ICA): separate a combined signal into its
original sources.
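
❖ A minimal sketch of source separation with FastICA (the mixed signals below are synthetic, constructed just for illustration):

    import numpy as np
    from sklearn.decomposition import FastICA

    # Two hypothetical sources: a sine wave and a square wave
    t = np.linspace(0, 8, 2000)
    s = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

    # Observe them only through an unknown mixing matrix A
    A = np.array([[1.0, 0.5], [0.5, 2.0]])
    x = s @ A.T

    # Recover the independent sources (up to order and scale)
    s_hat = FastICA(n_components=2, random_state=0).fit_transform(x)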

Unsupervised Learning
❖ Clustering algorithms can be used in different application sectors: the Google
search engine, individual identification using genes, organizing computing
clusters, social network analysis, market segmentation, and astronomical data
analysis.

❖ In a large data center, such an algorithm can be used to figure out which
machines tend to work together, to maximize the efficiency of the center.

❖ Many businesses have huge databases of customer information. Given
such a customer data set, the algorithm automatically groups customers
into different market segments, so that one can sell to each segment more
efficiently, as in the sketch below.
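
❖ A minimal k-means sketch for the market-segmentation case (the customer features and counts are hypothetical):

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical customer features: [age, annual spend in $]
    customers = np.array([[25, 300], [27, 350], [45, 1200],
                          [47, 1100], [60, 200], [62, 250]], dtype=float)

    # Group the customers into 3 market segments
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
    print(kmeans.labels_)           # segment index for each customer
    print(kmeans.cluster_centers_)  # "average customer" of each segment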

4. Density Estimation, Regression and Interpolation
❖ Statistical classifiers require, implicitly or explicitly, as a first step of the
learning process, the estimation of the density with which a member of a certain
class will be found to express particular features.

❖ Whereas the outputs for classification are discrete class labels, regression is
concerned with the prediction of continuous quantities.

❖ Regression systems seek to find some functional description of the data,
often with the goal of predicting values for new inputs.

❖ Interpolation can be seen as a particular case of regression, since it aims at
inferring a prediction function over specific ranges of input.

5. Application Examples
❖ Remote sensing (e.g., generation of thematic and change maps,
environmental risk assessment);

❖ Target recognition in radar and sonar signals;

❖ Optimal receivers for telecommunication systems;

❖ Industrial applications (e.g., automatic product quality control, testing
and diagnosis systems for industrial machinery);

❖ Speech recognition (e.g., call-centers);

❖ Optical character recognition (OCR);

5. Application Examples
❖ Biomedical signal analysis (e.g., support to diagnosis and monitoring,
telemedicine);

❖ Biometry (person authentication/identification based on digital fingerprints,
iris analysis, …);

❖ Video-surveillance of public and private environments (e.g., airports,
stadiums, parking lots);

❖ Robotics (computer vision);

❖ Bioinformatics (DNA and microarray analysis).

5. Application Examples
❖ Example: Video-Surveillance

❖ An automatic recognition system can support an agent by detecting
(early) warnings of the presence of suspect objects or panic situations,
through acoustic and radiometric sensors placed in different positions.

5. Application Examples

❖ Example: Video-Surveillance in Tourism Management

5. Application Examples
❖ Example: Biomedical Monitoring

5. Application Examples
❖ Example: Cardiac Pathology Detection

5. Application Examples
❖ Example: Biometry

5. Application Examples
❖ Example: Sonar Applications

5. Application Examples
❖ Example: Intelligent Transportation

5. Application Examples
❖ Example: Ground Penetrating Radar

References

❖ Farid Melgani, “Lecture Notes on Recognition Systems”, University of Trento.

2. Regression and Regularization

Contents
❖ Linear Regression with One Variable

❖ Linear Regression with Multiple Variables

❖ Logistic Regression

❖ Regularization
❖ Regularized Linear Regression

❖ Regularized Logistic Regression

2.1 Linear Regression with One Variable
❖ Housing price prediction (case of Bdr) ❖ Training set of housing prices:

Size in feet² (x)    Price ($) in 1000’s (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

[Figure: fitted line over the training set, predicting a price of about 280k for a 1500 ft² house]

❖ Supervised learning: given the ‘right answer’ for each example in the data,
predict a real-valued output.
❖ Such a problem is called a regression problem.
2.1 Linear Regression with One Variable
❖ Let
❖ m = number of training examples
❖ x’s = “input” variables / features
❖ y’s = “output” variable / “target” variable

❖ We can represent this with a block diagram as follows:

Training set → Learning Algorithm → h

size of house (x) → h → estimated price (estimated value of y)

❖ h is the hypothesis; it maps from x’s to y’s.


2.1 Linear Regression with One Variable
❖ How do we represent h?

[Figure: training points and the fitted line hθ(x) = θ0 + θ1·x]

❖ hθ(x), or simply h(x), given above is linear regression with one
variable, or univariate linear regression.

❖ Hypothesis: hθ(x) = θ0 + θ1·x

❖ Parameters: the θi’s

❖ How do we choose the θi’s?

2.1 Linear Regression with One Variable
❖ The idea is to choose θ0, θ1 so that hθ(x) is close to y for the training
examples (x, y).

❖ Fit the model by minimizing the sum of squared errors:

min over (θ0, θ1) of  (1/2m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²

where hθ(x^(i)) = θ0 + θ1·x^(i).

❖ Defining the cost

J(θ0, θ1) = (1/2m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²

the goal becomes: minimize J(θ0, θ1) over (θ0, θ1).

[Figure: training points (x^(i), y^(i)) and the fitted line]

❖ J(θ0, θ1) is called the cost function or squared-error function.

2.1 Linear Regression with One Variable
❖ Hence, for linear regression with one variable:

❖ Hypothesis: hθ(x) = θ0 + θ1·x

❖ Parameters: θ0, θ1

❖ Cost function: J(θ0, θ1) = (1/2m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²

❖ Goal: minimize J(θ0, θ1) over (θ0, θ1)
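
❖ A minimal sketch of this cost function in Python (the three training points are the toy data used in the worked examples below):

    import numpy as np

    def cost(theta0, theta1, x, y):
        """Squared-error cost J(theta0, theta1) for univariate linear regression."""
        m = len(x)
        predictions = theta0 + theta1 * x  # h_theta(x^(i)) for every i
        return np.sum((predictions - y) ** 2) / (2 * m)

    # Toy training set: points (1, 1), (2, 2), (3, 3)
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    print(cost(0.0, 1.0, x, y))  # 0.0, a perfect fit
    print(cost(0.0, 0.5, x, y))  # ~0.58, as in the worked example below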

2.1 Linear Regression with One Variable
❖ Let’s assume x ∈ R and set θ0 = 0, so θ = [0, θ1]. Then for the given data we
can plot hθ(x) and J(θ1):

[Figure: left, hθ(x) (for fixed θ1, a function of x); right, J(θ1) (a function of the parameter θ1)]
2.1 Linear Regression with One Variable
h(x) J(1)
(for fixed 1 , this is function of x (function of the parameter 1)

1 2 2
𝐽 0,1 = [ 0.5 − 1 + 1−2 + 1.5 − 3 2 ] ≈ 0.58
2𝑥3

2.1 Linear Regression with One Variable
h(x) J(1)
(for fixed 1 , this is function of x (function of the parameter 1)

1 2 2
𝐽 0,0 = [ 0 −1 + 0−2 + 0 − 3 2 ] ≈ 2.333
2𝑥3

2.1 Linear Regression with One Variable
❖ If we vary both θ0 and θ1 and plot the cost function against them, we get a bowl-shaped surface:

2.1 Linear Regression with One Variable

[Figures: successive choices of (θ0, θ1); left, hθ(x) for the fixed (θ0, θ1) as a function of x; right, the corresponding point on the contour plot of J(θ0, θ1)]
a. Basic Search Procedure
❖ Choose initial values for θ
❖ Until we reach a minimum:
❖ Choose a new value for θ to reduce J(θ)

[Figure: surface of J(θ) with a path descending toward the minimum]

❖ Since the least-squares objective function is convex (bowl-shaped), we
don’t need to worry about local minima
Gradient Descent
❖ Plotting and manually searching for the values of θ0 and θ1 that minimize the cost
function J(θ0, θ1) is very difficult.

❖ The most common algorithm for finding the values of θ0 and θ1 that
minimize J(θ0, θ1) automatically is the gradient descent algorithm.
❖ Steps
❖ Initialize θ
❖ Repeat until convergence
{
θj := θj − α · ∂J(θ)/∂θj    for j = 0 and j = 1
}
❖ α is the learning rate; it controls how big a step we take downhill with
gradient descent (e.g., α = 0.05).
Gradient Descent
❖ Correct simultaneous update:

temp0 := θ0 − α · ∂J(θ0, θ1)/∂θ0
temp1 := θ1 − α · ∂J(θ0, θ1)/∂θ1
θ0 := temp0
θ1 := temp1

❖ Let’s choose a θ1 on the graph and update it as follows.

[Figure: J(θ1) for θ1 ∈ R]

Gradient Descent
❖ The derivative of J(θ1) at θ1 is the slope of the tangent line that
passes through θ1.

❖ Here the slope of the tangent at θ1 is positive, so

θ1 := θ1 − α · ∂J(θ1)/∂θ1 = θ1 − α · (positive number)

and θ1 decreases, moving toward the minimum.

[Figure: J(θ1) with a tangent line of positive slope at the current θ1]

Gradient Descent
❖ Let’s reconsider our previous graph, picking a different θ1.

❖ Now the slope of the tangent through the current θ1 is negative: ∂J(θ1)/∂θ1 ≤ 0, so

θ1 := θ1 − α · (negative number)

and θ1 increases, again moving toward the minimum.

[Figure: J(θ1) with a tangent line of negative slope at the current θ1]

Gradient Descent
θ1 := θ1 − α · ∂J(θ1)/∂θ1

❖ If α is too small, gradient descent can be slow.

❖ If α is too large, gradient descent can overshoot the minimum. It may fail
to converge, or even diverge.

Gradient Descent
❖ Let’s say you initialize θ1 at a local optimum.

❖ At that point the derivative term is zero, so gradient descent leaves θ1
unchanged:
θ1 := θ1 − α · 0
θ1 := θ1
Gradient Descent
❖ Gradient descent can converge to a local minimum even with the learning rate α
held fixed.

❖ As we approach the minimum, the derivative term shrinks toward zero, so the
steps automatically become smaller; at the optimum itself θ1 is left unchanged.

Gradient Descent
❖ Now let’s apply the gradient descent algorithm to the linear
regression model, with

J(θ0, θ1) = (1/2m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²

❖ Then

∂J(θ0, θ1)/∂θj = ∂/∂θj [ (1/2m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )² ]

❖ For hθ(x^(i)) = θ0 + θ1·x^(i):

∂J(θ0, θ1)/∂θj = ∂/∂θj [ (1/2m) Σ_{i=1..m} ( θ0 + θ1·x^(i) − y^(i) )² ]
Gradient Descent
❖ Then:

j = 0:  ∂J(θ0, θ1)/∂θ0 = (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )

j = 1:  ∂J(θ0, θ1)/∂θ1 = (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x^(i)

❖ Thus the gradient descent becomes:

repeat until convergence {
θ0 := θ0 − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )
θ1 := θ1 − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x^(i)
}

❖ Here do not forget to update θ0 and θ1 simultaneously; a sketch of the loop follows.
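
❖ A minimal Python sketch of this loop (the toy data and hyperparameters are illustrative):

    import numpy as np

    def gradient_descent(x, y, alpha=0.1, iters=1000):
        """Batch gradient descent for h(x) = theta0 + theta1 * x."""
        m = len(x)
        theta0, theta1 = 0.0, 0.0
        for _ in range(iters):
            err = theta0 + theta1 * x - y  # h_theta(x^(i)) - y^(i), all i
            # Simultaneous update: both gradients use the same err vector
            theta0 -= alpha * np.sum(err) / m
            theta1 -= alpha * np.sum(err * x) / m
        return theta0, theta1

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    print(gradient_descent(x, y))  # approaches (0.0, 1.0)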

Gradient Descent

[Figures: successive gradient-descent iterations on the housing data; left, hθ(x) for the current (θ0, θ1), starting from hθ(x) = −900 − 0.1·x; right, the corresponding point on the contour plot of J(θ0, θ1) moving toward the minimum]
Gradient Descent
❖ To see if gradient descent is working, print out J(θ) at each iteration.
❖ The value should decrease at each iteration.

❖ If it doesn’t, adjust α.
Linear Regression With Multiple Variables/Features
❖ Previously we discussed linear regression with one variable (the size of the
house) to predict the price of the house (y).

❖ But the size of the house is not the only feature for predicting the price: there are
also the number of bedrooms, the number of floors, and the age of the
home (in years).
Linear Regression With Multiple Variables/Features
❖ Let’s denote the features as x1, x2, x3 and x4, and the output as y, as shown
below.

❖ Having this, let’s introduce some more notation.

Linear Regression With Multiple Variables/Features
❖ From the above example we have:

❖ n = 4

❖ x^(1) = [1416, 3, 2, 40]; here we can see that x^(1) is an n-dimensional
vector.

❖ A specific feature value in the training set is denoted x_j^(i); in the
example above, x_3^(1) = 2.

❖ For the multiple-variable case, specifically the above example, the
hypothesis is given by:

hθ(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4
Linear Regression With Multiple Variables/Features
❖ Here remember that the hypothesis is trying to predict the price of the
house in thousands of dollars.

❖ For example, we can take the following as a sample hypothesis:

hθ(x) = 80 + 0.1·x1 + 0.01·x2 + 3·x3 − 2·x4

❖ We can generalize the expression of the hypothesis for n features as
follows:

hθ(x) = θ0 + θ1·x1 + θ2·x2 + … + θn·xn

❖ To simplify further, and for convenience of notation, let’s define
x0 = 1.

❖ Thus our feature vector x becomes an (n+1)-dimensional vector.

Linear Regression With Multiple Variables/Features

❖ And the hypothesis can be written as:

hθ(x) = θ0·x0 + θ1·x1 + θ2·x2 + … + θn·xn = θᵀx

❖ This is the form of the hypothesis when we have multiple features; it is
also called multivariate linear regression.
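
❖ In vectorized form this is a single dot product. A minimal sketch, reusing the sample hypothesis above (the house below is hypothetical):

    import numpy as np

    theta = np.array([80.0, 0.1, 0.01, 3.0, -2.0])  # [theta0, ..., theta4]
    x = np.array([1.0, 1416, 3, 2, 40])             # x0 = 1 prepended

    price = theta @ x  # h_theta(x) = theta^T x, price in $1000's
    print(price)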
Gradient Descent for Multiple Variables
❖ Here we will see how to fit the parameters to a given data set by applying
gradient descent.

❖ For linear regression with multiple features we have:

❖ Hypothesis: hθ(x) = θᵀx = θ0·x0 + θ1·x1 + θ2·x2 + … + θn·xn

❖ Parameters: θ0, θ1, θ2, …, θn

❖ Cost function:

J(θ0, θ1, θ2, …, θn) = (1/2m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) )²

Gradient Descent for Multiple Variables
❖ Gradient descent for multiple variables is:

Repeat {
θj := θj − α · ∂J(θ0, θ1, θ2, …, θn)/∂θj
} simultaneously update for every j = 0, …, n

❖ After taking the partial derivative we find the following algorithm (see the sketch below):

Repeat {
θj := θj − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x_j^(i)
(simultaneously update θj for j = 0, …, n)
}
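
❖ A vectorized sketch of this loop (assuming a design matrix X whose first column is the x0 = 1 feature; names are illustrative):

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, iters=1000):
        """Batch gradient descent for multivariate linear regression."""
        m, n1 = X.shape  # m examples, n+1 features (incl. x0 = 1)
        theta = np.zeros(n1)
        for _ in range(iters):
            err = X @ theta - y  # h_theta(x^(i)) - y^(i), all i at once
            theta -= alpha * (X.T @ err) / m  # simultaneous update of all theta_j
        return theta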

Gradient Descent for Multiple Variables
❖ If we substitute, we get the following expression for each of the
parameters:

θ0 := θ0 − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x_0^(i)

θ1 := θ1 − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x_1^(i)

θ2 := θ2 − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x_2^(i)

…

θn := θn − α · (1/m) Σ_{i=1..m} ( hθ(x^(i)) − y^(i) ) · x_n^(i)

Gradient Descent for Multiple Variables
❖ To make sure that gradient descent is working correctly, we can plot J(θ) as
gradient descent runs, i.e., J(θ) versus the number of iterations.
❖ Gradient descent is working properly if J(θ) decreases after every
iteration.
❖ A typical plot of J(θ) vs. the number of iterations is given below.

Gradient Descent for Multiple Variables
❖ As we can see from the graph, the curve is flat from about 300 to 400 iterations,
meaning gradient descent has more or less converged by then.
❖ So by looking at the graph we can judge whether gradient descent has converged or not.
❖ There is also an automatic way of deciding whether gradient descent has converged:
❖ Declare convergence if J(θ) decreases by less than 0.001 in one iteration (see the
sketch below).
❖ In other cases, the plot of J(θ) may look like the graphs given below (increasing or
oscillating).
❖ In those situations, choosing a smaller learning rate can make gradient descent converge.
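
❖ A minimal sketch of that test (the threshold is the 0.001 from above; J_prev and J_curr would be cost values from successive iterations):

    def converged(J_prev, J_curr, eps=1e-3):
        """Declare convergence when J decreased by less than eps this iteration."""
        return 0.0 <= J_prev - J_curr < eps

    print(converged(2.3330, 2.3329))  # True: J barely moved, stop iterating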

Gradient Descent for Multiple Variables

❖ Note that for sufficiently small α, J(θ) should decrease on every iteration; but if α is
too small, gradient descent can be slow to converge.

Improving Learning: Feature Scaling
❖ The idea is to make sure that the features are on a similar scale.
❖ For example, suppose we have a problem with feature values:
x1 = size (0–2000 feet²)
x2 = number of bedrooms (0–5)
❖ In such a problem, if we plot the cost function we get tall, skinny
contours, and gradient descent takes a long time to converge to the minimum of
the cost function.
❖ When we face such a situation, we can scale the features as follows (see the sketch below):

x1 = size(feet²) / 2000   and   x2 = (number of bedrooms) / 5
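
❖ A minimal sketch of this scaling on a small feature matrix (the rows are hypothetical houses):

    import numpy as np

    # Columns: [size in feet^2, number of bedrooms]
    X = np.array([[2104, 5], [1416, 3], [1534, 3], [852, 2]], dtype=float)

    # Divide each feature by its range so every column lies roughly in (0, 1]
    X_scaled = X / np.array([2000.0, 5.0])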

Improving Learning: Feature Scaling
❖ After feature scaling, the cost function (contour plot) is much less skinny, as
shown below, and gradient descent converges much faster.

[Figure: contour plots of J over (θ1, θ2) before and after scaling]

❖ Feature scaling brought the features into the ranges 0 ≤ x1 ≤ 1 and 0 ≤ x2 ≤ 1,
which enables gradient descent to converge much faster.

Improving Learning: Feature Scaling
❖ This does not mean the features must be exactly in the range −1 ≤ xi ≤ 1; for
example, we can have:
❖ 0 ≤ x1 ≤ 3 (fine)
❖ −2 ≤ x2 ≤ 0.5 (fine)
❖ −100 ≤ x3 ≤ 100 (not a proper range, as 100 is far from 1)
❖ −0.0001 ≤ x4 ≤ 0.0001 (poorly scaled)
❖ Note that the features need not all have the same range of values, as long as they
are close to the −1 ≤ xi ≤ 1 range.

Normal Equation
❖ Here we will see the normal equation, which for some linear regression problems
gives us a much better way to solve for the optimum value of the parameter θ.
❖ Gradient descent needs a number of iterations to reach the optimum value, whereas
the normal equation, an analytical method, takes one step to get the optimum value.
❖ We know how to determine the minimum value from calculus, and the same
principle is applied here.

Normal Equation
❖ Example: m = 4

❖ To apply the normal equation, take the data set and add an extra
column for x0 (all ones); then we find:

Normal Equation

❖ Next, construct a matrix X which contains all the features, and a vector y from the outputs.

❖ X is an m × (n+1) matrix and y an m-dimensional column vector.

❖ Finally, take X transpose and multiply by X, invert the product, and multiply by
X transpose and then by y; solving θ = (XᵀX)⁻¹Xᵀy gives the values of θ that
minimize the cost function.

Normal Equation

❖ To generalize: for m training examples (x^(1), y^(1)), …, (x^(m), y^(m)) and n
features,

❖ the matrix X, called the design matrix, will be:
Normal Equation

❖ The design matrix X stacks the training examples as rows:

X = [ (x^(1))ᵀ ; (x^(2))ᵀ ; (x^(3))ᵀ ; … ; (x^(m))ᵀ ]

❖ And thus, after setting up X and y, we can evaluate θ = (XᵀX)⁻¹Xᵀy.

❖ The matrix inverse and transpose can be computed in MATLAB/Octave; a sketch
below shows the same computation in Python.

❖ We used feature scaling for the gradient descent method, but it is not necessary with
the normal equation method.
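
❖ A minimal NumPy sketch of the normal equation (the four training examples reuse the housing rows from earlier; pinv is used for numerical robustness):

    import numpy as np

    # Design matrix: x0 = 1 column plus one feature (size); y: prices
    X = np.array([[1.0, 2104], [1.0, 1416], [1.0, 1534], [1.0, 852]])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    # theta = (X^T X)^(-1) X^T y, the minimizer of the squared-error cost
    theta = np.linalg.pinv(X.T @ X) @ X.T @ y
    print(theta)  # [theta0, theta1]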
Normal Equation

❖ Let’s compare the advantages and disadvantages of the gradient descent and
normal equation methods.

❖ The normal equation method is feasible for n up to a few thousand features; for
larger n, it is better to use gradient descent.

Normal Equation
❖ Normal Equation and Non-invertibility

❖ There are two conditions which cause non-invertibility of XᵀX: redundant
(linearly dependent) features, and too many features (m ≤ n).

Linear Regression With Multiple Variables/Features

❖ Quiz
1. Suppose you are working on weather prediction, and your
weather station makes one of three predictions for each day's
weather: Sunny, Cloudy or Rainy. You'd like to use a learning
algorithm to predict tomorrow's weather.
Would you treat this as a classification or a regression
problem?
a. Regression
b. Classification

Linear Regression With Multiple Variables/Features

❖ Quiz
2. Suppose you have 14 training examples with 3 features.
The normal equation is θ=(XTX)−1XTy. For the given values
of m and n, what are the dimensions of θ, X, and y in this
equation?
a. X is 14×4, y is 14×1, θ is 4×1
b. X is 14×3, y is 14×1, θ is 3×3
c. X is 14×3, y is 14×1, θ is 3×1
d. X is 14×4, y is 14×4, θ is 4×4

Linear Regression With Multiple Variables/Features

❖ Quiz
3. You run gradient descent for 15 iterations with α=0.3 and
compute J(θ) after each iteration. You find that the value of
J(θ) increases over time. Based on this, which of the
following conclusions seems most plausible?
a. Rather than use the current value of α, it'd be more
promising to try a larger value of α (say α=1.0).
b. α=0.3 is an effective choice of learning rate.
c. Rather than use the current value of α, it'd be more
promising to try a smaller value of α (say α=0.1).

