Lecture 1and2-Revision Part1
REVISION -1

12 Oct 2023, Dinesh Babu J


SOLVING PREDICTIVE ANALYTICS PROBLEMS USING MACHINE LEARNING…

What is Machine Learning?
◻ “Systematic study of algorithms and systems that improve their
knowledge or performance with experience on certain tasks” [Prof.
Tom Mitchell, CMU]

◻ In a Machine Learning framework, Predictive Analytics problems (Spam detection; ML Score prediction; News article grouping) become tasks

◻ Experience is in the form of data, say past data

◻ Performance: how well does the algorithm predict?


HOW TO BUILD AN ML ALGORITHM
• The previous viewpoint was a requirement viewpoint
• Let us take the engineering viewpoint, i.e. how to build a Machine Learning system
• Machine learning formulation consists of “tasks, dataset, features, and models”
• To start: Pose a suitable task, Collect a good dataset, Extract relevant features
• To solve: Choose a model to implement, Learn a model using the dataset (learning algorithm), use the model to predict (inference algorithm)

…Let’s get started
INFERENCE USING ML

Input: Image → Feature extraction → Feature: Hair length → ML Algo → Output: MALE/FEMALE

What is this ML algorithm? How do we build this algo?
INFERENCE USING ML

Input: Image → Feature extraction → Feature: Hair length → ML Algo → Output: MALE/FEMALE
Input: Voice → Feature extraction → Feature: Pitch → ML Algo → Output: MALE/FEMALE


PREDICTIVE ANALYTICS PROBLEMS E.G.

• Spam detection prediction
• Input: email; Output: Spam or not
• Score prediction (out of 100) in ML Course
• Input: 10th, 12th math marks; Output: Predicted score
• News article group prediction
• Input: Set of news articles; Output: Cluster ID

(Each follows the pipeline: Input → Feature extraction → Feature → ML Algo → Output)
TASK, DATASET, FEATURES
SPAM DETECTION PROBLEM

• Task – Classification {SPAM(+1), HAM(-1)}
• Dataset – {Emails, SPAM/HAM Label}
• Gmail: User flagging
• Features – {x1, x2, …} e.g. frequency of occurrence of certain words (LOTTERY – 10; VIAGRA – 8, …)

Past Email features → Learning Algorithm → Model   (Answers: SPAM/HAM)
New Email features → Inference Algorithm → Question: SPAM/HAM?
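To make the task–dataset–features–model framing concrete, here is a minimal sketch of this pipeline (assuming scikit-learn; the emails, labels, and the test email are invented for illustration):

```python
# A minimal sketch of the spam-detection pipeline, assuming scikit-learn.
# The emails, labels, and the test email are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

past_emails = ["win the lottery now", "meeting at noon",
               "cheap viagra offer", "lunch tomorrow?"]
labels = [+1, -1, +1, -1]                      # SPAM(+1), HAM(-1)

vectorizer = CountVectorizer()                 # features: word-frequency counts
X = vectorizer.fit_transform(past_emails)

model = MultinomialNB().fit(X, labels)         # learning algorithm -> model

x_new = vectorizer.transform(["claim your lottery prize"])
print(model.predict(x_new))                    # inference algorithm, e.g. [1] -> SPAM
```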
TASK, DATASET, FEATURES
ML SCORE PREDICTION PROBLEM

• Task – Regression [0, 100]
• Dataset – {10th, 12th Math scores; ML score}
• Questionnaire to past students
• Features – {x1, x2} e.g. {75, 80}

Past students’ Math marks → Learning Algorithm → Model   (Answers: ML score)
New student’s Math marks → Inference Algorithm → Question: ML score?

Credit: Prof. Srihari
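A matching sketch for the regression pipeline, again assuming scikit-learn; all marks below are invented for illustration:

```python
# A minimal sketch of the score-prediction pipeline, assuming scikit-learn.
# All marks below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X_past = np.array([[75, 80], [60, 65], [90, 92], [70, 68]])  # {10th, 12th math marks}
y_past = np.array([78, 62, 95, 70])                          # ML scores of past students

model = LinearRegression().fit(X_past, y_past)   # learning algorithm -> model

x_new = np.array([[82, 85]])                     # new student's marks
print(model.predict(x_new))                      # inference: predicted ML score
```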
TASK, DATASET, FEATURES
NEWS ITEM GROUPING PROBLEM

• Predictive Clustering task:
• Assign one of {1, 2, …, K} to a news article
• Dataset – set of news articles
• Query Google with “news”
• Features – {x1, x2, x3, x4} e.g. {topic distributions}

Features from collection of news items → Learning Algorithm → Model   (Cluster IDs for every news item)
New article features → Inference Algorithm → Question: Which cluster is best suited?
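A matching sketch for the clustering pipeline, assuming scikit-learn; TF-IDF term weights stand in for the topic-distribution features mentioned above, and the one-line “articles” stand in for real news items:

```python
# A minimal sketch of the news-grouping pipeline, assuming scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

articles = ["election results announced", "team wins championship final",
            "parliament passes new bill", "star striker scores twice"]

vectorizer = TfidfVectorizer()                   # term weights stand in for
X = vectorizer.fit_transform(articles)           # the slides' topic distributions

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # learning algorithm
print(kmeans.labels_)                            # cluster ID for every news item

x_new = vectorizer.transform(["minister wins confidence vote"])
print(kmeans.predict(x_new))                     # inference: best-suited cluster
```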
TASKS

             | Output = continuous      | Output = discrete | Solutions
Supervised   | Regression               | Classification    | Deep NN (hierarchical, non-linear model)
             |                          |                   | SVR/SVM, NN (non-linear model)
             |                          |                   | Linear model
Unsupervised | Dimensionality reduction | Clustering        |

HOW TO BUILD AN ML ALGORITHM
• The previous viewpoint was a requirement viewpoint
• Let us take an engineering viewpoint of Machine Learning
• Machine learning consists of “tasks, models, features, and datasets”
• To start: Pose a suitable task, Collect a good dataset, Extract relevant features
• To solve: Choose a model to implement, Learn a model using the dataset (learning algorithm), use the model to predict (inference algorithm)

…Let’s solve
MODEL, LEARNING, INFERENCE ALGORITHM
SOLUTION 1: CLASSIFICATION PROBLEM

• e.g. Spam detection problem
  Past Emails → Learning Algorithm → Answers: Spam or not
• Models: straight line to divide (a linear model)

[Two scatter plots in the (x1, x2) feature plane, each with a candidate separating line]
MODEL, LEARNING, INFERENCE ALGORITHM
SOLUTION 2: REGRESSION PROBLEM

• e.g. ML1 Score prediction problem
  Past students’ 10th Math marks → Learning Algorithm → Answers: ML1 score?
• Models: straight line to fit the data

[Two plots of y vs x1, each with a candidate straight-line fit]
MODEL, LEARNING, INFERENCE ALGORITHM
SOLUTION 3: CLUSTERING PROBLEM

• e.g. News grouping problem
  Collection of news items → Learning Algorithm → Cluster IDs for every news item
• Model: distance based

[Two scatter plots in the (x1, x2) plane: the raw points, then the points grouped into clusters]
EVALUATION METRIC: CLASSIFICATION PROBLEM

• How do we know the solution on the right is better than the one on the left?
• Total misclassifications (left) = 2
• Total misclassifications (right) = 0; so the right is better

[Two scatter plots in the (x1, x2) plane, each with a candidate decision boundary]
EVALUATION METRIC: REGRESSION PROBLEM

$y_{pred_i} = m x_i + b$

$L(m, b) = \sum_{i=1}^{N} (y_i - y_{pred_i})^2$

[Two plots of y vs x, each marking the vertical gap between $y_i$ and $y_{pred_i}$ at $x_i$]
ML EXPERIMENTS

Training Features + Training Labels → Learning Algorithm → Model
Test Features → Inference Algorithm → Predicted Labels
Predicted Labels + Test Labels → Evaluation Algorithm (e.g. Accuracy)
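A minimal sketch of this experiment loop, assuming scikit-learn; the digits dataset stands in for any labelled dataset:

```python
# A minimal sketch of the train/test experiment, assuming scikit-learn;
# the digits dataset is a stand-in for any labelled dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)  # learning algorithm
predicted = model.predict(X_test)                                # inference algorithm
print(accuracy_score(y_test, predicted))                         # evaluation algorithm
```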
REVISIT: SOLVING THE PREDICTIVE ANALYTICS PROBLEM

Machine learning consists of “tasks, models, features, and datasets”

• To start: Pose a suitable task, Collect a good dataset, Extract relevant features
• To solve: Choose a model to implement, Learn a model using the dataset
• Search the space of model parameters and optimise the error measure; e.g. find the best line that minimises the classification error
MODELS AND MODEL PARAMETERS

Bayes Classifier · Logistic Regression · Support Vector Machine
OPTIMIZATION METHODS

• Unconstrained minimisation
• Constrained minimisation
OPTIMIZATION METHODS

• Global vs local optimum – Neural networks
• Single optimum – SVM
MODEL SPACE vs DATA SPACE

$y_{pred_i} = m x_i + b$

$L(m, b) = \sum_{i=1}^{N} (y_i - y_{pred_i})^2$
FITTING A STRAIGHT LINE

Data: $\{x_i, y_i\}_{i=1:N}$
Model: $y_{pred} = a + b x$
Loss: $J(a, b) = \sum_{i=1}^{N} (y_i - (a + b x_i))^2$

[Plot: score in ML course vs 10th math marks, with a candidate straight line]
FITTING A STRAIGHT LINE – COST FUNCTION

CLOSED FORM – MINIMIZE SUM OF SQUARE ERROR

GRADIENT DESCENT – MINIMIZE SUM OF SQUARE ERROR
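A minimal NumPy sketch of the closed-form fit, using invented (x, y) pairs; the formulas below follow from setting the derivatives of J(a, b) to zero:

```python
# A minimal NumPy sketch of the closed-form least-squares fit for
# y_pred = a + b*x; the (x, y) pairs are invented for illustration.
import numpy as np

x = np.array([55.0, 60.0, 70.0, 80.0, 90.0])   # e.g. 10th math marks
y = np.array([58.0, 61.0, 72.0, 79.0, 91.0])   # e.g. ML course scores

# Setting dJ/da = 0 and dJ/db = 0 for J(a,b) = sum_i (y_i - (a + b*x_i))^2
# gives the usual normal equations:
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(a, b)              # fitted intercept and slope
print(a + b * 75.0)      # predicted score for a new student with 75 marks
```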
FN MINIMIZATION

Exercise

CLOSED FORM
ITERATIVE METHOD

$\theta^{(new)} = \theta^{(old)} - \mu \, \nabla_\theta J(\theta)$
GRADIENT DESCENT

Cost function: $J(\theta) = 1.2\,(\theta - 2)^2$
Gradient of the cost function: $J'(\theta) = 2.4\,(\theta - 2)$

Gradient descent update, with the gradient evaluated at $\theta = \theta^{(old)}$:
$\theta^{(new)} = \theta^{(old)} - \mu\, J'(\theta^{(old)}) = \theta^{(old)} - \mu \cdot 2.4\,(\theta^{(old)} - 2)$

With $\theta^{(old)} = 1$:
$\theta^{(new)} = 1 - \mu \cdot 2.4\,(1 - 2)$

Case 1: $\mu = 0.1$:  $\theta^{(new)} = 1 - 0.1 \cdot 2.4\,(1 - 2) = 1.24$
Case 2: $\mu = 0.5$:  $\theta^{(new)} = 1 - 0.5 \cdot 2.4\,(1 - 2) = 2.2$
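The same two cases can be checked in a few lines of Python; the cost and gradient are exactly the ones above:

```python
# A short sketch reproducing the worked example: J(theta) = 1.2*(theta - 2)^2,
# so the gradient is J'(theta) = 2.4*(theta - 2).
def grad(theta):
    return 2.4 * (theta - 2.0)

# One update step theta(new) = theta(old) - mu * J'(theta(old)) from theta = 1
for mu in (0.1, 0.5):
    print(mu, 1.0 - mu * grad(1.0))   # 0.1 -> 1.24, 0.5 -> 2.2

# Iterating the update with mu = 0.1 converges to the minimiser theta = 2
theta, mu = 1.0, 0.1
for _ in range(50):
    theta = theta - mu * grad(theta)
print(theta)                          # ~2.0
```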
FITTING A GAUSSIAN
PROBABILISTIC CLASSIFIERS

• Probabilistic classifiers estimate $P(C = C_k \mid x)$
• Given the features x, we want to estimate the probability that the class label C is, say, C1 (i.e. MALE) or C2 (i.e. FEMALE)
DETERMINISTIC VS PROBABILISTIC CLASSIFIERS

Deterministic: Input: Hair length, x → ML Algo → Output: MALE/FEMALE, C
Probabilistic: Input: random variable X → ML Algo → Output: P(MALE|x), P(FEMALE|x)
BAYES CLASSIFIER

• Probabilistic classifiers estimate $P(C = C_k \mid x)$
• Given the feature x, we want to estimate the probability that the class label C is, say, C1 (i.e. MALE) or C2 (i.e. FEMALE)
• Use Bayes theorem, assuming a 1-D feature vector where x1 is continuous
NORMALIZED HISTOGRAM

• To start with, we can bin the continuous values and observe the histogram (after normalizing, i.e. so the elements sum up to 1)
FITTING A GAUSSIAN DISTRIBUTION

• It makes sense to keep only the two parameters μ and σ of the Gaussian distribution and throw away the original data
• Question: How do we estimate μ and σ?
FITTING A GAUSSIAN
DENSITY ESTIMATION TASK: WHICH GAUSSIAN IS THE BEST?

Data: $\{x_i\}_{i=1:N}$   (hair lengths of women)

Model: $p(x_i \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$
MAXIMUM LIKELIHOOD FUNCTION

$p(X \mid \theta) = p(x_1, x_2, \ldots, x_N \mid \theta)$

• Let us define a cost function in terms of the parameters to be estimated
• We should find the value of the parameter which maximizes the probability of observing the given N samples
MAXIMUM LIKELIHOOD

$p(X \mid \theta) = p(x_1, x_2, \ldots, x_N \mid \theta)$   (Definition)

$p(X \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$   (IID assumption)

$l(\theta) = \ln p(X \mid \theta) = \sum_{i=1}^{N} \ln p(x_i \mid \theta)$   (Take log)

$\hat{\theta} = \arg\max_\theta \, l(\theta)$   (Cost function: maximise log-likelihood)

• Assuming the samples are independently drawn
• Take the logarithm (makes our life easier for further steps; as it is a monotonic function, we are allowed to do so)
MAXIMUM LIKELIHOOD (CLOSED FORM)

$p(X \mid \theta) = p(x_1, x_2, \ldots, x_N \mid \theta)$

$p(X \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$

$l(\theta) = \ln p(X \mid \theta)$

$\hat{\theta} = \arg\max_\theta \, l(\theta)$; at the maxima the gradient also vanishes:

$\nabla_\theta \, l = \sum_{i=1}^{N} \nabla_\theta \ln p(x_i \mid \theta) = 0$
ML – SINGLE DIMENSIONAL GAUSSIAN

[Derivation slides; completing the estimate is left as home work]

INFERENCE ALGORITHM

[Figure: evaluating the fitted density at a test point, e.g. 0.25 → 25%]
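For the 1-D Gaussian, solving the zero-gradient condition above gives the standard closed-form estimates (sample mean and 1/N sample variance); a minimal NumPy sketch with invented hair-length data:

```python
# A minimal sketch of the closed-form ML estimates for a 1-D Gaussian:
# setting the gradient of the log-likelihood to zero gives the sample mean
# and the 1/N sample variance. The hair lengths are invented for illustration.
import numpy as np

x = np.array([24.0, 30.0, 27.5, 33.0, 26.0, 29.5])   # hair lengths of women

mu_hat = x.mean()                                # (1/N) * sum x_i
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))  # sqrt((1/N) * sum (x_i - mu_hat)^2)
print(mu_hat, sigma_hat)

# Inference: evaluate the fitted density p(x | mu_hat, sigma_hat) at a new point
x_new = 28.0
p = np.exp(-(x_new - mu_hat) ** 2 / (2 * sigma_hat ** 2)) / (sigma_hat * np.sqrt(2 * np.pi))
print(p)
```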
NAÏVE BAYES CLASSIFIER

• Probabilistic classifiers estimate $P(C = C_k \mid x)$
• Use Bayes theorem:

$P(C = C_k \mid x) = \frac{p(x_1, x_2 \mid C = C_k) \, P(C = C_k)}{p(x)}$

• Use the Naïve assumption: given the class label, the features are independent (class-conditional independence):

$P(C = C_k \mid x) = \frac{p(x_1 \mid C = C_k) \, p(x_2 \mid C = C_k) \, P(C = C_k)}{p(x)}$
BAYES CLASSIFIER

• Probabilistic classifiers estimate $P(C = C_k \mid x)$
• Given the 2-D features, we want to estimate the probability that the class label C is, say, C1 (i.e. MALE) or C2 (i.e. FEMALE)
• Use Bayes theorem, assuming a 2-D feature vector:

$P(C = C_k \mid x) = \frac{p(x_1, x_2 \mid C = C_k) \, P(C = C_k)}{p(x)}$
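A minimal sketch of the naïve Bayes computation with Gaussian class conditionals for the two features (hair length, pitch); all parameter values and the test point are invented for illustration:

```python
# A minimal sketch of a naive Bayes classifier with Gaussian class
# conditionals for the two features (hair length, pitch); all parameter
# values and the test point are invented for illustration.
import numpy as np

def gauss(x, mu, sigma):
    # 1-D Gaussian density p(x | mu, sigma)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Class-conditional (mu, sigma) per feature, as would be estimated from labelled data
params = {
    "MALE":   {"hair": (10.0, 5.0), "pitch": (120.0, 20.0), "prior": 0.5},
    "FEMALE": {"hair": (30.0, 8.0), "pitch": (210.0, 30.0), "prior": 0.5},
}

x1, x2 = 25.0, 190.0   # new sample: hair length, pitch

# Naive assumption: p(x1, x2 | C) = p(x1 | C) * p(x2 | C)
scores = {c: gauss(x1, *p["hair"]) * gauss(x2, *p["pitch"]) * p["prior"]
          for c, p in params.items()}
evidence = sum(scores.values())          # p(x), the normalising constant
for c, s in scores.items():
    print(c, s / evidence)               # posterior P(C = C_k | x)
```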
