
LECTURE 3

MACHINE-LEARNING TECHNIQUES FOR PREDICTIVE ANALYTICS


LEARNING OBJECTIVES

i. Understand the basic concepts and definitions of artificial neural networks (ANN)
ii. Learn the different types of ANN architectures
iii. Understand the concept and structure of support vector machines (SVM)
iv. Learn the advantages and disadvantages of SVM compared to ANN
v. Understand the concept and formulation of the k-nearest neighbor (kNN) algorithm
vi. Learn the advantages and disadvantages of kNN compared to ANN and SVM
vii. Understand the basic principles of Bayesian learning and Naïve Bayes algorithm
viii. Learn the basics of Bayesian Belief Networks and how they are used in predictive analytics
ix. Understand different types of ensemble models and their pros and cons in predictive analytics
NEURAL NETWORK CONCEPTS
• Neural networks (NN): a human brain metaphor for information processing
• Neural computing
• Artificial neural network (ANN)
• Many uses of ANN
• pattern recognition, forecasting, prediction, and classification
• Many application areas
• finance, marketing, manufacturing, operations, information systems, and so on
BIOLOGICAL NEURAL NETWORKS

• Two interconnected brain cells (neurons)


PROCESSING INFORMATION IN A NN

• A single neuron (processing element – PE) with inputs and outputs


BIOLOGY ANALOGY
Biological                 Artificial
Soma                       Node
Dendrites                  Input
Axon                       Output
Synapse                    Weight
Slow processing            Fast processing
Many neurons (~10^9)       Few neurons (a dozen to hundreds of thousands)
ELEMENTS OF ANN

• Processing element (PE)


• Network architecture
• Hidden layers
• Parallel processing
• Network information processing
• Inputs
• Outputs
• Connection weights
• Summation function
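To make these elements concrete, here is a minimal Python sketch (with hypothetical inputs, weights, and bias values, not from the lecture) of a single PE applying the summation function and a sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    # A common activation function; squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, 0.2, 0.9])    # signals arriving on the "dendrites"
weights = np.array([0.4, -0.7, 0.1])  # "synaptic" connection weights
bias = 0.05

summation = np.dot(weights, inputs) + bias  # the summation function
output = sigmoid(summation)                 # activation -> "axon" output
print(f"weighted sum = {summation:.3f}, output = {output:.3f}")
```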
NEURAL NETWORK ARCHITECTURES

• The architecture of a neural network is driven by the task it is intended to address
• Classification, regression, clustering, general optimization, association
• Most popular architecture: feedforward, multi-layered perceptron with the backpropagation learning algorithm
• This ANN architecture will be covered in Lecture 6
• Other ANN architectures: recurrent neural networks, self-organizing feature maps, Hopfield networks, …
NEURAL NETWORK ARCHITECTURES
RECURRENT NEURAL NETWORKS
OTHER POPULAR ANN PARADIGMS
SELF ORGANIZING MAPS (SOM)

• First introduced by the Finnish professor Teuvo Kohonen
• Applies to clustering-type problems
OTHER POPULAR ANN PARADIGMS
HOPFIELD NETWORKS

• First introduced by John Hopfield
• Highly interconnected neurons
• Applies to solving complex computational problems (e.g., optimization problems)
SUPPORT VECTOR MACHINES (SVM)

• SVM are among the most popular machine-learning techniques.
• SVM belong to the family of generalized linear models (capable of representing non-linear relationships in a linear fashion).
• SVM achieve a classification or regression decision based on the value of a linear combination of the input features.
• Because of their architectural similarities, SVM are also closely associated with ANN.
SUPPORT VECTOR MACHINES (SVM)
• Goal of SVM: to generate mathematical functions that map input variables to desired outputs for classification- or regression-type prediction problems.
• First, SVM use nonlinear kernel functions to transform the non-linear relationships among the variables into linearly separable feature spaces.
• Then, maximum-margin hyperplanes are constructed to optimally separate the classes from each other based on the training dataset.
• SVM have a solid mathematical foundation!
SUPPORT VECTOR MACHINES (SVM)

• A hyperplane is a geometric concept used to describe the separation surface between different classes of things.
• In SVM, two parallel hyperplanes are constructed, one on each side of the separating surface, with the aim of maximizing the distance between them.
• A kernel function in SVM uses the kernel trick (a method for using a linear classifier algorithm to solve a nonlinear problem).
• The most commonly used kernel function is the radial basis function (RBF).
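For the linear, hard-margin case, the standard textbook formulation of these maximum-margin hyperplanes is:

```latex
% The two parallel hyperplanes bounding the margin:
%   w . x - b = +1   and   w . x - b = -1
% Their separation is 2 / ||w||, so maximizing the margin means solving:
\min_{\mathbf{w},\, b} \; \lVert \mathbf{w} \rVert
\quad \text{subject to} \quad
y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1, \qquad i = 1, \ldots, n
```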
SUPPORT VECTOR MACHINES (SVM)

• Many linear classifiers (hyperplanes) may separate the data


HOW DOES AN SVM WORK?
• Following a machine-learning process, an SVM learns from historical cases.
• The Process of Building SVM
1. Preprocess the data
• Scrub and transform the data.
2. Develop the model.
• Select the kernel type (RBF is often a natural choice).
• Determine the kernel parameters for the selected kernel type.
• If the results are satisfactory, finalize the model; otherwise, change the kernel type and/or kernel parameters to achieve the desired accuracy level.
3. Extract and deploy the model.
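A minimal scikit-learn sketch of this three-step process (the data set and parameter grid are illustrative choices, not from the lecture):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: preprocess the data (scaling matters for RBF kernels)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 2: develop the model -- RBF kernel, search over kernel parameters
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

# Step 3: evaluate; extract and deploy the model if accuracy is acceptable
print(grid.best_params_, grid.score(X_test, y_test))
```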
THE PROCESS OF BUILDING AN SVM
SVM APPLICATIONS
• SVM are the most widely used kernel-learning algorithms for a wide range of classification and regression problems
• SVM represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation
• Most comparative studies show their superiority in both regression- and classification-type prediction problems
• SVM versus ANN?
K-NEAREST NEIGHBOR METHOD (K-NN)
• ANN and SVM → time-demanding, computationally intensive, iterative derivations
• k-NN is a simple and logical prediction method that produces very competitive results
• k-NN is a prediction method for classification as well as regression (similar to ANN and SVM)
• k-NN is a type of instance-based learning (or lazy learning): most of the work takes place at prediction time, not at modeling time
• k: the number of neighbors used in the model
K-NEAREST NEIGHBOR METHOD (K-NN)

• The answer to the question “Which class does a data point belong to?” depends on the value of k
THE PROCESS OF K-NN METHOD
K-NN MODEL PARAMETER
1. Similarity Measure: The Distance Metric
Minkowski distance:

$d(i,j) = \left( \lvert x_{i1} - x_{j1} \rvert^{q} + \lvert x_{i2} - x_{j2} \rvert^{q} + \cdots + \lvert x_{ip} - x_{jp} \rvert^{q} \right)^{1/q}$

If q = 1, then d is called the Manhattan distance:

$d(i,j) = \lvert x_{i1} - x_{j1} \rvert + \lvert x_{i2} - x_{j2} \rvert + \cdots + \lvert x_{ip} - x_{jp} \rvert$

If q = 2, then d is called the Euclidean distance:

$d(i,j) = \sqrt{ (x_{i1} - x_{j1})^{2} + (x_{i2} - x_{j2})^{2} + \cdots + (x_{ip} - x_{jp})^{2} }$

• Numeric versus nominal values?
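A small sketch of the metric above for numeric vectors (the example vectors are illustrative):

```python
import numpy as np

def minkowski(x, y, q):
    """Minkowski distance between two numeric vectors;
    q=1 gives Manhattan distance, q=2 gives Euclidean distance."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(minkowski(a, b, 1))  # 7.0 (Manhattan: 3 + 4 + 0)
print(minkowski(a, b, 2))  # 5.0 (Euclidean: sqrt(9 + 16 + 0))
```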


K-NN MODEL PARAMETER

2. Number of Neighbors (the value of k)


• The best value depends on the data
• Larger values reduce the effect of noise but also make the boundaries between classes less distinct
• An “optimal” value can be found heuristically
• Cross-validation is often used to determine the best values for k and the distance measure
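A minimal scikit-learn sketch of that search (the data set and parameter ranges are illustrative) that uses cross-validation to pick both k and the Minkowski power parameter:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Search over k and the Minkowski power p (p=1 Manhattan, p=2 Euclidean)
param_grid = {"n_neighbors": list(range(1, 16)), "p": [1, 2]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```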
NAÏVE BAYES METHOD FOR CLASSIFICATION

• Naïve Bayes is a simple probability-based classification method
• “Naïve” refers to the assumption of independence among the input variables
• Can use both numeric and nominal input variables
• Numeric variables need to be discretized
• Used for classification-type prediction problems
• Naïve Bayes models can be developed very efficiently and effectively
• Using the maximum-likelihood method
BAYES THEOREM
• Developed by Thomas Bayes (1701–1761)
• Determines the conditional probabilities
• Given that X and Y are two events:
$P(Y \mid X) = \dfrac{P(X \mid Y)\, P(Y)}{P(X)} \quad\Longrightarrow\quad \text{Posterior} = \dfrac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$

P(Y | X): posterior probability of Y given X
P(X | Y): conditional probability of X given Y (the likelihood)
P(Y): prior probability of Y
P(X): prior probability of X (the evidence, or unconditional probability of X)

• Go through the simple example in the book (p. 279)
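As a stand-in for the book's example (p. 279 is not reproduced here), consider a hypothetical spam filter: 1% of emails are spam, a keyword appears in 90% of spam and in 5% of non-spam. Then:

```latex
P(\text{Spam} \mid \text{Word})
  = \frac{P(\text{Word} \mid \text{Spam})\, P(\text{Spam})}{P(\text{Word})}
  = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99}
  = \frac{0.009}{0.0585} \approx 0.154
```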


NAÏVE BAYES METHOD FOR CLASSIFICATION
• Process of Developing a Naïve Bayes Classifier
• Training Phase
1. Obtain and pre-process the data
2. Discretize the numeric variables
3. Calculate the prior probabilities of all class labels
4. Calculate the likelihood for all predictor variables/values
• Testing Phase
• Using the outputs of Steps 3 and 4 above, classify the new samples
• See the numerical example in the book…
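Alongside the book's numerical example, a minimal Python sketch (with a small hypothetical nominal data set, not the book's) that mirrors Steps 3 and 4 and the testing phase:

```python
import pandas as pd

# Hypothetical nominal training data, for illustration only
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Windy":   ["No", "Yes", "No", "No", "Yes", "Yes"],
    "Play":    ["No", "No", "Yes", "Yes", "No", "Yes"],
})

# Step 3: prior probabilities of the class labels
priors = df["Play"].value_counts(normalize=True)

# Step 4: likelihoods P(predictor value | class) for each predictor
likelihoods = {col: df.groupby("Play")[col].value_counts(normalize=True)
               for col in ["Outlook", "Windy"]}

# Testing phase: score a new sample (Outlook=Sunny, Windy=No) per class
for label in priors.index:
    score = priors[label]
    score *= likelihoods["Outlook"].get((label, "Sunny"), 0)
    score *= likelihoods["Windy"].get((label, "No"), 0)
    print(label, round(score, 4))
```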
BAYESIAN NETWORKS
• A tool for representing dependency structures in a graphical, explicit, and intuitive way
• A directed acyclic graph whose nodes correspond to the variables and whose arcs signify conditional dependencies between variables and their possible values
• The direction of the arcs matters
• A partial causality link in student retention
BAYESIAN NETWORKS
How can BN be constructed?
1. Manually
• By an engineer with the help of a domain expert
• Time demanding, expensive (for large networks)
• Experts may not even be available
2. Automatically
• Analytically …
• By learning/inducing the structure of the network from historical data
• The availability of high-quality historical data is imperative


BAYESIAN NETWORKS
How can BN be constructed?
• Analytically
BAYESIAN NETWORKS
How can BN be constructed?
Tree Augmented Naïve Bayes (TAN) Network Structure
1. Compute the information function
2. Build the undirected graph
3. Build a spanning tree
4. Convert the undirected graph into a directed one
5. Construct the TAN model
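One way to induce such a structure from data is with the pgmpy library; the sketch below assumes pgmpy's documented TreeSearch/fit interface (class and method names have changed across pgmpy versions, so treat it as illustrative) and uses hypothetical retention-style data:

```python
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator, TreeSearch
from pgmpy.models import BayesianNetwork

# Hypothetical historical data (not the lecture's data set)
data = pd.DataFrame({
    "FallGPA":  ["High", "Low", "High", "Low", "High", "Low"],
    "Aid":      ["Yes", "No", "Yes", "Yes", "No", "No"],
    "Retained": ["Yes", "No", "Yes", "Yes", "No", "No"],
})

# Induce a Tree Augmented Naive (TAN) structure from the data,
# then fit the conditional probability tables by maximum likelihood
dag = TreeSearch(data, root_node="FallGPA").estimate(
    estimator_type="tan", class_node="Retained")
model = BayesianNetwork(dag.edges())
model.fit(data, estimator=MaximumLikelihoodEstimator)
print(model.edges())
```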
BAYESIAN NETWORKS

• EXAMPLE: Bayesian Belief Network for Predicting Freshmen Student Attrition


ENSEMBLE MODELING
• Ensemble – combination of models (or model outcomes) for better results
• Why do we need to use ensembles:
• Better accuracy
• More stable/robust/consistent/reliable outcomes
• Reality: ensembles win competitions!
• The Netflix $1M Prize competition
• Many recent competitions at Kaggle.com
• The Wisdom of Crowds
ENSEMBLE MODELING
Figure: Graphical Depiction of Model Ensembles for Prediction Modeling.
TYPES OF ENSEMBLE MODELING
Figure: Simple Taxonomy for Model Ensembles.
TYPES OF ENSEMBLE MODELING
Figure: Bagging-Type Decision Tree Ensembles.
TYPES OF ENSEMBLE MODELING
Figure: Boosting-Type Decision Tree Ensembles.
ENSEMBLE MODELING
• Variants of Bagging & Boosting (homogeneous model types: decision trees)
• Decision tree ensembles
• Random forest
• Stochastic gradient boosting
• Stacking and Information Fusion (heterogeneous model types: any number of any models)
• Stacked generalization or “super learners”
• Simple/weighted combining
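As a minimal illustration of these families, the following scikit-learn sketch (data set and hyperparameters are illustrative choices) compares a bagging-type ensemble, a boosting-type ensemble, and a stacking ensemble that combines both through a logistic-regression meta-learner:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Homogeneous ensembles: bagging-type (random forest) and boosting-type
rf = RandomForestClassifier(n_estimators=200, random_state=0)
gb = GradientBoostingClassifier(random_state=0)

# Stacking: base learners combined through a meta-learner ("super learner")
stack = StackingClassifier(
    estimators=[("rf", rf), ("gb", gb)],
    final_estimator=LogisticRegression(max_iter=1000))

for name, model in [("random forest", rf), ("boosting", gb), ("stacking", stack)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```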
TYPES OF ENSEMBLE MODELING
• STACKING • INFORMATION FUSION
ENSEMBLES – PROS AND CONS
Table: Brief List of Pros and Cons of Model Ensembles Compared to Individual Models.

PROS (Advantages)
• Accuracy: Model ensembles usually result in more accurate models than individual models.
• Robustness: Model ensembles tend to be more robust against outliers and noise in the data set than individual models.
• Reliability (stability): Because of the variance reduction, model ensembles tend to produce more stable, reliable, and believable results than individual models.
• Coverage: Model ensembles tend to have better coverage of the hidden complex patterns in the data set than individual models.

CONS (Shortcomings)
• Complexity: Model ensembles are much more complex than individual models.
• Computationally expensive: Compared to individual models, ensembles require more time and computational power to build.
• Lack of transparency (explainability): Because of their complexity, it is more difficult to understand the inner structure of model ensembles (how they do what they do) than that of individual models.
• Harder to deploy: Model ensembles are much more difficult to deploy in an analytics-based managerial decision-support system than single models.
END OF LECTURE

• Questions / Comments
