Machine Learning With Matlab PDF

This document provides an overview of machine learning techniques available in MATLAB. It discusses characteristics of machine learning like using large datasets and modeling complex systems. Examples covered include pattern recognition, financial algorithms, energy forecasting, and biology. Challenges in machine learning like expertise required and lack of standardized solutions are also presented. The document outlines unsupervised and supervised machine learning categories and algorithms within each. It provides a workflow example of using supervised learning to predict customer behavior from bank marketing data.

Uploaded by

Søren Sagen
Copyright
© All Rights Reserved
Machine Learning with MATLAB

Abhishek Gupta
Sr. Application Engineer

© 2014 The MathWorks, Inc.


Goals

• Overview of machine learning
• Machine learning models & techniques available in MATLAB
• Streamlining the machine learning workflow with MATLAB

Machine Learning
Characteristics and Examples

• Characteristics
– Lots of data (many variables)
– System too complex to know the governing equation (e.g., black-box modeling)

• Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)

Figure: credit rating matrix (values in percent; rows and columns are ratings AAA to D)

        AAA      AA       A     BBB      BB       B     CCC       D
AAA   93.68%   5.55%   0.59%   0.18%   0.00%   0.00%   0.00%   0.00%
AA     2.44%  92.60%   4.03%   0.73%   0.15%   0.00%   0.00%   0.06%
A      0.14%   4.18%  91.02%   3.90%   0.60%   0.08%   0.00%   0.08%
BBB    0.03%   0.23%   7.49%  87.86%   3.78%   0.39%   0.06%   0.16%
BB     0.03%   0.12%   0.73%   8.27%  86.74%   3.28%   0.18%   0.64%
B      0.00%   0.00%   0.11%   0.82%   9.64%  85.37%   2.41%   1.64%
CCC    0.00%   0.00%   0.00%   0.37%   1.84%   6.24%  81.88%   9.67%
D      0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00% 100.00%

Challenges – Machine Learning

• Significant technical expertise required
• No “one size fits all” solution
• Locked into black-box solutions
• Time required to conduct the analysis

Overview – Machine Learning

Type of Learning and Categories of Algorithms

• Unsupervised Learning: group and interpret data based only on input data
– Clustering
• Supervised Learning: develop predictive model based on both input and output data
– Classification
– Regression

Unsupervised Learning

Clustering
– k-Means, Fuzzy C-Means
– Hierarchical
– Neural Networks
– Gaussian Mixture
– Hidden Markov Model

Supervised Learning

Regression
– Neural Networks
– Ensemble Methods
– Non-linear Reg. (GLM, Logistic)
– Linear Regression
– Decision Trees

Classification
– Support Vector Machines
– Discriminant Analysis
– Nearest Neighbor
– Naive Bayes

Supervised Learning - Workflow

• Data
– Import Data
– Explore Data
– Prepare Data
• Select Model
• Train the Model
– Known data + Known responses → Model
• Measure Accuracy
• Use for Prediction
– New Data → Model → Predicted Responses
• Speed up Computations

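The workflow above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the MATLAB functions used in the talk. The helper names (`split_data`, `train_nearest_neighbor`, `predict`, `accuracy`) and the toy data are hypothetical, and a 1-nearest-neighbor classifier stands in for whichever model is selected.

```python
import random

def split_data(rows, labels, test_fraction=0.25, seed=0):
    """Prepare data: shuffle, then hold out a fraction for measuring accuracy."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def train_nearest_neighbor(rows, labels):
    """Train the model: a 1-nearest-neighbor model simply stores the known data."""
    return list(zip(rows, labels))

def predict(model, row):
    """Use for prediction: return the label of the closest stored point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda pair: sq_dist(pair[0], row))[1]

def accuracy(model, rows, labels):
    """Measure accuracy: fraction of held-out rows predicted correctly."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Toy data, made up for illustration: two well-separated groups.
known_rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.8, 5.3), (5.1, 5.0)]
known_labels = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

x_train, y_train, x_test, y_test = split_data(known_rows, known_labels)
model = train_nearest_neighbor(x_train, y_train)
test_accuracy = accuracy(model, x_test, y_test)
new_prediction = predict(model, (4.9, 5.2))  # new data -> predicted response
```

Accuracy is measured on rows the model did not see during training, which is why the split happens before the model is trained.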
Example – Bank Marketing Campaign

• Goal:
– Predict if customer would subscribe to bank term deposit based on different attributes

• Approach:
– Train a classifier using different models
– Measure accuracy and compare models
– Reduce model complexity
– Use classifier for prediction

Figure: bar chart "Bank Marketing Campaign Misclassification Rate". Y-axis: Percentage (0 to 100). Series: "No Misclassified" and "Yes Misclassified". X-axis (models, recovered from rotated tick labels): Neural Net, Logistic Regression, Discriminant Analysis, k-nearest Neighbors, Naive Bayes, Support VM, Decision Trees, TreeBagger, Reduced TB.

Data set downloaded from UCI Machine Learning repository
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

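The chart reports misclassification separately for the "No" and "Yes" responses. A minimal Python sketch of that per-class measure follows. The function name `misclassification_rates` and the two sets of predictions are hypothetical and are not taken from the talk's data.

```python
def misclassification_rates(actual, predicted, classes=("no", "yes")):
    """Percentage of each true class that the model labelled incorrectly."""
    rates = {}
    for c in classes:
        total = sum(1 for a in actual if a == c)
        wrong = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        rates[c] = 100.0 * wrong / total if total else float("nan")
    return rates

# Made-up responses: 8 customers said "no", 2 said "yes".
actual = ["no"] * 8 + ["yes"] * 2

# Two hypothetical models to compare.
predictions = {
    "always_no": ["no"] * 10,
    "model_b": ["no"] * 6 + ["yes"] * 2 + ["yes", "no"],
}

results = {name: misclassification_rates(actual, pred)
           for name, pred in predictions.items()}
```

Reporting the two classes separately matters when one response is rare: in this toy case `always_no` is right for 8 of 10 customers overall, yet it misclassifies every "yes".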
Example – Bank Marketing Campaign

• Numerous predictive models with rich documentation
• Interactive visualizations and apps to aid discovery
• Built-in parallel computing support
• Quick prototyping; focus on modeling, not programming

Figure: bar chart "Bank Marketing Campaign Misclassification Rate" (percentage of "No" and "Yes" responses misclassified, per model).

Clustering
Overview

• What is clustering?
– Segment data into groups, based on data similarity

• Why use clustering?
– Identify outliers
– Resulting groups may be the matter of interest

• How is clustering done?
– Can be achieved by various algorithms
– It is an iterative process (involving trial and error)

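As one concrete instance of "segment data into groups, based on data similarity", here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is an illustration under assumptions, not the MATLAB implementation: the function name `kmeans`, the choice of starting centers, and the toy points are all hypothetical.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assignment, centers

# Toy 2-D data, made up for illustration: two loose groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
```

The result depends on the starting centers and on the number of clusters chosen, which is one reason the process involves trial and error.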
Example – Clustering Corporate Bonds

• Goal:
– Cluster similar corporate bonds together

• Approach:
– Cluster the bonds data using distance-based and probability-based techniques
– Evaluate clusters for validity

Figure: two heat maps, Data Point # against Data Point #. Panel "Hierarchical Clustering" (Dist Metric: spearman); panel "k-Means Clustering" (Dist Metric: cosine).

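The heat maps plot data points against data points, which suggests a matrix of pairwise distances, and one panel names the cosine metric. A minimal Python sketch of such a matrix follows. The feature vectors are invented, the helper names `cosine_distance` and `pairwise` are hypothetical, and the sketch assumes no vector is all zeros.

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0 means the vectors point the same way.

    Assumes neither vector is all zeros (the norm would be 0).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def pairwise(rows, dist):
    """Square matrix whose entry [i][j] is dist(rows[i], rows[j])."""
    return [[dist(r, s) for s in rows] for r in rows]

# Hypothetical feature vectors for three bonds (not from the talk's data).
bonds = [(1.0, 2.0, 3.0), (2.0, 4.0, 6.0), (3.0, 1.0, 0.5)]
distances = pairwise(bonds, cosine_distance)
```

The first two vectors differ only in scale, so their cosine distance is zero up to rounding. Cosine distance compares direction and ignores magnitude.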
Example – Clustering Corporate Bonds

• Numerous clustering functions with rich documentation
• Interactive visualizations to aid discovery
• Viewable source; not a black box
• Rapid exploration & development

Figure: two heat maps, Data Point # against Data Point #. Panel "Hierarchical Clustering" (Dist Metric: spearman); panel "k-Means Clustering" (Dist Metric: cosine).

Short-term Load Forecaster

• Goal:
– Develop a tool for Excel users to generate next-day electricity demand predictions

• Requirements:
– Easy-to-use interface
– Accurate predictive model

Deploying MATLAB Applications to Excel

Diagram (steps numbered 1 to 3): MATLAB Desktop and Toolboxes; MATLAB Compiler with MATLAB Builder EX, producing .dll and .bas files; End-User Machine.

Deployment Highlights

Diagram: deployment targets shown are Desktop Applications (.exe), Excel Spreadsheets, Hadoop, Database Servers, Application Servers, Client Front End, Web Applications, Batch/Cron Jobs, and .NET, C and Java applications. Label: CTF.

• Royalty-free deployment
• Point-and-click workflow
• Unified process for desktop and server apps

MATLAB for Machine Learning

Challenge: Time (loss of productivity)
MATLAB Solution: Rapid analysis and application development
– High productivity from data preparation, interactive exploration, visualizations

Challenge: Extract value from data
MATLAB Solution: Machine learning, Video, Image, and Financial
– Depth and breadth of algorithms in classification, clustering, and regression

Challenge: Computation speed
MATLAB Solution: Fast training and computation
– Parallel computation, optimized libraries

Challenge: Time to deploy & integrate
MATLAB Solution: Ease of deployment and leveraging enterprise
– Push-button deployment into production

Challenge: Technology risk
MATLAB Solution: High-quality libraries and support
– Industry-standard algorithms in use in production
– Access to support, training and advisory services when needed

Learn More: Machine Learning with MATLAB
mathworks.com/machine-learning

Training Services
Exploit the full potential of MathWorks products

Flexible delivery options:
• Public training available worldwide
• Onsite training with standard or customized courses
• Web-based training with live, interactive instructor-led courses
• Self-paced interactive online training

More than 30 course offerings:
• Introductory and intermediate training on MATLAB, Simulink, Stateflow, code generation, and Polyspace products
• Specialized courses in control design, signal processing, parallel computing, code generation, communications, financial analysis, and other areas

Consulting Services
Accelerating return on investment

A global team of experts supporting every stage of tool and process integration

Diagram: offerings shown are Jumpstart, Migration Planning, Advisory Services, Process Assessment, Component Deployment, Full Application Deployment, Process and Technology Standardization, Process and Technology Automation, and Continuous Improvement. Axis labels: Research, Advanced Engineering, Product Engineering Teams, Supplier Involvement.

Technical Support

Resources
• Over 100 support engineers
– All with MS degrees (EE, ME, CS)
– Local support in North America, Europe, and Asia
• Comprehensive, product-specific Web support resources

High customer satisfaction
• 95% of calls answered within three minutes
• 70% of issues resolved within 24 hours
• 80% of customers surveyed rate satisfaction at 80–100%

MATLAB Central

• Community for MATLAB and Simulink users
• Over 1 million visits per month
• File Exchange
– Upload/download access to free files including MATLAB code, Simulink models, and documents
– Ability to rate files, comment, and ask questions
– More than 12,500 contributed files, 300 submissions per month, 50,000 downloads per month
• Newsgroup
– Web forum for technical discussions about MathWorks products
– More than 300 posts per day
• Blogs
– Commentary from engineers who design, build, and support MathWorks products
– Open conversation at blogs.mathworks.com

Based on February 2011 data


Questions?
