Data Analytics With MATLAB
Adam Filion
Application Engineer
MathWorks
Requirements:
– Acquire and clean data from multiple sources
– Accurate predictive model
– Easily deploy to production environment
Challenges with Data Analytics
Cleaning data
Choosing a model
Moving to production
NYISO Energy Load Data
Techniques to Handle Missing Data
List-wise deletion
– Unbiased estimates (if data are missing completely at random)
– Reduces sample size
Implementation options
– Built in to many MATLAB functions
– Manual filtering
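Manual filtering for list-wise deletion can be sketched as below. This is a minimal illustration in plain Python rather than MATLAB; the helper name `listwise_delete` and the (hour, temperature, load) observations are hypothetical and are not taken from the NYISO data.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def listwise_delete(rows):
    """Keep only the rows in which no field is missing."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


# Hypothetical (hour, temperature, load) observations
observations = [
    (1, 20.5, 5100.0),
    (2, float("nan"), 5230.0),  # temperature missing
    (3, 21.0, None),            # load missing
    (4, 22.3, 5400.0),
]

clean = listwise_delete(observations)
print(len(observations), "rows before,", len(clean), "rows after")
```

Every row with at least one missing field is dropped, which is why the sample size shrinks.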
Techniques to Handle Missing Data
Substitution – replace missing data points with a reasonable approximation
Easy to model
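One possible "reasonable approximation" is linear interpolation between the nearest known neighbors. The sketch below assumes that choice (the slide does not name a specific method); `fill_linear` and the load values are hypothetical, and gaps at either end are filled with the nearest known value.

```python
import math


def fill_linear(values):
    """Replace NaN entries by linear interpolation between known neighbors.

    Gaps before the first or after the last known value are filled with
    the nearest known value.
    """
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


nan = float("nan")
load = [5000.0, nan, 5200.0, nan, nan, 5500.0, nan]  # hypothetical hourly load
filled = fill_linear(load)
print(filled)
```

Unlike list-wise deletion, the series keeps its original length, so time alignment with other data sets is preserved.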
Merge Different Sets of Data
Popular Joins:
– Inner
– Full Outer
– Left Outer
– Right Outer
Full Outer Join
First Data Set
Key  B
1    1.1
4    1.4
7    1.7
9    1.9
Second Data Set columns: Key, Y, Z
Joined Data Set
Key  B    Y    Z
1    1.1  0.1  0.2
3    NaN  0.3  0.4
4    1.4  NaN  NaN
5    NaN  0.5  0.6
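The four join types can be sketched in plain Python. The first data set below is copied from the slide; the rows of the second data set are an assumption inferred from the Y and Z values that appear in the joined table. `join` and its `how` names are hypothetical helpers for illustration, not MATLAB functions.

```python
NAN = float("nan")


def join(left, right, left_width, right_width, how="inner"):
    """Join two tables stored as {key: tuple_of_values}.

    Values absent on one side are padded with NaN.
    """
    if how == "inner":
        keys = left.keys() & right.keys()
    elif how == "left":
        keys = left.keys()
    elif how == "right":
        keys = right.keys()
    elif how == "outer":
        keys = left.keys() | right.keys()
    else:
        raise ValueError("unknown join type: " + how)
    joined = {}
    for key in sorted(keys):
        left_values = left.get(key, (NAN,) * left_width)
        right_values = right.get(key, (NAN,) * right_width)
        joined[key] = left_values + right_values
    return joined


first = {1: (1.1,), 4: (1.4,), 7: (1.7,), 9: (1.9,)}
# Assumption: rows inferred from the joined table on the slide
second = {1: (0.1, 0.2), 3: (0.3, 0.4), 5: (0.5, 0.6)}

inner = join(first, second, 1, 2, how="inner")
left_outer = join(first, second, 1, 2, how="left")
right_outer = join(first, second, 1, 2, how="right")
full_outer = join(first, second, 1, 2, how="outer")

for key, row in full_outer.items():
    print(key, row)
```

With these inputs the full outer join keeps every key from both sets, while the inner join keeps only the key the two sets share.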
Learn More: Big Data with MATLAB
www.mathworks.com/discovery/big-data-matlab.html
www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
Machine Learning
Characteristics and Examples
Characteristics
– Lots of variables
– System too complex to know the governing equation (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
Overview – Machine Learning
Type of Learning: Categories of Algorithms
– Supervised Learning: Classification
– Unsupervised Learning: Clustering
Supervised Learning
Regression
Classification
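Regression predicts a continuous response from labeled examples, while classification predicts a discrete label. The sketch below illustrates both with deliberately simple methods that are assumptions for illustration only: an ordinary least-squares line for regression and a one-nearest-neighbor rule for classification. The temperature, load, and label values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor_label(train_points, train_labels, query):
    """Return the label of the training point closest to the query."""
    distances = [abs(p - query) for p in train_points]
    return train_labels[distances.index(min(distances))]


# Hypothetical training data: temperature (deg C) and load (MW)
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
load = [4500.0, 4750.0, 5000.0, 5250.0, 5500.0]
demand_class = ["low", "low", "low", "high", "high"]

# Regression: continuous prediction
slope, intercept = fit_line(temperature, load)
predicted_load = slope * 22.0 + intercept

# Classification: discrete prediction
predicted_class = nearest_neighbor_label(temperature, demand_class, 27.0)

print(predicted_load, predicted_class)
```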
Unsupervised Learning
Clustering
– k-Means, Fuzzy C-Means
– Hierarchical
– Neural Networks
– Gaussian Mixture
– Hidden Markov Model
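As an illustration of clustering, k-means can be sketched as alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The version below is a minimal one-dimensional sketch with fixed starting centroids; `kmeans_1d`, the data points, and the starting centroids are hypothetical.

```python
def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points, initial_centroids, max_iterations=20):
    """Lloyd's algorithm in one dimension."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid
        updated = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, assign(points, centroids)


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # hypothetical observations
centroids, clusters = kmeans_1d(points, initial_centroids=[0.0, 6.0])
print(centroids)
print(clusters)
```

No labels are supplied; the two groups emerge from the data alone, which is what distinguishes this from the supervised case.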
Learn More: Machine Learning with MATLAB
mathworks.com/machine-learning
Deployment Highlights
Spreadsheets
Hadoop / Big Data
Client Applications
Application Servers
Royalty-free deployment
MATLAB Compiler: Standalone Application, Excel Add-in, Hadoop
MATLAB Compiler SDK: C/C++, Java, .NET, MATLAB Production Server
MATLAB + Toolboxes → Standalone Application, Excel Add-in, Hadoop (run with MATLAB Runtime)
Integrating MATLAB-based Components
Application Author (MATLAB, Toolboxes)
Software Developer
The application author and the software developer might be the same person.
MATLAB Production Server
Deployed Analytics
– Data sources: Weather Data, Energy Data
– MATLAB Production Server: Request Broker, Predictive Models (CTF)
– Web Server / Webservice
Learn More: Application Deployment with MATLAB
www.mathworks.com/solutions/desktop-web-deployment/
Data Analytics Products
MATLAB
– Parallel Computing Toolbox
– MATLAB Distributed Computing Server
– MATLAB Production Server
– Data Acquisition Toolbox
– Curve Fitting Toolbox
– Neural Network Toolbox
– MATLAB Compiler SDK
© 2015 The MathWorks, Inc.