Data Analytics with MATLAB
Adam Filion
Application Engineer
MathWorks
© 2015 The MathWorks, Inc.
Case Study: Day-Ahead Load Forecasting
Goal:
– Implement a tool for easy and accurate computation of the day-ahead system load forecast
Requirements:
– Acquire and clean data from multiple sources
– Accurate predictive model
– Easily deploy to production environment
Challenges with Data Analytics
Aggregating data from multiple sources
Cleaning data
Choosing a model
Moving to production
NYISO Energy Load Data
Techniques to Handle Missing Data
List-wise deletion: drop every record that has a missing value
– Unbiased estimates (provided the data are missing completely at random)
– Reduces sample size
Implementation options
– Built in to many MATLAB functions
– Manual filtering
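A minimal sketch of the manual-filtering option, written in Python for illustration rather than with MATLAB's built-in functions. The record layout (hour, temperature, load) and the helper name `listwise_delete` are assumptions made for this example, not part of the case study.

```python
import math


def listwise_delete(rows):
    """Keep only the rows in which no field is missing (None or NaN)."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if not any(is_missing(v) for v in row)]


# Hypothetical hourly records: (hour, temperature, load)
records = [
    (1, 20.5, 5100.0),
    (2, float("nan"), 5000.0),  # temperature missing
    (3, 19.8, None),            # load missing
    (4, 19.1, 4900.0),
]

clean = listwise_delete(records)
print(clean)  # only the rows for hours 1 and 4 remain
```

Half of the hypothetical records are discarded here, which is the sample-size cost noted above.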
Techniques to Handle Missing Data
Substitution: replace missing data points with a reasonable approximation
Suited to data that is:
– Easy to model
– Too important to exclude
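One possible form of substitution is linear interpolation between the nearest known neighbours. The sketch below is a Python illustration under that assumption; the slides do not say which approximation the demo uses, and the function name and sample values are hypothetical.

```python
import math


def fill_by_interpolation(values):
    """Replace interior NaN runs with points on the straight line joining
    the nearest known neighbours. Leading and trailing NaNs are left as-is."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Hypothetical load series with two gaps.
load = [100.0, float("nan"), float("nan"), 130.0, float("nan"), 150.0]
print(fill_by_interpolation(load))
```

Interpolation only makes sense when the signal varies smoothly between samples, which is one reading of "easy to model".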
Merge Different Sets of Data
Join along a common axis
Popular Joins:
– Inner
– Full Outer
– Left Outer
– Right Outer
[Figure: diagrams of the inner join, full outer join, and left outer join]
Full Outer Join
First Data Set
Key   B
1     1.1
4     1.4
7     1.7
9     1.9
Second Data Set
Key   Y     Z
1     0.1   0.2
3     0.3   0.4
5     0.5   0.6
7     0.7   0.8
Joined Data Set
Key   B     Y     Z
1     1.1   0.1   0.2
3     NaN   0.3   0.4
4     1.4   NaN   NaN
5     NaN   0.5   0.6
7     1.7   0.7   0.8
9     1.9   NaN   NaN
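The full outer join on this slide can be sketched in a few lines of Python. This is an illustration of the join logic only, not the MATLAB function used in the demo; the dictionary layout and the helper name `full_outer_join` are assumptions.

```python
NAN = float("nan")


def full_outer_join(left, right, left_width, right_width):
    """Join two {key: tuple_of_values} tables on their keys.

    A key present in only one table gets NaN for the other table's columns.
    """
    joined = {}
    for key in sorted(set(left) | set(right)):
        left_values = left.get(key, (NAN,) * left_width)
        right_values = right.get(key, (NAN,) * right_width)
        joined[key] = left_values + right_values
    return joined


# The two data sets from the slide.
first = {1: (1.1,), 4: (1.4,), 7: (1.7,), 9: (1.9,)}  # Key -> (B,)
second = {1: (0.1, 0.2), 3: (0.3, 0.4), 5: (0.5, 0.6), 7: (0.7, 0.8)}  # Key -> (Y, Z)

# Prints one joined row per key, in key order.
for key, row in full_outer_join(first, second, 1, 2).items():
    print(key, *row)
```

Replacing the key union with an intersection would give an inner join, and iterating over the keys of `left` alone would give a left outer join.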
Learn More: Big Data with MATLAB
www.mathworks.com/discovery/big-data-matlab.html
www.mathworks.com/discovery/matlab-mapreduce-hadoop.html
Machine Learning
Characteristics and Examples
Characteristics
– Lots of variables
– System too complex to know the governing equation (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
Overview – Machine Learning
Type of Learning: Supervised Learning
– Develop predictive model based on both input and output data
– Categories of Algorithms: Classification, Regression
Type of Learning: Unsupervised Learning
– Group and interpret data based only on input data
– Categories of Algorithms: Clustering
Supervised Learning
Regression
– Neural Networks
– Decision Trees
– Ensemble Methods
– Non-linear Reg. (GLM, Logistic)
– Linear Regression
Classification
– Support Vector Machines
– Discriminant Analysis
– Naive Bayes
– Nearest Neighbor
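As a small illustration of supervised learning, the sketch below fits the simplest method listed, linear regression, by ordinary least squares. It is written in Python for illustration; the temperature and load figures are made up and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: temperature is the input, load is the output.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
load = [50.0, 60.0, 70.0, 80.0, 90.0]

intercept, slope = fit_line(temperature, load)
predicted = intercept + slope * 22.0  # prediction for an unseen input
print(intercept, slope, predicted)
```

The model is trained on known input and output pairs and then applied to a new input, which is what distinguishes supervised from unsupervised learning.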
Unsupervised Learning
Clustering
– k-Means, Fuzzy C-Means
– Hierarchical
– Neural Networks
– Gaussian Mixture
– Hidden Markov Model
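To make the clustering idea concrete, here is a minimal Python sketch of k-means (Lloyd's algorithm) on 2-D points. The starting centroids are supplied by the caller so the run is deterministic; the sample points and the function name `k_means` are assumptions for this example.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids.

    Returns the final centroids and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No output labels are used anywhere: the grouping comes only from the input data.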
Learn More: Machine Learning with MATLAB
mathworks.com/machine-learning
Deployment Highlights
Deployment targets: Database Servers, Desktop Applications, Spreadsheets, Hadoop / Big Data, Client Applications, Application Servers, Web Applications, Batch/Cron Jobs
Share with others who may not have MATLAB
Royalty-free deployment
Encryption to protect your intellectual property
Deploying Applications with MATLAB
MATLAB Compiler: Standalone Application, Excel Add-in, Hadoop
MATLAB Compiler SDK: C/C++, Java, .NET, MATLAB Production Server
MATLAB Compiler for sharing MATLAB programs without integration programming
MATLAB Compiler SDK provides implementation and platform flexibility for software developers
MATLAB Production Server provides the most efficient development path for secure and scalable web and enterprise applications
Sharing Standalone Applications
[Diagram: the Application Author uses MATLAB, Toolboxes, and MATLAB Compiler to build a Standalone Application, Excel Add-in, or Hadoop component; the End User runs it with MATLAB Runtime]
Integrating MATLAB-based Components
Application author and software developer might be same person
[Diagram: the Application Author uses MATLAB, Toolboxes, and MATLAB Compiler SDK to build C/C++, Java, .NET, or MATLAB Production Server components; the Software Developer integrates them, and they run with MATLAB Runtime]
MATLAB Production Server
Directly deploy MATLAB analytic programs into production
– Centrally manage multiple MATLAB programs & MCR versions
– Automatically deploy updates without server restarts
Scalable & reliable
– Service large numbers of concurrent requests
– Add capacity or redundancy with additional servers
Use with web, database & application servers
– Lightweight client library isolates MATLAB processing
– Access MATLAB programs using native data types
– Integrates with Java, .NET, C and Python
[Diagram: Web Server(s) (HTML, XML, JavaScript) connected to MATLAB Production Server(s)]
Deployed Analytics
MATLAB Production Server
[Diagram components: Web Application Server (Apache Tomcat, Web Server/Webservice); MATLAB Production Server (Request Broker, CTF, Predictive Models); MATLAB Desktop (Train in MATLAB); inputs: Weather Data, Energy Data]
Learn More: Application Deployment with MATLAB
www.mathworks.com/solutions/desktop-web-deployment/
Learn More: MATLAB Application Deployment
Also … www.mathworks.com/solutions/desktop-web-deployment/
Data Analytics Products
Workflow stages: Access and Explore Data, Preprocess Data, Develop Predictive Models, Integrate Analytics with Systems
All stages: MATLAB
Access and Explore Data through Develop Predictive Models: Parallel Computing Toolbox, MATLAB Distributed Computing Server
Access and Explore Data: Database Toolbox, Data Acquisition Toolbox, Mapping Toolbox, Image Acquisition Toolbox, OPC Toolbox
Preprocess Data and Develop Predictive Models: Statistics and Machine Learning Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Signal Processing Toolbox, Computer Vision System Toolbox, Image Processing Toolbox, Econometrics Toolbox
Integrate Analytics with Systems: MATLAB Production Server, MATLAB Compiler, MATLAB Compiler SDK
[Legend: products used in today’s demo; additional Data Analytics products]
Key Takeaways
Data preparation can be a big job; leverage built-in MATLAB tools and spend more time on the analysis
Rapidly iterate through different predictive models, and find the one that’s best for your application
Leverage parallel computing to scale up your analysis to large datasets
Eliminate the need to recode by deploying your MATLAB algorithms into production