ISYE 8803 - Kamran - M1 - Intro to HD and Functional Data - Updated

This document provides an introduction to high-dimensional data analytics and functional data analysis. It discusses functional data, i.e., data that can be represented as functions such as signals over time, the "curse of dimensionality" that arises with high-dimensional data, and how to perform dimension reduction and extract low-dimensional structure using methods such as splines, regression, and functional principal component analysis. Splines are piecewise polynomials fitted locally in intervals to provide flexibility while maintaining continuity across intervals.

Topics on High-

Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.


Associate Professor
School of Industrial & Systems Engineering

Introduction to HD and
Functional Data
Learning Objectives
• To understand the definition of high-
dimensional data and Big Data.
• To explain the concepts of “curse of
dimensionality” and “low-dimensional
learning.”
• To define Functional Data.
Big Data
The initial definition revolves around the three Vs:
Volume, Velocity, and Variety

Volume: Large sample size, and each sample could be high-dimensional. Use MapReduce, Hadoop, etc. when the data are too large to be stored on one machine.

Velocity: Data are generated and collected very quickly. Increase computational efficiency.

Variety: Data come in many different types and shapes. How to deal with high-dimensional data, e.g., profiles, images, videos, etc.?
High-Dimensional Data
High-Dimensional data is defined as a data set with a large number of attributes.

[Figure: example power signal, Power (Watt) vs. Time (Sec)]

Examples:
➢ Signals: 100 kHz sampling
➢ Images: 1M pixels
➢ Videos: image sequences
➢ Surveys: movie ratings

How to extract useful information from such massive datasets?
High-Dimensional Data vs. Big Data
• Small p, small n: traditional statistics with limited samples
• Small p, large n: classic large-sample theory (Big Data challenge)
• Large p, small n: HD statistics and optimization (High-Dimensional Data challenge)
• Large p, large n: deep learning and deep neural networks

BD Analytics challenge: n is too large for the data to be stored or processed on one machine.
– Solution: big-data frameworks for data storage and computation (e.g., parallel computing, MapReduce, Hadoop, Spark).
High-Dimensional Data vs. Big Data
• Small p, small n: traditional statistics with limited samples
• Small p, large n: classic large-sample theory (Big Data challenge)
• Large p, small n: HD statistics and optimization (High-Dimensional Data challenge)
• Large p, large n: deep learning and deep neural networks

HD Analytics challenge is mainly related to the "curse of dimensionality":

Computational issue: in some optimization algorithms, we need on the order of (1/ε)^p function evaluations in order to obtain a solution within ε of the optimum. For example, with ε = 0.1 and p = 10, this is already 10^10 evaluations.
Curse of Dimensionality
HD Analytics challenge is mainly related to the "curse of dimensionality":

Model learning issue: as the distance between observations increases with the dimension, the sample size required for learning a model drastically increases.
– Solution: feature extraction and dimension reduction through low-dimensional learning.
Low-Dimensional Learning from
High-Dimensional Data
• High-dimensional data usually have a low-dimensional structure.

• Real data are often highly concentrated on a low-dimensional, sparse, or degenerate structure in high-dimensional space.

How can the LD structure be learned and exploited from HD data?


LD Learning Methods
Functional Data Analysis
• Splines
• Smoothing Splines
• Kernels

Tensor Analysis
• Multilinear Algebra
• Low Rank Tensor Decomposition

Rank Deficient Methods


• (Functional) Principal Component Analysis (FPCA)
• Robust PCA (RPCA)
• Matrix Completion
Functional Data
Definition: A fluctuating quantity or impulse whose variations represent information, often represented as a function of time or space.

Examples: single-channel signals, multi-channel signals, images, point clouds.

[Figure: example single-channel power signal, Power (Watt) vs. Time (Sec)]
Topics on High-
Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.


Associate Professor
School of Industrial & Systems Engineering

Review of Regression
Learning Objectives
• To review linear regression
• To understand the geometric interpretation of linear regression
• To explain feature extraction using regression
Regression
• Observe a collection of i.i.d. training data {(x_i, y_i)}, i = 1, …, n,
  where the x's are explanatory (independent) variables and y is the response (dependent) variable.

• We want to build a function f(x) to model the relationship between the x's and y.

• An intuitive way of finding f(x) is by minimizing the squared-error loss
  L(f) = Σ_{i=1}^{n} (y_i − f(x_i))².

• We have to impose some constraints/structure on f(x), e.g., a linear model f(x) = β_0 + Σ_j β_j x_j.
Regression – Least Squares Estimates

  y = Xβ + ε

where

  y = [y_1 ⋯ y_n]ᵀ,   X = [x_11 ⋯ x_1p; ⋮ ⋱ ⋮; x_n1 ⋯ x_np],   β = [β_1 ⋯ β_p]ᵀ,   ε = [ε_1 ⋯ ε_n]ᵀ

We wish to find the vector of least squares estimators that minimizes

  L = εᵀε = (y − Xβ)ᵀ(y − Xβ)

The resulting least squares estimate is

  β̂ = (XᵀX)⁻¹Xᵀy
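As a quick illustration, the least squares estimate can be computed directly in MATLAB. This is a minimal sketch with simulated data (not the wire-bond example that follows):

% Simulate a small regression problem (hypothetical data, for illustration only)
n = 50; p = 3;
X = [ones(n,1) randn(n,p)];     % design matrix with an intercept column
beta = [2; 0.5; -1; 3];         % coefficients used to generate the data
y = X*beta + 0.2*randn(n,1);    % response with additive Gaussian noise

% Least squares estimate beta_hat = (X'X)^(-1) X'y
beta_hat = (X'*X)\(X'*y);       % backslash solves the normal equations without an explicit inverse
yhat = X*beta_hat;              % fitted values (yhat = H*y, with H the hat matrix)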
Example
Pull strength for a wire bond against wire length and die height (Montgomery and Runger 2006).

(Hastie et al. 2009)

Example cont.
(Montgomery and Runger 2006)

  β̂ = (XᵀX)⁻¹Xᵀy
Geometric interpretation
  ŷ = X(XᵀX)⁻¹Xᵀy = Hy

H is the projection matrix (a.k.a. hat matrix).

The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x_1 and x_2. The projection ŷ represents the vector of predictions obtained by the least squares method (Hastie et al. 2009).
Properties of OLS Estimators
Unbiased estimator:
  E(β̂) = E[(XᵀX)⁻¹Xᵀy] = E[(XᵀX)⁻¹Xᵀ(Xβ + ε)] = β

Covariance matrix:
  cov(β̂) = σ²(XᵀX)⁻¹,   with   σ̂² = SSE / (n − p)

According to the Gauss-Markov theorem, among all linear unbiased estimators, the least squares estimate (LSE) has the minimum variance and it is unique.
Feature Extraction Using Regression

Polynomial regression:   y = β_0 + β_1 t + β_2 t² + β_3 t³

A signal (functional data/profile) sample is fitted by OLS, and the estimated coefficients

  β̂ = [β̂_0  β̂_1  β̂_2  β̂_3]ᵀ

are used as the extracted features.
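A sketch of this feature-extraction idea in MATLAB; the signal below is simulated for illustration, and in practice each observed profile would be fitted in the same way:

% Hypothetical signal sample observed over time t
t = linspace(0,1,200)';
s = 3 - 2*t + 5*t.^2 - 4*t.^3 + 0.1*randn(size(t));   % simulated profile with noise

% Cubic polynomial regression y = b0 + b1*t + b2*t^2 + b3*t^3
T = [ones(size(t)) t t.^2 t.^3];                      % polynomial design matrix
beta_hat = (T'*T)\(T'*s);                             % OLS estimate of the coefficients

% The four estimated coefficients are the extracted features for this signal
features = beta_hat';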
Reference
• Montgomery, D. C., Runger, G., (2013), Applied Statistics and Probability for
Engineers. 6th Edition. Wiley, NY, USA.

• Hastie, T., Tibshirani, R., and Friedman, J., (2009) The Elements of Statistical
Learning. Springer Series in Statistics Springer New York Inc., New York, NY,
USA.
Topics on High-
Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.


Associate Professor
School of Industrial & Systems Engineering

Splines
Learning Objectives
• To discuss local vs. global polynomial regression
• To explain splines and piecewise polynomial regression
• To recognize spline bases and the truncated power basis
Polynomial Vs. Nonlinear Regression
mth-order polynomial regression:
  y = β_0 + β_1 x + β_2 x² + β_3 x³ + ⋯ + β_m x^m + ε

Nonlinear regression:
Often requires domain knowledge or first principles for finding the underlying nonlinear function, e.g.,

  y = a_1 (x − c)^{b_1} + d + ε,   x > c
  y = a_2 (−x + c)^{b_2} + d + ε,  x ≤ c
Polynomial Regression
mth-order polynomial:   y = β_0 + β_1 x + β_2 x² + β_3 x³ + ⋯ + β_m x^m

Disadvantages of polynomial regression:
• Remote parts of the function are very sensitive to outliers
• Less flexibility due to the global functional structure

(Example from Ji Zhou, 2011, estimated using polynomials)
Splines
• Linear combination of piecewise polynomial functions under continuity assumptions
• Partition the domain of x into contiguous intervals and fit polynomials in each interval separately
• Provides flexibility and local fitting

Suppose x ∈ [a, b]. Partition the x domain using a set of points ξ_1, …, ξ_K (a.k.a. knots).
Fit a polynomial in each interval under the continuity conditions and combine them as

  f(X) = Σ_{m=1}^{K} β_m h_m(X)
Splines – Simple Example

  f(X) = Σ_{m=1}^{3} β_m h_m(X),   with LSE  β̂_m = Ȳ_m (the mean of Y in the mth region)

  f(X) = Σ_{m=1}^{6} β_m h_m(X)

(Image taken from Hastie et al., 2014)


Splines – Simple Example
  f(X) = Σ_{m=1}^{6} β_m h_m(X)

Impose a continuity constraint at each knot:
  f(ξ_k⁻) = f(ξ_k⁺),   k = 1, 2

Total number of free parameters (degrees of freedom) is 6 − 2 = 4.

Alternatively, one could incorporate the constraints into the basis functions:
  h_1(X) = 1,  h_2(X) = X,  h_3(X) = (X − ξ_1)_+,  h_4(X) = (X − ξ_2)_+

This basis is known as the truncated power basis.

(Image taken from Hastie et al., 2014)


Splines with Higher Order of Continuity
Cubic polynomials with continuity constraints for smoothness: continuity of f, f′, and f″ at each knot.

The spline df is calculated by
  (# of regions) × (# of parameters per region) − (# of knots) × (# of constraints per knot)

For example, a cubic spline with K = 2 interior knots has 3 × 4 − 2 × 3 = 6 free parameters, which equals K + 4.

(Image taken from Hastie et al., 2014)
Order-M Splines
Piecewise polynomials of degree M − 1 with continuous derivatives up to order M − 2:
• M = 1: piecewise-constant splines
• M = 2: linear splines
• M = 3: quadratic splines
• M = 4: cubic splines

Truncated power basis functions:
  h_j(X) = X^{j−1},   j = 1, …, M
  h_{M+k}(X) = (X − ξ_k)_+^{M−1},   k = 1, …, K

• Total degrees of freedom is K + M.
• The cubic spline is the lowest-order spline for which the knot discontinuity is not visible to the human eye.
• Knot selection: a simple method is to place knots at quantiles of x. However, the choice of knots is in general a variable/model selection problem.
Estimation

[Figure: truncated power basis functions h_1(x), …, h_6(x)]

  f(X) = Σ_{m=1}^{M+K} β_m h_m(X)

The least squares method can be used to estimate the coefficients:

  H = [h_1(x)  h_2(x)  h_3(x)  h_4(x)  h_5(x)  h_6(x)],   β̂ = (HᵀH)⁻¹Hᵀy

Linear smoother:   ŷ = Hβ̂ = H(HᵀH)⁻¹Hᵀy = Sy

Degrees of freedom: trace(S)
• Truncated power basis functions are simple and algebraically appealing.
• However, they are not efficient for computation and are ill-posed and numerically unstable:   det((HᵀH)⁻¹) = 1.3639e-06
Example

[Figure: noisy data and mean function sin(2πx³)³ on x ∈ [0, 1]]

%Data Generation
X=[0:0.001:1];
Y_true=sin(2*pi()*X.^3).^3;
Y=sin(2*pi()*X.^3).^3+normrnd(0,0.1,1,length(X));
%Define knots and cubic truncated power basis
k = [1/7:1/7:6/7];
h1=ones(1,length(X)); h2=X; h3=X.^2; h4=X.^3;
h5=(X-k(1)).^3; h5(h5<=0)=0;
h6=(X-k(2)).^3; h6(h6<=0)=0;
h7=(X-k(3)).^3; h7(h7<=0)=0;
h8=(X-k(4)).^3; h8(h8<=0)=0;
h9=(X-k(5)).^3; h9(h9<=0)=0;
h10=(X-k(6)).^3; h10(h10<=0)=0;
H=[h1' h2' h3' h4' h5' h6' h7' h8' h9' h10'];
%Least square estimates
B=(H'*H)\H'*Y';
scatter(X,Y,'.'); hold on
plot(X,H*B,'r')
plot(X,Y_true,'k')
Topics on High-
Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.


Associate Professor
School of Industrial & Systems Engineering

B-splines
Learning Objectives
• To discuss the computational issues of splines
• To understand the B-spline basis
• To define the smoother matrix and degrees of freedom
Computational Issue of Splines
• Truncated power basis functions are simple and algebraically appealing.
• However, they are not efficient for computation and are ill-posed and numerically unstable.

[Figure: cubic truncated power basis functions]

  det((HᵀH)⁻¹) = 1.3639e-06
B-splines
An alternative basis for piecewise polynomials that is computationally more efficient (de Boor, 1978).
• Each basis function has local support, i.e., it is nonzero over at most M (the spline order) consecutive intervals.
• The basis matrix is banded.
Bspline Basis
Let B_{j,m}(x) be the jth B-spline basis function of order m (m ≤ M) for the knot sequence τ.

Define the augmented knot sequence τ, e.g.,
  τ_1 = ⋯ = τ_M = ξ_0,   τ_{M+j} = ξ_j for j = 1, …, K,   ξ_{K+1} = τ_{M+K+1} = ⋯ = τ_{2M+K}

For j = 1, …, 2M+K−1 (order 1, piecewise constant):
  B_{j,1}(x) = 1 if τ_j ≤ x < τ_{j+1}, and 0 otherwise.

For j = 1, …, 2M+K−m (recursion up to order m ≤ M; terms with a zero denominator are taken to be zero):
  B_{j,m}(x) = (x − τ_j)/(τ_{j+m−1} − τ_j) · B_{j,m−1}(x) + (τ_{j+m} − x)/(τ_{j+m} − τ_{j+1}) · B_{j+1,m−1}(x)


Bspline Basis in Matlab
n = 100
for sd = 1:4
subplot(4,1,sd)
knots = [ones(1,sd-1)...
linspace(1,n,10) n * ones(1,sd-1)];
nKnots = length(knots) - sd;
kspline = spmak(knots,eye(nKnots));
B=spval(kspline,1:n)';
plot(B)
end
Show the matrix B, a banded (low-bandwidth) matrix: imagesc(B)

Generate bspline basis using R: bs(x, df, knots, intercept)
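For completeness, the recursion on the previous slide can also be implemented directly, without the spline toolbox. The following is a minimal sketch; the function name bspline_basis and its interface are hypothetical, and the half-open intervals mean a point exactly at the right boundary knot evaluates to zero (in practice the last interval is usually treated as closed).

function B = bspline_basis(x, tau, M)
% Evaluate all order-M B-spline basis functions at the points in x using
% the Cox-de Boor recursion; tau is the augmented (non-decreasing) knot sequence.
x = x(:);
nB = numel(tau) - 1;                      % number of order-1 (indicator) functions
B = zeros(numel(x), nB);
for j = 1:nB                              % order m = 1: piecewise-constant indicators
    B(:,j) = (x >= tau(j)) & (x < tau(j+1));
end
for m = 2:M                               % recursion up to order M
    nB = numel(tau) - m;
    Bm = zeros(numel(x), nB);
    for j = 1:nB
        d1 = tau(j+m-1) - tau(j);
        d2 = tau(j+m)   - tau(j+1);
        t1 = 0; t2 = 0;
        if d1 > 0, t1 = (x - tau(j))/d1 .* B(:,j);     end
        if d2 > 0, t2 = (tau(j+m) - x)/d2 .* B(:,j+1); end
        Bm(:,j) = t1 + t2;
    end
    B = Bm;
end
end

With an augmented knot sequence of length 2M+K, the returned matrix has M+K columns, matching the K+M degrees of freedom noted earlier.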


Example

[Figure: cubic truncated power basis functions (left) vs. cubic B-spline basis functions (right)]

  det((HᵀH)⁻¹) = 1.3639e-06        det((BᵀB)⁻¹) = 1.4119e+04
Smoother Matrix
Consider a regression spline with basis matrix B:

  f̂ = B(BᵀB)⁻¹Bᵀy = Hy

• H is the smoother matrix (a.k.a. projection matrix)
• H is idempotent
• H is symmetric
• Degrees of freedom: trace(H)
Example - MATLAB
% Generate data:
n = 100; D = linspace(0,1,n); sigma = 0.3;
fun = @(x) 2.5 * x - sin(10 * x) - exp(-10 * x);
y = fun(D) + randn(1,n)*sigma; y = y';
% Generate B-spline basis:
sd=4;
knots = [ones(1,2) linspace(1,n,10) n * ones(1,2)];
nKnots = length(knots) - sd;
kspline = spmak(knots,eye(nKnots));
B=spval(kspline,1:n)';
% Least Square Estimation:
yhat = B/(B'*B)*B'*y;
K= trace(B/(B'*B)*B')
sigma2 = 1/(n-K)*(y-yhat)'*(y-yhat);
yn = yhat-3*sqrt(diag(sigma2*B/(B'*B)*B'));
yp = yhat+3*sqrt(diag(sigma2*B/(B'*B)*B'));
plot(D,y,'r.',D,yn,'b--',D,yp,'b--',D,yhat,'k-')
Example: Fat content prediction
• A beef distributor wants to know the fat
content of meat from spectrometric curves,
which correspond to the absorbance
measured at 100 wavelengths.
• She obtains the spectrometric curves for 215
pieces of finely chopped meat, (functional
predictors).
• Additionally, through a time consuming
chemical processing, she estimates the fat
content of each piece (response).
• She wants us to build a model to predict the fat content of a new piece using the spectrometric curve.

The original dataset can be found at http://lib.stat.cmu.edu/datasets/tecator.
Example: Fat content
[Figure: spectrometric curves]

• We split the dataset into a training set of 195 curves and a test set of 20 curves.
• Regular approach: we build a linear regression using the 100 measurements from the spectrometer as predictors.
• Functional approach: we use B-splines to model each curve and extract features. The estimated B-spline coefficients are used as predictive features for building the fat regression model.
• Example of extracted features for one curve: B-spline coefficients 2.61, 2.64, 2.63, 2.75, 2.74, 3.41, 3.35, 3.07, 2.89, 2.81; fat content 12.5.
• The mean square errors of the predictions for the test dataset are:
  MSE (regular approach, raw spectra) = 27.02
  MSE (functional approach, B-spline features) = 14.25
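A rough sketch of the functional approach in MATLAB is shown below. The variable names curves (a 215 x 100 matrix of spectra) and fat (a 215 x 1 vector of fat contents) are hypothetical placeholders for the Tecator data, and the knot placement is only one possible choice; the exact settings used in the lecture may differ.

% curves: 215 x 100 matrix of spectrometric curves (hypothetical variable name)
% fat:    215 x 1 vector of measured fat content (hypothetical variable name)
wl = 1:100;                                        % wavelength index
knots  = [ones(1,3) linspace(1,100,8) 100*ones(1,3)];
nBasis = length(knots) - 4;                        % cubic B-splines (order 4)
kspline = spmak(knots, eye(nBasis));
B = spval(kspline, wl)';                           % 100 x nBasis basis matrix

% Extract B-spline coefficients for every curve (functional features)
coef = (B'*B) \ (B'*curves');                      % nBasis x 215 coefficient matrix
features = coef';                                  % one row of features per curve

% Regress fat content on the extracted features (training subset)
idxTrain = 1:195; idxTest = 196:215;
bhat = [ones(195,1) features(idxTrain,:)] \ fat(idxTrain);
pred = [ones(20,1)  features(idxTest,:)] * bhat;
mseBspline = mean((pred - fat(idxTest)).^2)        % test-set mean square error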
Reference
• Hastie, T., Tibshirani, R., and Friedman, J., (2009) The Elements of Statistical
Learning. Springer Series in Statistics Springer New York Inc., New York, NY,
USA.
Topics on High-
Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.

Associate Professor
School of Industrial & Systems Engineering

Smoothing splines
Learning Objectives
• Discuss B-spline basis boundary
issue
• Introduce natural spline basis
• Define smoothing splines
• Discuss cross-validation for tuning
penalty parameter.
Boundary Effects on Splines

Consider the following setting with fixed training data.

• The behavior of splines tends to be erratic near the boundaries, and extrapolation can be problematic.
Natural Cubic Splines
• Additional constraints are added to make the function linear beyond the boundary knots.
• Assuming the function is linear near the boundaries (where there is less information) is often reasonable.
• A natural cubic spline is a cubic spline that is linear on (−∞, ξ_1] and [ξ_K, ∞).
• The prediction variance decreases near the boundaries.
• The price is some bias near the boundaries.
• The degrees of freedom is K, the number of knots.
• Each of these basis functions has zero second and third derivatives in the linear regions.

Generate a natural spline basis in R:   B = ns(x, df, intercept)

Smoothing Splines

Minimize the penalized residual sum of squares

  RSS(f, λ) = Σ_{i=1}^{n} (y_i − f(x_i))² + λ ∫ (f''(t))² dt

• The first term measures the closeness of the model to the data (related to bias).
• The second term penalizes the curvature of the function (related to variance).
• Knot selection is avoided: use as many knots as the number of observations.
• λ is the smoothing parameter controlling the trade-off between bias and variance:
  – λ = 0: interpolates the data (overfitting)
  – λ = ∞: linear least-squares regression
Example of Overfitting

[Figure: true function vs. an estimated curve with a large number of knots]
Smoothing Splines
Penalized residual sum of squares:

  RSS(f, λ) = Σ_{i=1}^{n} (y_i − f(x_i))² + λ ∫ (f''(t))² dt

It can be shown that the minimizer is a natural cubic spline with knots at each of the unique x_i's:

  f(x) = Σ_{j=1}^{n} N_j(x) θ_j

where the N_j's are a set of natural cubic spline basis functions.

Matrix form:
  RSS(θ, λ) = (y − Nθ)ᵀ(y − Nθ) + λ θᵀΩ_N θ,   where {Ω_N}_{jk} = ∫ N_j''(t) N_k''(t) dt

Solution:
  θ̂ = (NᵀN + λΩ_N)⁻¹ Nᵀy
Smoother Matrix
The smoothing spline estimator is a linear smoother:

  f̂ = N(NᵀN + λΩ_N)⁻¹Nᵀy = S_λ y

• S_λ is the smoother matrix
• S_λ is NOT idempotent
• S_λ is symmetric
• S_λ is positive definite
• Degrees of freedom: trace(S_λ)


Choice of Tuning Parameter
Collect three independent data sets: training, validation, and test.

[Diagram: tuning parameters → model estimation on the training data → intermediate models assessed on the validation data → optimal tuning parameters and estimated model → final model assessed on the test data]
• If an independent validation dataset is not affordable, the K-fold cross validation (CV)
or leave-one-out CV can be used.
K-fold Cross-Validation (CV)
• 5-fold cross-validation (blank: training; red: test)
Choice of Tuning Parameter
Model Selection Criteria
• Akaike information criterion (AIC):  −2 log(L) + 2k,
  where k is the number of estimated parameters and L is the likelihood function.

• Bayesian information criterion (BIC):  −2 log(L) + k log(n),
  where n is the sample size.

• Generalized cross-validation (GCV):  GCV(λ) = (RSS(λ)/n) / (1 − df(λ)/n)²
Example - Over-fitting
Generate 40 knots for fitting 100 data samples.

% Generate data:
fun = @(x) 2.5 * x - sin(10 * x) - exp(-10 * x);
n = 100; D = linspace(0,1,n); k = 40;
sigma = 0.3; y = fun(D) + randn(1,n)*sigma; y = y';
% Generate B-spline basis:
knots = [ones(1,2) linspace(1,n,k) n * ones(1,2)];
nKnots = length(knots) - 3;
kspline = spmak(knots,eye(nKnots));
B=spval(kspline,1:n)';
% Least Square Estimation:
yhat = B/(B'*B)*B'*y;
sigma2 = 1/(n-k)*(y-yhat)'*(y-yhat);
yn = yhat-3*sqrt(diag(sigma2*B/(B'*B)*B'));
yp = yhat+3*sqrt(diag(sigma2*B/(B'*B)*B'));
plot(D,y,'r.',D,yn,'b--',D,yp,'b--',D,yhat,'k-')
Example - Avoid Over-fitting by Smoothing Penalty
% B is defined in the previous slide
D1 = (B(2:n,:)-B(1:(n-1),:));
D2 = (D1(2:(n-1),:)-D1(1:(n-2),:));
% Different lambda selection
alllambda = 0.0001:0.5:100;
L = length(alllambda);
RSS = zeros(L,1);
df = zeros(L,1);
for i = 1:L
    S = B/(B'*B+alllambda(i)*(D2'*D2))*B';
    yhat = S*y;
    RSS(i) = sum((yhat-y).^2);
    df(i) = trace(S);
end
% GCV criterion
GCV = (RSS/n)./(1-df/n).^2;
plot(alllambda,GCV)
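Continuing this example, one way to use the GCV curve is to refit at the minimizing λ. A sketch, assuming B, y, D, D2, n, alllambda, and GCV from the code above:

% Refit with the lambda that minimizes GCV
[~, idx] = min(GCV);
lambdaOpt = alllambda(idx);
S = B/(B'*B + lambdaOpt*(D2'*D2))*B';   % smoother matrix at the selected lambda
yhat = S*y;                             % penalized (smoothed) fit
plot(D,y,'r.',D,yhat,'k-')
title(['\lambda = ' num2str(lambdaOpt) ',  df = ' num2str(trace(S))])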
Reference
• Hastie, T., Tibshirani, R., and Friedman, J., (2009) The Elements of Statistical
Learning. Springer Series in Statistics Springer New York Inc., New York, NY,
USA.
Topics on High-
Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.


Associate Professor
School of Industrial & Systems Engineering

Kernel Smoothers
Learning Objectives
• To define kernel functions
• To understand KNN regression, weighted kernel regression, and local linear and polynomial kernel regression.
K-Nearest Neighbor (KNN)
KNN average:

  f̂(x_0) = Σ_{i=1}^{n} w(x_0, x_i) y_i,   where  w(x_0, x_i) = 1/k if x_i ∈ N_k(x_0), and 0 otherwise

• Simple average of the k nearest observations to x_0 (local averaging)
• Equal weights are assigned to all neighbors
• The fitted function has the form of a step function (a non-smooth function)

(From Hastie et al. 2009)
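A minimal sketch of the kNN average at a single query point in MATLAB (simulated data, for illustration only):

% Simulated 1-D data (hypothetical)
x = linspace(0,1,100);
y = sin(2*pi*x) + 0.2*randn(1,100);

k = 10; x0 = 0.37;                 % number of neighbors and query point
[~, order] = sort(abs(x - x0));    % sort observations by distance to x0
fhat_x0 = mean(y(order(1:k)))      % simple average of the k nearest responses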
Kernel Function
Any non-negative, real-valued, integrable function K that satisfies the following conditions:

1. ∫_{−∞}^{∞} K(u) du = 1
2. K is an even function: K(−u) = K(u)
3. It has a finite second moment: ∫_{−∞}^{∞} u² K(u) du < ∞
Examples of Kernel functions
• Symmetric Beta family kernel:
  K(u; d) = (1 − u²)^d / (2^{2d+1} B(d+1, d+1)) · I(|u| < 1)
  – Uniform kernel (d = 0)
  – Epanechnikov kernel (d = 1)
  – Bi-weight / tri-weight kernels (d = 2, 3)
• Tri-cube kernel:  K(u) = (1 − |u|³)³ · I(|u| < 1)
• Gaussian kernel:  K(u) = (1/√(2π)) exp(−u²/2)

(From Hastie et al. 2009)
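For reference, these kernels can be written as anonymous functions in MATLAB (a sketch; the tri-cube kernel is left unnormalized, as on this slide):

Kunif  = @(u) 0.5*(abs(u) < 1);                   % uniform kernel (d = 0)
Kepan  = @(u) 0.75*(1 - u.^2).*(abs(u) < 1);      % Epanechnikov kernel (d = 1)
Ktric  = @(u) (1 - abs(u).^3).^3 .* (abs(u) < 1); % tri-cube kernel (unnormalized)
Kgauss = @(u) exp(-u.^2/2)/sqrt(2*pi);            % Gaussian kernel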
Kernel Smoother Regression
Kernel Regression
• Weighted local averaging that fits a simple model separately at each query point x_0.
• More weight is assigned to closer observations.
• Localization is defined by the weighting function.

For any point x_0,

  f̂(x_0) = Σ_{i=1}^{n} K_λ(x_0, x_i) y_i / Σ_{i=1}^{n} K_λ(x_0, x_i),   where  K_λ(x_0, x_i) = K((x_0 − x_i)/λ)

• K is a kernel function.
• λ is the so-called "bandwidth" or "window width" that defines the width of the neighborhood.
• Kernel regression requires little training; all calculations are done at evaluation time.
Example - Kernel Smoother Regression

  K_λ(x_0, x_i) = K((x_0 − x_i)/λ),   with the Epanechnikov kernel  K(u) = (3/4)(1 − u²) I(|u| < 1)

  f̂(x_0) = Σ_{i=1}^{n} K_λ(x_0, x_i) y_i / Σ_{i=1}^{n} K_λ(x_0, x_i)

(From Hastie et al. 2009)


Choice of λ
• λ defines the width of the neighborhood.
• Only points within [x_0 − λ, x_0 + λ] receive positive weights for kernels with support [−1, 1].
• Larger λ: smoother estimate, larger bias, smaller variance.
• Smaller λ: rougher estimate, smaller bias, larger variance.

The following criteria can be used for determining λ:
– Leave-one-out cross-validation
– K-fold cross-validation
– Generalized cross-validation


Example – RBF Kernel
% Data Generation
x=[0:100];
y=[sin(x/10)+(x/50).^2+0.1*normrnd(0,1,1,101)]';
kerf=@(z)exp(-z.*z/2)/sqrt(2*pi);     % Gaussian (RBF) kernel
% Leave-one-out CV over a grid of bandwidths
h1=[1:0.1:4];
for j=1:length(h1); h=h1(j);
    for i=1:length(y)
        X1=x; Y1=y; X1(i)=[]; Y1(i)=[];
        z=kerf((x(i)-X1)/h); yke=sum(z.*Y1')/sum(z);
        er(i)=y(i)-yke;
    end
    mse(j)=sum(er.^2);
end
plot(h1,mse); h=h1(find(mse==min(mse)));   % bandwidth minimizing the CV MSE
Example – RBF Kernel
% Interpolation for N values
N=1000;
xall = linspace(min(x),max(x),N);

f = zeros(1,N);
for k=1:N
z=kerf((xall(k)-x)/h);
f(k)=sum(z.*y')/sum(z);
end
Drawbacks of Local Averaging
The local averaging can be biased on
the boundaries of the domain due to the
asymmetry of the kernel in that region.

From Hastie. et al. 2009


Local Linear Regression
The locally weighted linear regression model is estimated by

  min_{β_0(x_0), β_1(x_0)}  Σ_{i=1}^{n} K_λ(x_0, x_i) [y_i − β_0(x_0) − β_1(x_0) x_i]²

The estimate of the function at x_0 is then

  f̂(x_0) = β̂_0(x_0) + β̂_1(x_0) x_0

Local linear regression corrects the bias on the boundaries.

(From Hastie et al. 2009)
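A sketch of a local linear fit at a single query point in MATLAB, using Epanechnikov weights (simulated data; in practice this is repeated over a grid of query points):

% Simulated 1-D data (hypothetical)
x = linspace(0,1,200)';
y = sin(4*x) + 0.2*randn(200,1);

lambda = 0.15; x0 = 0.05;                       % bandwidth and query point (near the boundary)
u = (x - x0)/lambda;
w = 0.75*(1 - u.^2).*(abs(u) < 1);              % Epanechnikov kernel weights
X0 = [ones(size(x)) x];                         % local linear design matrix
W = diag(w);
beta = (X0'*W*X0)\(X0'*W*y);                    % weighted least squares at x0
fhat_x0 = beta(1) + beta(2)*x0                  % local linear estimate at x0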


Local Polynomial Regression
The locally weighted polynomial regression model is estimated by

  min_{β_0(x_0), …, β_p(x_0)}  Σ_{i=1}^{n} K_λ(x_0, x_i) [y_i − β_0(x_0) − Σ_{j=1}^{p} β_j(x_0) x_i^j]²

The estimate of the function at x_0 is then

  f̂(x_0) = β̂_0(x_0) + Σ_{j=1}^{p} β̂_j(x_0) x_0^j

Local polynomial regression corrects the bias in curvature regions.

(From Hastie et al. 2009)


Local Polynomial Regression
• Higher-order polynomials result in lower bias but higher variance.
• Local linear fits help reduce bias at the boundaries.
• Local quadratic fits are effective for reducing bias due to curvature in the interior region, but not in boundary regions (they increase the variance there).

(From Hastie et al. 2009)


Reference
• Hastie, T., Tibshirani, R., and Friedman, J., (2009) The Elements of Statistical
Learning. Springer Series in Statistics Springer New York Inc., New York, NY,
USA.
Topics on High-
Dimensional Data
Analytics
Functional Data Analysis

Kamran Paynabar, Ph.D.


Associate Professor
School of Industrial & Systems Engineering

Functional Principal Component Analysis

Learning Objectives
• How to perform PCA on functional data
• To understand the Karhunen–Loeve (KL) theorem and identify eigen-functions and eigen-values
• To demonstrate feature extraction using FPCA
Signal Functional Form
  s_i(t) = μ(t) + ε_i(t)

• s_i(t): observed signals, i = 1, …, N
• μ(t): continuous functional mean
• ε_i(t): realizations from a stochastic process with mean function 0 and covariance function C(t, t′).
  It includes both random noise and signal-to-signal variations.
Karhunen–Loeve Theorem
Using the Karhunen–Loeve theorem, ε_i(t) can be written as

  ε_i(t) = Σ_{k=1}^{∞} ξ_{ik} φ_k(t)

where the ξ_{ik} are zero-mean, uncorrelated coefficients, i.e., E[ξ_{ik}] = 0 and E[ξ_{ik}²] = λ_k, and the φ_k(t) are eigen-functions of the covariance function C(t, t′) = cov(ε(t), ε(t′)), i.e.,

  C(t, t′) = Σ_{k=1}^{∞} λ_k φ_k(t) φ_k(t′)

λ_1 ≥ λ_2 ≥ ⋯ are the ordered eigen-values. The eigen-functions can be obtained by solving

  ∫_0^M C(t, t′) φ_k(t) dt = λ_k φ_k(t′)
Functional PCA
The variance of ξ_{ik} quickly decays with k. Therefore, only a few ξ_{ik}, also known as FPC-scores, are enough to accurately approximate the noise function. That is,

  ε_i(t) ≅ Σ_{k=1}^{K} ξ_{ik} φ_k(t)

The signal decomposition is given by

  s_i(t) = μ(t) + ε_i(t) ≅ μ(t) + Σ_{k=1}^{K} ξ_{ik} φ_k(t)
Model Estimation
– Complete signals: sampled regularly

– Incomplete signals: sampled irregularly, sparse, fragmented


Estimation of Mean Function
Historical signals s_i(t_{ij}):
– i = 1, …, N is the signal index
– j = 1, …, m_i is the observation index within each signal
– s_i(t_{ij}) ≅ μ(t_{ij}) + Σ_{k=1}^{K} ξ_{ik} φ_k(t_{ij})

We can estimate the mean function μ̂(t) using local linear regression by minimizing

  min_{c_0, c_1}  Σ_{i=1}^{N} Σ_{j=1}^{m_i} W((t_{ij} − t)/h) [s_i(t_{ij}) − c_0 − c_1 (t − t_{ij})]²

– Solution: μ̂(t) = ĉ_0(t)
Estimation of Covariance Function
First, we use the estimated mean function to compute the raw covariances Ĉ_i:

  Ĉ_i(t_{ij}, t_{ik}) = (s_i(t_{ij}) − μ̂(t_{ij})) (s_i(t_{ik}) − μ̂(t_{ik}))

To estimate the covariance surface Ĉ(t, t′), we use local quadratic regression:

  min_{c_0, c_1, c_2}  Σ_{i=1}^{N} Σ_{1≤j≠k≤m_i} W((t_{ij} − t)/h, (t_{ik} − t′)/h) [Ĉ_i(t_{ij}, t_{ik}) − c_0 − c_1 (t − t_{ij}) − c_2 (t′ − t_{ik})]²

Solution: Ĉ(t, t′) = ĉ_0(t, t′)

The eigen-functions φ̂_k(t) are then estimated by discretizing the estimated covariance function Ĉ(t, t′).
Computing FPC-Scores
Compute the eigen-functions φ̂_k(t) by solving

  ∫_0^M Ĉ(t, t′) φ̂_k(t) dt = λ̂_k φ̂_k(t′),   subject to  ∫_0^M φ̂_k(t) φ̂_m(t) dt = 1 if m = k, and 0 if m ≠ k

– solved by discretizing the estimated covariance function Ĉ(t_j, t_{j′})

Compute the FPC-scores ξ̂_{ik}:

  ξ_{ik} = ∫_0^M (s_i(t) − μ̂(t)) φ_k(t) dt

– Numerical integration (with t_0 = 0):

  ξ̂_{ik} = Σ_{j=1}^{J} (s_i(t_j) − μ̂(t_j)) φ̂_k(t_j) (t_j − t_{j−1})
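For completely observed signals on a common time grid, these steps reduce to an eigen-decomposition of the discretized sample covariance. A minimal sketch in MATLAB (simulated signals; the variable names are not from the lecture code):

% Simulate N complete signals on a common grid (hypothetical data)
N = 200; J = 101; t = linspace(0,1,J); dt = t(2) - t(1);
S = zeros(N,J);
for i = 1:N
    S(i,:) = sin(2*pi*t) + randn*cos(2*pi*t) + 0.5*randn*sin(4*pi*t) + 0.05*randn(1,J);
end

mu = mean(S,1);                          % estimated mean function
R  = S - repmat(mu,N,1);                 % centered signals
Chat = (R'*R)/(N-1);                     % discretized covariance function C_hat(t,t')

[Phi, Lam] = eig(Chat);                  % eigen-decomposition of the covariance
[lam, idx] = sort(diag(Lam),'descend');
Phi = Phi(:,idx)/sqrt(dt);               % rescale so that the integral of phi_k^2 dt = 1
lam = lam*dt;                            % eigen-values of the covariance operator

K = 2;                                   % number of retained components
xi = R*Phi(:,1:K)*dt;                    % FPC-scores by numerical integration
fve = cumsum(lam)/sum(lam);              % fraction of variance explained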
FPCA Example

[Figure: original signals (left); signals with missing data, 6 observations per signal (right)]

FPCA Example

[Figure: estimated mean function (left); smoothed covariance function (right)]

FPCA Example

[Figure: fraction of variance explained (left); 1st eigen-function (right). The 1st eigen-value explains more than 98% of the total variation.]
Example: Functional Data
• In a press machine, the load profiles are measured during the forging process. The goal is to predict the quality of the produced part based on the load profiles.
• There are 200 profiles along with their quality labels: 100 non-defective and 100 defective parts.

• For a new curve, we want to decide whether it belongs to class 1 or class 2.

• Option 1: B-spline coefficients
• Option 2: Functional principal components
Example: Functional Data
Classification

Step 1: Extract features from the functional data
• B-spline coefficients
• Functional principal components
Note: Each curve has 50 time observations. Using B-splines we reduced the curve dimension from 50 to 10, and using FPCA scores from 50 to 2.

Step 2: Train a classifier (e.g., Random Forest, SVM, etc.) using the extracted features, as sketched below.
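A sketch of this step in MATLAB, assuming a feature matrix F (one row of B-spline coefficients or FPC-scores per curve) and a label vector g have already been built; these variable names are hypothetical, and fitcsvm requires the Statistics and Machine Learning Toolbox:

% F: nCurves x nFeatures matrix of extracted features (hypothetical)
% g: nCurves x 1 vector of class labels (1 = non-defective, 2 = defective)
mdl = fitcsvm(F, g, 'KernelFunction', 'linear', 'Standardize', true);

% Step 3 then predicts the class of new curves from their extracted features Fnew
gpred = predict(mdl, Fnew);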
Example: Functional Data
Classification

Step 3. Predict the class for 40 new observations.


• Using the 10 B-spline coefficients, all the curves were correctly classified.
• Using the scores of the first two principal components, 2 curves of class 1 were classified into class 2.
