LVM Class 5


Latent Variable Methods Course

Learning from data

Instructor: Kevin Dunn

[email protected]
http://connectmv.com

© Kevin Dunn, ConnectMV, Inc. 2011

Revision: 268:adfd compiled on 15-12-2011

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, please visit
http://creativecommons.org/licenses/by-sa/3.0/

This license allows you:

I to share - to copy, distribute and transmit the work
I to adapt - but you must distribute the new result under the
same or similar license to this one
I commercialize - you are allowed to create commercial
applications based on this work
I attribution - you must attribute the work as follows:
I “Portions of this work are the copyright of ConnectMV”, or
I “This work is the copyright of ConnectMV”

© ConnectMV, 2011 2
We appreciate:
I if you let us know about any errors in the slides
I any suggestions to improve the notes
I telling us if you use the slides, especially commercially, so we
can inform you of major updates
I emailing us to ask about different licensing terms

All of the above can be done by writing us at

[email protected]

If reporting errors/updates, please quote the current revision number: 268:adfd

How do I know a point is an outlier?

I Easier if it’s your own data
I Which plots should I use to detect outliers?
I What a 95% limit means ...
I Always confirm your conclusions from the raw data
I Still have to use your head!
Activating the software
I Please email your codes to: [email protected]

Resume from last class: slides 28 to 32

Unfortunately, I’ve added some more details, and rearranged the slides

I $T_i^2$ is a summary of all $A$ components within row $i$

I $T_i^2 = \sum_{a=1}^{A} \left( \dfrac{t_{i,a}}{s_a} \right)^2$
I $s_a$ = standard deviation of score column $a$

Hotelling’s $T^2$
I $T_i^2 = \sum_{a=1}^{A} \left( \dfrac{t_{i,a}}{s_a} \right)^2$

I $s_1 > s_2 > \ldots$ (from the eigenvalue derivation)

I $T_i^2 \geq 0$
I Plotted as a time-series/sequence plot
I Useful if the row order in dataset has a meaning

$$T_i^2 = \sum_{a=1}^{A} \left( \frac{t_{i,a}}{s_a} \right)^2 \geq 0$$

I Interpretation: directed distance from the center to where the point is projected on the plane
I $T^2$ follows a (scaled) $F$-distribution
I Often show the 95% confidence limit value, called $T^2_{A,\alpha=0.05}$
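The $T_i^2$ calculation can be sketched in a few lines. This is a minimal illustration, assuming the score matrix is already available as a list of rows; the function name `hotelling_t2` and the numbers are hypothetical and do not come from the course software.

```python
import statistics

def hotelling_t2(scores):
    """Return T^2 for every row of a score matrix (one column per component)."""
    n_components = len(scores[0])
    # s_a: sample standard deviation of each score column
    s = [statistics.stdev([row[a] for row in scores]) for a in range(n_components)]
    # T_i^2 = sum over components of (t_ia / s_a)^2
    return [sum((row[a] / s[a]) ** 2 for a in range(n_components)) for row in scores]

# Hypothetical scores: 5 observations, A = 2 components, columns are mean-centered
T = [[2.0, 0.5], [-1.0, -0.5], [0.0, 1.0], [-2.0, 0.0], [1.0, -1.0]]
t2 = hotelling_t2(T)
for i, value in enumerate(t2):
    print(f"row {i}: T2 = {value:.3f}")
```

Plotting `t2` against the row index gives the sequence plot mentioned above. The control limit is not computed here, since it needs the $F$-distribution, which is outside the standard library.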

Hotelling’s $T^2$
I If $A = 2$, equation for 95% limit: $T^2_{A=2,\alpha=0.05} = \dfrac{t_1^2}{s_1^2} + \dfrac{t_2^2}{s_2^2}$
I An equation for an ellipse
I $s_1$ and $s_2$ are constant for a given model
I Points on ellipse have a constant distance from model center

I Hotelling’s $T^2$ = distance of every point from center, taking (co)variance into account
I Why not use a Euclidean distance? $T_i^2 = \sum_{a=1}^{A} \left( \dfrac{t_{i,a}}{1} \right)^2$

I Instead we use the Mahalanobis distance:

$$T_i^2 = \sum_{a=1}^{A} \left( \frac{t_{i,a}}{s_a} \right)^2 \geq 0$$

Left image: The green point is equidistant from the center, but doesn’t accurately reflect “outlyingness”
Right image: The same red point however is “equally far” from the model center, at all points on the ellipse
Inspiration for left image is due to Rasmus Bro’s video:

http://www.youtube.com/watch?v=ExoAbXPJ7NQ
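A small numerical sketch of the Euclidean versus Mahalanobis comparison. The standard deviations `s1`, `s2` and the constant `c` are made-up values for illustration only; `c` is not a real 95% limit.

```python
import math

s1, s2 = 3.0, 1.0  # assumed score standard deviations, with s1 > s2

def euclidean_sq(t1, t2):
    """Squared Euclidean distance from the model center."""
    return t1 ** 2 + t2 ** 2

def t2_distance(t1, t2):
    """Hotelling's T^2 for A = 2: each score is scaled by its standard deviation."""
    return (t1 / s1) ** 2 + (t2 / s2) ** 2

along_pc1 = (3.0, 0.0)  # in the direction of large variance
along_pc2 = (0.0, 3.0)  # in the direction of small variance

print(euclidean_sq(*along_pc1), euclidean_sq(*along_pc2))  # same Euclidean distance
print(t2_distance(*along_pc1), t2_distance(*along_pc2))    # very different T^2

# Points on the ellipse T^2 = c: t1 = sqrt(c)*s1*cos(theta), t2 = sqrt(c)*s2*sin(theta)
c = 6.0  # illustrative constant only
ellipse = [
    (math.sqrt(c) * s1 * math.cos(th), math.sqrt(c) * s2 * math.sin(th))
    for th in [2 * math.pi * j / 12 for j in range(12)]
]
```

Every point in `ellipse` has the same $T^2$ value `c`, although the Euclidean distances of those points from the center differ.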

Resume from last class: slides 60 to 66

Unfortunately, I’ve added some more details, and rearranged the slides

I Interrogate the latent variables to see what changed

I Shows difference between two points in the score plot

Example:
I 207: temperature on tray 129
in distillation column 3
I 158: a tag from distillation
column 3
I 33 and 277: related to
concentration of feed A

I These variables are related to the problem

I Not the cause of the problem
I Still have to use your engineering judgement to diagnose
I But, we’ve reduced the size of the problem

Contributions in the score space: one PC
Score $= t_{i,a} = x_i p_a$ = linear combination

I $x_{i,1} p_{1,a} \quad x_{i,2} p_{2,a} \quad \ldots \quad x_{i,K} p_{K,a}$ ← there are $K$ terms

I relative size of terms is interpreted
I most often shown as a bar plot
I absolute value on $y$-axis is never used/not shown
I not sensible to interpret contributions for observation with a small score
I example here has $K = 6$
I signs can be interpreted, but rather verify in raw data
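The $K$ terms of a single score can be computed directly. The sketch below uses a hypothetical preprocessed observation `x_i` and a hypothetical loading vector; neither comes from the course data.

```python
# Hypothetical centered and scaled observation with K = 6 variables
x_i = [1.2, -0.4, 0.1, 2.5, -0.3, 0.8]

# Hypothetical loading vector for one component, normalized to unit length
p_raw = [0.5, 0.1, -0.2, 0.7, 0.3, -0.35]
norm = sum(v * v for v in p_raw) ** 0.5
p_a = [v / norm for v in p_raw]

# One contribution per variable: x_ik * p_ka
contributions = [x * p for x, p in zip(x_i, p_a)]

# The score is the sum of the K contributions (a linear combination)
score = sum(contributions)

# Variable with the largest contribution in absolute size (0-based index)
largest = max(range(len(contributions)), key=lambda k: abs(contributions[k]))

for k, value in enumerate(contributions):
    print(f"variable {k + 1}: {value:+.3f}")
print("score:", round(score, 3), "largest contributor: variable", largest + 1)
```

Only the relative sizes of the printed values are meaningful, which is why the bar plot is usually shown without a $y$-axis scale.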
Contributions in more than 1 score

Summation of the contributions from each score, weighted by the size of the score.

Consider PC1 and PC2 for variable $k$:

I contribution in $t_1$ direction = $x_{i,k} \, p_{k,1}$
I contribution in $t_2$ direction = $x_{i,k} \, p_{k,2}$
I joint contribution = $x_{i,k} \, p_{k,1} \cdot \dfrac{t_{i,1}}{s_1} + x_{i,k} \, p_{k,2} \cdot \dfrac{t_{i,2}}{s_2}$

In general: joint contribution for variable $x_k$ =

$$\text{contrib}(x_k) = x_{i,k} \sqrt{\sum_{a} \left( p_{k,a} \cdot \frac{t_{i,a}}{s_a} \right)^2}$$

Not uniform in various software:

I Cleanest: use the weighted sum of score contributions, as
shown before.
I Alvarez et al. - paper 21
I Kourti and MacGregor - paper 81
I Mason, Tracy and Young: “Decomposition of $T^2$ for
multivariate control chart interpretation”, Journal of Quality
Technology, 27, 99-108, 1995.

Contributions: modifying the starting point
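We can modify the starting point, not necessary to use origin:
I $t_{i,a}^{(\text{to})} = x_i^{(\text{to})} p_a$
I $t_{i,a}^{(\text{from})} = x_i^{(\text{from})} p_a$ ← usually the origin: $t_{i,a}^{(\text{from})} = 0$

Subtract:

$$t_{i,a}^{(\text{to})} - t_{i,a}^{(\text{from})} = \left( x_i^{(\text{to})} - x_i^{(\text{from})} \right) p_a$$
$$\Delta t_{i,a} = \Delta x_i \, p_a \quad \leftarrow \text{plot as bar plot}$$

In general:

$$\text{contrib}(x_k) = \left( x_{i,k}^{(\text{to})} - x_{i,k}^{(\text{from})} \right) \sqrt{\sum_{a} \left( p_{k,a} \cdot \frac{t_{i,a}^{(\text{to})} - t_{i,a}^{(\text{from})}}{s_a} \right)^2}$$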
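A sketch of the general contribution formula, as read from the slide. The function name `joint_contributions`, the loadings `P`, and the standard deviations `s` are illustrative assumptions; software packages differ in how they define this quantity.

```python
import math

def joint_contributions(x_to, x_from, P, s):
    """Contribution of each variable to the move between two observations.

    P is a list of K rows, each holding the A loadings for that variable;
    s holds the A score standard deviations.
    """
    K, A = len(P), len(s)
    t_to = [sum(x_to[k] * P[k][a] for k in range(K)) for a in range(A)]
    t_from = [sum(x_from[k] * P[k][a] for k in range(K)) for a in range(A)]
    # Weight for each component: score difference scaled by s_a
    weights = [(t_to[a] - t_from[a]) / s[a] for a in range(A)]
    return [
        (x_to[k] - x_from[k])
        * math.sqrt(sum((P[k][a] * weights[a]) ** 2 for a in range(A)))
        for k in range(K)
    ]

# Hypothetical model: K = 3 variables, A = 2 orthonormal loading columns
r = 1 / math.sqrt(2)
P = [[r, 0.0], [r, 0.0], [0.0, 1.0]]
s = [2.0, 1.0]

origin = [0.0, 0.0, 0.0]
x_new = [2.0, 0.0, 1.0]
contrib = joint_contributions(x_new, origin, P, s)
print([round(c, 3) for c in contrib])
```

With `x_from` at the origin this reduces to the earlier formula for a single observation. Passing two different observations compares them directly, for example a point before and after a process upset.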

$\text{SPE} = e_i' e_i$ where $e_i' = x_i' - \hat{x}_i'$

I $(x_{i,1} - \hat{x}_{i,1}) \quad (x_{i,2} - \hat{x}_{i,2}) \quad \ldots \quad (x_{i,K} - \hat{x}_{i,K})$ ← bar plot
I Could show squared values: $(x_{i,k} - \hat{x}_{i,k})^2$ for variable $k$
I But sometimes +ve and −ve patterns in the bars are helpful to identify the fault signature
I See work of Yoon and MacGregor on fault signatures
I Don’t interpret absolute value of the error bars
I Don’t interpret contributions for observations with small SPE
I Large bar: doesn’t always mean that variable is a problem (example on board)
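The SPE contributions are the elements of the residual vector. The sketch below assumes orthonormal loadings, so that the prediction is $\hat{x}_i = t_i P'$; the function name `spe_contributions` and all numbers are hypothetical.

```python
import math

def spe_contributions(x, P):
    """Return the residual vector e and SPE = e'e for one preprocessed observation.

    P is a list of K rows, each holding the A loadings for that variable.
    Assumes the loading columns are orthonormal.
    """
    K, A = len(P), len(P[0])
    t = [sum(x[k] * P[k][a] for k in range(K)) for a in range(A)]       # scores
    x_hat = [sum(t[a] * P[k][a] for a in range(A)) for k in range(K)]   # projection onto the model
    e = [x[k] - x_hat[k] for k in range(K)]                             # signed residuals
    return e, sum(v * v for v in e)

# Hypothetical model: K = 3 variables, A = 2 components
r = 1 / math.sqrt(2)
P = [[r, 0.0], [r, 0.0], [0.0, 1.0]]

e, spe = spe_contributions([2.0, 0.0, 1.0], P)
print([round(v, 3) for v in e], round(spe, 3))
```

In this example the first two variables have residuals of opposite sign: the observation breaks the correlation between them that the first component expects. That kind of signed pattern is what the bar plot is meant to reveal.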

Contribution plots: $T^2$ and SPE

Joint $T^2$ and SPE monitoring plots

I Illustrated on the board

You might see the concept of “leverage” in software packages:

Each observation has leverage on the model
$$\text{Leverage}_i = \text{diag}\left\{ T (T'T)^{-1} T' \right\}_{(i,i)} > 0$$

I $(T'T)$ =
I $\text{Leverage}_i$ = scaled down version of $T_i^2$
I $\sum_{i=1}^{N} \text{Leverage}_i = A$ = the number of columns in $T$
I Cut off for $\text{Leverage}_i = 3 \cdot \dfrac{A}{N}$
I Points with $\text{Leverage}_i$ > cut off have large influence on model

Cut off $= 3 \cdot \dfrac{A}{N} = 3 \times 3/184$
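A minimal sketch of the leverage calculation, assuming the score columns are orthogonal (as they are for PCA scores), so that $T'T$ is diagonal and no matrix inversion is needed. The score values are hypothetical.

```python
def leverages(scores):
    """Diagonal of T (T'T)^-1 T' when the columns of T are orthogonal."""
    n_components = len(scores[0])
    # t_a' t_a for each column: the diagonal entries of T'T
    col_ss = [sum(row[a] ** 2 for row in scores) for a in range(n_components)]
    return [sum(row[a] ** 2 / col_ss[a] for a in range(n_components)) for row in scores]

def leverage_cutoff(A, N):
    """Rule-of-thumb cut off from the slide: 3 * A / N."""
    return 3.0 * A / N

# Hypothetical orthogonal, mean-centered scores: N = 5 rows, A = 2 components
T = [[2.0, 1.0], [-2.0, 1.0], [1.0, -2.0], [-1.0, -2.0], [0.0, 2.0]]
lev = leverages(T)

print([round(v, 4) for v in lev])
print("sum of leverages:", round(sum(lev), 6))             # equals A
print("cut off for A=3, N=184:", leverage_cutoff(3, 184))
```

Because $s_a^2 = t_a' t_a / (N-1)$ for centered scores, each leverage here equals $T_i^2 / (N-1)$, which is the sense in which leverage is a scaled down version of $T_i^2$.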
Variable importance to prediction
Characteristics of variables that have important role in model?
I Have large (absolute) weights: why?
I Come from a component that has a high $R^2$

Combining these two concepts we calculate for each variable:

Importance of variable $k$ using $A$ components
$$\text{VIP}_{A,k}^2 = \frac{K}{\text{SSX}_0 - \text{SSX}_A} \cdot \sum_{a=1}^{A} \left( \text{SSX}_{a-1} - \text{SSX}_a \right) P_{a,k}^2$$

I $\text{SSX}_a$ = sum of squares in the $X$ matrix after $a$ components

I $\dfrac{\text{SSX}_{a-1} - \text{SSX}_a}{\text{SSX}_0}$ = incremental $R^2$ for $a$th component
I $\dfrac{\text{SSX}_0 - \text{SSX}_A}{\text{SSX}_0}$ = $R^2$ for model using $A$ components
I Messy, but you can show that $\sum_k \text{VIP}_{A,k}^2 = K$
I Reasonable cut-off =
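The VIP formula can be checked numerically. This sketch assumes unit-length loading vectors; the function name `vip_squared`, the sums of squares, and the loadings are made-up illustration values.

```python
import math

def vip_squared(ssx, P):
    """VIP^2 for each variable.

    ssx = [SSX_0, SSX_1, ..., SSX_A]: sum of squares left in X after each component.
    P   = list of A loading vectors, each of length K and unit length.
    """
    A, K = len(P), len(P[0])
    total = ssx[0] - ssx[A]
    return [
        K / total * sum((ssx[a - 1] - ssx[a]) * P[a - 1][k] ** 2 for a in range(1, A + 1))
        for k in range(K)
    ]

# Hypothetical model: K = 3 variables, A = 2 components
r = 1 / math.sqrt(2)
P = [[r, r, 0.0], [0.0, 0.0, 1.0]]
ssx = [300.0, 120.0, 60.0]

vip2 = vip_squared(ssx, P)
print([round(v, 3) for v in vip2], "sum =", round(sum(vip2), 6))
```

The printed sum equals $K$, as stated above. That follows because each unit-length loading vector satisfies $\sum_k P_{a,k}^2 = 1$.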
Jackknifing
We re-calculate the model G + 1 times during cross-validation:
I G times, once per group
I The “+1” is from the final round, where we use all
observations

We get G + 1 estimates of the PCA model parameters:

I loadings
I VIP values
for every variable (1, 2, . . . K ).

Can now calculate confidence intervals (caution with CI on loadings)

I Martens and Martens (paper 43) describing jackknifing.

I Efron and Tibshirani describe the bootstrap and jackknife.
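The $G + 1$ re-fits can be sketched for a toy case. To stay within the standard library, this assumes only two variables, where the first loading has a closed form; the data, the choice $G = 4$, and the way groups are assigned are all illustrative. The spread printed at the end is only a crude indication of uncertainty, not the interval formula from the cited papers.

```python
import math
import statistics

def first_loading(rows):
    """First PCA loading for two-variable data, from the 2x2 cross-product matrix."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    a = sum((r[0] - m1) ** 2 for r in rows)
    c = sum((r[1] - m2) ** 2 for r in rows)
    b = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    # Angle of the principal axis; it lies in (-pi/2, pi/2], so the first element
    # is never negative, which fixes the sign of the loading vector
    theta = 0.5 * math.atan2(2 * b, a - c)
    return (math.cos(theta), math.sin(theta))

# Hypothetical data: 12 observations of two correlated variables
X = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 6.3),
     (7.0, 6.8), (8.0, 8.1), (9.0, 9.2), (10.0, 9.7), (11.0, 11.1), (12.0, 12.2)]

G = 4
estimates = []
for g in range(G):
    kept = [row for i, row in enumerate(X) if i % G != g]  # leave out group g
    estimates.append(first_loading(kept))
estimates.append(first_loading(X))  # the "+1": all observations

# Spread of each loading element across the G + 1 estimates
spread = [statistics.stdev([e[k] for e in estimates]) for k in range(2)]
print(len(estimates), "estimates; spread per element:", [round(v, 4) for v in spread])
```

One likely reason for the caution about loadings is that a loading vector and its negative describe the same direction. Without a sign convention such as the one above, estimates from different rounds can flip sign and inflate the apparent spread.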
Case studies

I Raw material characterization

I Near infra-red spectra of tablets

Wafer case study
I Data source: Silicon wafer thickness
I Nine thickness measurements from a silicon wafer.
I Thickness measured at the nine locations

1. Build a PCA model on the data on the first 100 rows.

2. Plot the scores. What do you notice?
3. Investigate the outliers with the contribution tool.
4. Verify that the outliers exist in the raw data
5. Exclude any unusual observations and refit the model
6. Did you get all the outliers? Check the scores and SPE.
Repeat to get all outliers removed.
7. Plot a loadings plot for the first component. What is your
interpretation of $p_1$?
8. Given the $R^2$ and $Q^2$ values for the first component, what is
your interpretation about the variability in this process?
(Remember the goal of PCA is to explain variability)

Wafer case study II
9. What is the interpretation of $p_2$? From a quality control
perspective, if you could remove the variability due to $p_2$, how
much of the variability would you be removing from the
process?
10. Plot the corresponding time series plot for $t_1$. What do you
notice in the sequence of score values?
11. Repeat the above question for the second component.
12. Use all the data as testing data (184 observations, of which
the first ≈ 100 were used to build the model).
13. Do the outliers that you excluded earlier still show up as
outliers? Do the contribution plots for these outliers give the
same diagnosis that you got before?
14. Are there any new outliers in points 101 to 184? If so, what
is their diagnosis?

I You have an intuitive (built-in) model for your body

I When everything is normal: we say “I’m healthy” (in control)
I Detect a problem: pain, lack of mobility, hard to breathe
I Something feels wrong (there’s a special cause)
I Diagnose the problem: yourself, search internet, doctor
I Fix the problem and get back to your usual healthy state

Where did that intuitive model for your body’s health come from?

Assume the doctor is always right and that the baseline hypothesis
is: “you are healthy”
I Type 1 error: you detect a problem (e.g. hard to breathe);
doctor says nothing is wrong
I You’ve raised a false alarm
I You feel outside your limits,
I but the truth is: “you are healthy”
I Type 1 error = raise an alarm when there isn’t a problem

Assume the doctor is always right and that the baseline hypothesis
is: “you are healthy”
I Type 2 error: you feel OK, but go to the doctor for a physical and
they detect a problem
I You feel within your limits,
I but the truth is: “you are not healthy”
I Type 2 error = don’t raise an alarm when there is a problem
I The grid

Our goal: We want process stability

Best case: we have unaccounted sources of noise: called error

Variability
More realistically:
I Sensor drift, spikes, noise, recalibration shifts, errors in our
sample analysis
I Operating staff: introduce variability into a process
I Raw material properties are not constant
I External conditions change (ambient temperature, humidity)
I Equipment breaks down, wears out, sensor drift, maintenance
shut downs
I Feedback control introduces variability

Assertion
Customers expect both uniformity and low cost when they buy
your product. Variability defeats both objectives.

Remind yourself of the last time you bought something that didn’t
work properly

The high cost of variability in your final product:

1. Inspection costs:
I high variability: test every product (expensive, inefficient,
sometimes destructive)
I low variability: limited inspection required
2. Off-specification products cost you, and your customer, money:
I reworked
I disposed
I sold at a loss

The high cost of variability in your raw materials
I Flip it around: you receive highly variable raw materials:
I That variability lands up in your product, or
I you incur additional cost (energy/time/materials) to process it

1. rapid problem detection

2. diagnose the problem
3. finally, adjust the process so problems don’t occur

Process monitoring is mostly reactive and not proactive. So it is suited to incremental process improvement

I “Process monitoring” also called “Statistical Process Control” (SPC)
I We will avoid this term due to potential confusion:
I Monitoring is similar to (feedback) control:
I continually applied
I checks for deviations (error)
I Monitoring is different to (feedback) control:
I adjustments are infrequent
I usually manual
I adjust due to special causes

I Process monitoring: make permanent adjustments to reduce variability
I Feedback control: temporarily compensates for the problem

Monitoring is widely used in all industries

I Managers: monitor geographic regions for hourly sales,
downtime, throughput
I Engineers: monitor large plants, subsections, and unit
operations

Tools/buzzwords used go by names such as:

I Dashboards
I Analytics
I BI: business intelligence,
I KPI: key performance indicators

Shewhart chart (recap)
I Named for Walter Shewhart from Bell Telephone and Western
Electric, parts manufacturing, 1920’s
I A chart for monitoring a variable’s location, shown with
I a lower control limit (LCL), usually at −3σ
I an upper control limit (UCL), usually at +3σ
I a target, at the setpoint/desired value
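A simplified sketch of the chart limits described above. It assumes individual measurements, takes the target as the mean of in-control reference data when no setpoint is supplied, and estimates σ with the ordinary sample standard deviation. Real charts often use subgroups and other σ estimates, and all numbers here are invented.

```python
import statistics

def shewhart_limits(reference, target=None, n_sigma=3.0):
    """Return (LCL, target, UCL) from in-control reference data."""
    if target is None:
        target = statistics.mean(reference)  # assumption: no explicit setpoint given
    sigma = statistics.stdev(reference)
    return target - n_sigma * sigma, target, target + n_sigma * sigma

# Hypothetical in-control reference data and new measurements
reference = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
new_values = [10.1, 9.9, 10.9, 10.2, 9.2]

lcl, target, ucl = shewhart_limits(reference)
alarms = [i for i, v in enumerate(new_values) if v < lcl or v > ucl]

print(f"LCL = {lcl:.2f}, target = {target:.2f}, UCL = {ucl:.2f}")
print("points outside the limits:", alarms)
```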

No action taken as long as the variable plotted remains within limits (in-control). Why?

I Type I error:
I value plotted is from common-cause operation, but falls
outside limits
I if values are normally distributed, how many will fall outside?
I ±2σ limits?
I ±3σ limits?
I Synonyms: false alarm, producer’s risk

I Type II error:
I value plotted is from abnormal operation, but falls inside limits
I Synonyms: false negative, consumer’s risk
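The question about how many in-control values fall outside the limits can be answered with the normal distribution. This sketch assumes independent, normally distributed values; the function name `false_alarm_rate` is illustrative.

```python
from statistics import NormalDist

def false_alarm_rate(k):
    """Probability that an in-control, normally distributed value falls outside +/- k sigma."""
    return 2.0 * (1.0 - NormalDist().cdf(k))

for k in (2, 3):
    rate = false_alarm_rate(k)
    print(f"+/-{k} sigma limits: {100 * rate:.2f}% outside, about 1 in {1 / rate:.0f} points")
```

Under these assumptions roughly 4.55% of points fall outside ±2σ limits and roughly 0.27% fall outside ±3σ limits. These are the type I error rates for the two choices.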

Key point
Control chart limits are not set in stone. Adjust them!

Nothing makes a control chart more useless to operators than frequent false alarms.

I But, you cannot simultaneously have low type I and type II error

1. What action is taken when outside the limits

2. What if data goes missing?

3. Monitoring many variables.

Lab measurements have a long time delay:

I process already shifted by the time lab values detect a
problem (continuous)
I batches have to be placed on hold until lab results return
I very hard to find cause-and-effect for diagnosis
I e.g. low product strength could be caused by multiple reasons

Measurements from real-time systems are:

I available more frequently (less delay) than lab measurements
I often more precise, often with lower error
I more meaningful to the operating staff
I contain an almost unique “fingerprint” of the problem (helps
diagnosis)
I Now we can figure out what caused low product strength

“Variables” monitored don’t need to be from on-line sensors: could be a calculated value

Monitoring with latent variables; use:

I scores from the model, t1 , t2 , . . . , tA

Illustration on the board

Much better than the raw variables:

I The scores are orthogonal (uncorrelated)
I Far fewer scores than original variables
I Calculated even if there are missing data
I Can be monitored anywhere there is real-time data
I Available before the lab’s final measurement

Hotelling’s $T^2 = \sum_{a=1}^{A} \left( \dfrac{t_a}{s_a} \right)^2$
I The distance along the model plane
I Is a one-sided monitoring plot
I What does a large $T^2$ value mean?

Process monitoring with PCA: SPE
$\text{SPE}_i = (x_i - \hat{x}_i)'(x_i - \hat{x}_i) = e_i' e_i$
I Distance off the model plane
I Is a one-sided monitoring plot
I What does a large SPE value mean?

I Interrogate the latent variables to see what changed

I Shows difference between two points in the score plot

Example:
I 207: temperature on tray 129
in distillation column 3
I 158: a tag from distillation
column 3
I 33 and 277: related to
concentration of feed A

I These variables are related to the problem

I Not the cause of the problem
I Still have to use your engineering judgement to diagnose
I But, we’ve reduced the size of the problem

I Scores: $t_{i,a} = x_i p_a$

I $x_{i,1} p_{1,a} \quad x_{i,2} p_{2,a} \quad \ldots \quad x_{i,k} p_{k,a} \quad \ldots \quad x_{i,K} p_{K,a}$
I Derivation on the board

I $T^2$ contributions: weighted sum of scores

I More details in Alvarez et al. - paper 21
I and Kourti and MacGregor - paper 81

I $\text{SPE} = e_i' e_i$
I where $e_i' = x_i' - \hat{x}_i'$
I $(x_{i,1} - \hat{x}_{i,1}) \quad (x_{i,2} - \hat{x}_{i,2}) \quad \ldots \quad (x_{i,K} - \hat{x}_{i,K})$

I Joint $T^2$ and SPE monitoring plots

I Illustrated on the board
I Discussion

I ArcelorMittal in Hamilton (formerly called Dofasco) has used multivariate process monitoring tools since the 1990’s
I Over 100 applications used daily
I Most well known is their casting monitoring application,
Caster SOS (Stable Operation Supervisor)
I It is a multivariate monitoring system

All screenshots with permission of Dr. John MacGregor

I Stability Index 1 and 2: one-sided monitoring chart

I Warning limits and the action limits.
I A two-sided chart in the middle
I Lots of other operator-relevant information

Updated based on operator feedback/requests

I Implemented system in 1997; multiple upgrades since then

I Economic savings: more than $1 million/year
I each breakout costs around $200,000 to $500,000
I process shutdowns and/or equipment damage

Show video

I Hotelling’s $T^2$ is called “stability indicator” for operators

I Horizontal red line is the 99% limit
I Shaded green area is the 0 to 95% limit region
Monitoring isn’t just for chemical processes

Any data stream can be monitored

I Raw material characteristics
I On-line data from systems (most common multivariate
monitoring)
I Final quality properties
I End-point detection
I More generally: any row in a data matrix
I Credit card/financial fraud monitoring
I Human resources

General procedure to build monitoring models I
1. Identify variable(s) to monitor.
2. Retrieve historical data (computer systems, or lab data, or
paper records)
3. Import data and just plot it.
I Any time trends, outliers, spikes, missing data gaps?
4. Locate regions of stable, common-cause operation.
I Remove spikes and outliers
5. Build the monitoring model
6. Model includes control limits (UCL, LCL) for scores, SPE and
Hotelling’s $T^2$
7. Test your chart on new, unused data.
I Testing data: should contain both common and special cause
operation
8. How does your chart work?
I Quantify the type I and II error.
General procedure to build monitoring models II
I Adjust the limits;
I Repeat this step, as needed, to achieve the desired levels of error
9. Run chart on your desktop computer for a couple of days
I Confirm unusual events with operators; would they have
reacted to it? False alarm?
I Refine your limits
10. Not an expert system - will not diagnose problems:
I use your engineering judgement; look at patterns; knowledge
of other process events
11. Demonstrate to your colleagues and manager
I But go with dollar values
12. Installation and operator training will take time
13. Listen to your operators
I make plots interactive - click on unusual point, it drills-down
to give more context
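Step 8 of the procedure, quantifying the type I and II error on testing data, can be sketched as follows. The monitored values, the known special-cause labels, and the candidate limits are invented for illustration; in practice the labels come from confirmed process events.

```python
def error_rates(is_special_cause, alarm):
    """Return (type I rate, type II rate) from known labels and raised alarms."""
    pairs = list(zip(is_special_cause, alarm))
    n_common = sum(1 for special, _ in pairs if not special)
    n_special = sum(1 for special, _ in pairs if special)
    false_alarms = sum(1 for special, raised in pairs if raised and not special)
    missed = sum(1 for special, raised in pairs if special and not raised)
    return false_alarms / n_common, missed / n_special

def rates_at_limit(values, truth, limit):
    """Raise an alarm whenever the monitored value exceeds a one-sided limit."""
    return error_rates(truth, [v > limit for v in values])

# Hypothetical testing data for a one-sided chart (for example SPE)
values = [0.5, 1.2, 0.8, 3.5, 4.1, 0.9, 2.9, 0.4, 5.0, 1.1]
truth = [False, False, False, True, True, False, False, False, True, False]

for limit in (2.0, 3.0, 4.5):
    type1, type2 = rates_at_limit(values, truth, limit)
    print(f"limit {limit}: type I = {type1:.2f}, type II = {type2:.2f}")
```

Lowering the limit removes missed detections but introduces false alarms, and raising it does the opposite. This is the trade-off to keep in mind when adjusting and refining the limits.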

I Getting the data out

I Real-time use of the data (value of data decays exponentially)
I Training people to use the monitoring system is time
consuming
I Bandwidth/network/storage/computing

These papers will help you get to the bottom of process monitoring:
I MacGregor: Using on-line process data to improve quality:
challenges for statisticians (paper 75)
I Kourti and MacGregor: Process analysis, monitoring and
diagnosis, using multivariate projection methods (paper 31)
I MacGregor and Kourti: Statistical process control of
multivariate processes (paper 16)
I Kresta, MacGregor and Marlin: Multivariate statistical
monitoring of process operating performance (paper 9)
I Miller et al.: Contribution plots: a missing link in multivariate
quality control (paper 78)
