ECEN 615
Methods of Electric Power Systems Analysis
Lecture 16: Least Squares, Singular Value Decomposition (SVD), and State Estimation

Prof. Tom Overbye
Dept. of Electrical and Computer Engineering
Texas A&M University
[email protected]
Announcements
• Read Chapter 9
• Homework 5 is due on Tuesday Nov 1, 2022

Forward and Inverse Problems
• In science and engineering analysis we are often dealing with two classes
of problems
– Forward or direct problems; we’re using a model with inputs to determine a set of
outputs; power flow is an example of a forward problem
– Inverse problems are the opposite: we’re using a set of outputs, perhaps coupled
with a model, to determine a set of inputs; power system state estimation is an
example of an inverse problem
• Both forward and inverse problems can be linear or nonlinear
• Inverse problems can present many challenges including whether there are
enough observations of sufficient quality to obtain an answer

Least Squares
• So far we have considered the solution of Ax = b in which A is a square
matrix; as long as A is nonsingular there is a single solution
– That is, we have the same number of equations (m) as unknowns (n)
– This is a forward problem
• Many problems are overdetermined, in which there are more equations than
unknowns (m > n)
– Overdetermined systems are usually inconsistent, in which no value of x exactly
solves all the equations
• Underdetermined systems have more unknowns than equations (m < n);
they never have a unique solution but are usually consistent

Method of Least Squares
• The least squares method is a solution approach for determining an
approximate solution for an overdetermined system
• If the system is inconsistent, then not all of the equations can be
exactly satisfied
• The difference for each equation between its exact solution and the
estimated solution is known as the error
• Least squares seeks to minimize the sum of the squares of the errors
• Weighted least squares allows different weights for the equations

Least Squares Solution History
• The method of least squares developed from trying to estimate actual
values from a number of measurements
• Several persons in the 1700's, starting with Roger Cotes in 1722,
presented methods for trying to decrease model errors from using
multiple measurements
• Legendre presented a formal description of the method in 1805;
evidently Gauss claimed he did it in 1795
• Method is widely used in power systems, with state estimation the best
known application, dating from Fred Schweppe's work around 1970
– Fred also did a lot of work associated with locational marginal prices (LMPs)
– He was a professor at MIT who died in 1988 at age 54
Initial State Estimation Paper

It was a three-part paper, with part two focused on an approximate model and part three on implementation.
Least Squares and Sparsity
• In many contexts least squares is applied to problems that are not
sparse. For example, using a number of measurements to optimally
determine a few values
– Regression analysis is a common example, in which a line or other curve is fit to
potentially many points
– Each measurement impacts each model value
• In the classic power system application of state estimation the system is
sparse, with measurements only directly influencing a few states
– Power system analysis classes have tended to focus on solution methods aimed at
sparse systems; we'll consider both sparse and nonsparse solution methods

Least Squares Problem
• Consider Ax ≈ b, with A ∈ R^(m×n), x ∈ R^n, b ∈ R^m, or

  [ (a^1)^T ]       [ a_11  a_12  a_13 ... a_1n ] [ x_1 ]   [ b_1 ]
  [ (a^2)^T ]       [ a_21  a_22  a_23 ... a_2n ] [ x_2 ]   [ b_2 ]
  [   ...   ] x  =  [  ...   ...   ...      ... ] [ ... ] ≈ [ ... ]
  [ (a^m)^T ]       [ a_m1  a_m2  a_m3 ... a_mn ] [ x_n ]   [ b_m ]
Least Squares Solution
• We write (a^i)^T for row i of A, where a^i is a column vector
• Here m ≥ n, and the solution we are seeking is the one that minimizes
  ||Ax - b||_p, where p denotes some norm
• Since usually an overdetermined system has no exact solution, the best
  we can do is determine an x that minimizes the desired norm
Choice of p
• We discuss the choice of p in terms of a specific example
• Consider the equation Ax = b with
      [ 1 ]        [ b_1 ]
  A = [ 1 ] ,  b = [ b_2 ]   with b_1 ≥ b_2 ≥ b_3 ≥ 0
      [ 1 ]        [ b_3 ]

  (hence three equations and one unknown)

• We consider three possible choices for p:
Choice of p

(i) p = 1:   ||Ax - b||_1 is minimized by x* = b_2

(ii) p = 2:  ||Ax - b||_2 is minimized by x* = (b_1 + b_2 + b_3)/3

(iii) p = ∞: ||Ax - b||_∞ is minimized by x* = (b_1 + b_3)/2
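As a quick numerical check (not part of the slides), the sketch below brute-forces the three minimizers on a grid; the values b = [6, 5, 1] are arbitrary, chosen only to satisfy b_1 ≥ b_2 ≥ b_3 ≥ 0, so the median, mean, and midrange all differ.

```python
import numpy as np

# Hypothetical measurement values with b1 >= b2 >= b3 >= 0
b = np.array([6.0, 5.0, 1.0])
A = np.ones((3, 1))

# Grid search over candidate scalar x values
xs = np.linspace(0.0, 6.0, 60001)
res = A * xs - b[:, None]                        # residuals for every candidate x

x1 = xs[np.argmin(np.abs(res).sum(axis=0))]      # 1-norm minimizer -> median b2 = 5.0
x2 = xs[np.argmin((res ** 2).sum(axis=0))]       # 2-norm minimizer -> mean = 4.0
xinf = xs[np.argmin(np.abs(res).max(axis=0))]    # inf-norm minimizer -> midrange = 3.5
print(x1, x2, xinf)
```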
The Least Squares Problem
• In general, ||Ax - b||_p is not differentiable for p = 1 or p = ∞
• The choice of p = 2 (Euclidean norm) has become well established given its
  least-squares fit interpretation
• The problem min over x ∈ R^n of ||Ax - b||_2 is tractable for two major reasons
  – First, the function is differentiable:

    φ(x) = (1/2) ||Ax - b||_2^2 = (1/2) Σ_{i=1}^{m} ( (a^i)^T x - b_i )^2
The Least Squares Problem, cont.
  – Second, the Euclidean norm is preserved under orthogonal transformations:

    || Q^T A x - Q^T b ||_2 = ||Ax - b||_2

    with Q an arbitrary orthogonal matrix; that is, Q satisfies

    Q Q^T = Q^T Q = I,   Q ∈ R^(m×m)
The Least Squares Problem, cont.
• We introduce next the basic underlying assumption: A is full rank, i.e., the
columns of A constitute a set of linearly independent vectors
• This assumption implies that the rank of A is n
because n ≤ m since we are dealing with an overdetermined system
• Fact: The least squares solution x* satisfies

  A^T A x* = A^T b
Proof
• Since by definition the least squares solution x* minimizes φ(x), the derivative of this function is zero at the optimum:

  φ(x) = (1/2) ||Ax - b||_2^2 = (1/2) ( x^T A^T A x - x^T A^T b - b^T A x + b^T b )

  0 = ∂φ(x)/∂x = ∂/∂x [ (1/2) ( x^T A^T A x - x^T A^T b - b^T A x + b^T b ) ]

              = ∂/∂x [ (1/2) ( x^T A^T A x - 2 x^T A^T b + b^T b ) ]

              = A^T A x* - A^T b
Implications
• This underlying assumption implies that

  A is full rank  ⇔  ( x ≠ 0 ⇒ Ax ≠ 0 )

• Therefore, the fact that A^T A is positive definite (p.d.) follows from
  considering any x ≠ 0 and evaluating

  x^T A^T A x = ||Ax||_2^2 > 0,

  which is the definition of a p.d. matrix
• We use the shorthand A^T A > 0 for A^T A being a symmetric, positive
  definite matrix
Implications
• The underlying assumption that A is full rank, and therefore A^T A is
  p.d., implies that there exists a unique least squares solution

  x* = ( A^T A )^(-1) A^T b

• Note: we use the inverse in a conceptual, rather than a computational, sense
• The below formulation is known as the normal equations, with the
  solution conceptually straightforward

  ( A^T A ) x = A^T b
Example: Curve Fitting
• Say we wish to fit five points to a polynomial curve of the form
  f(t, x) = x_1 + x_2 t + x_3 t^2

• This can be written as (here we are solving for x)

  Ax ≈ y, i.e.,

  [ 1  t_1  t_1^2 ]            [ y_1 ]
  [ 1  t_2  t_2^2 ] [ x_1 ]    [ y_2 ]
  [ 1  t_3  t_3^2 ] [ x_2 ] ≈  [ y_3 ]
  [ 1  t_4  t_4^2 ] [ x_3 ]    [ y_4 ]
  [ 1  t_5  t_5^2 ]            [ y_5 ]
Example: Curve Fitting
• Say the points are t = [0, 1, 2, 3, 4] and y = [0, 2, 4, 5, 4]. Then

           [ 1  0   0 ]            [ 0 ]
           [ 1  1   1 ] [ x_1 ]    [ 2 ]
  Ax ≈ y:  [ 1  2   4 ] [ x_2 ] ≈  [ 4 ]
           [ 1  3   9 ] [ x_3 ]    [ 5 ]
           [ 1  4  16 ]            [ 4 ]

                            [  0.886   0.257  -0.086  -0.143   0.086 ] [ 0 ]
                            [ -0.771   0.186   0.571   0.386  -0.371 ] [ 2 ]
  x = (A^T A)^(-1) A^T b =  [  0.143  -0.071  -0.143  -0.071   0.143 ] [ 4 ]
                                                                       [ 5 ]
                                                                       [ 4 ]

      [ -0.2 ]
  x = [  3.1 ]
      [ -0.5 ]
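This example is easy to verify numerically; a quick check (not in the slides) using the normal equations and NumPy's library least squares routine:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 2.0, 4.0, 5.0, 4.0])

# Build the A matrix for f(t, x) = x1 + x2*t + x3*t^2
A = np.column_stack([np.ones_like(t), t, t**2])

# Solve the normal equations (A^T A) x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check with the library least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_normal)   # approximately [-0.2  3.1 -0.5]
print(x_lstsq)
```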
Implications
• An important implication of positive definiteness is that we can factor
  A^T A since A^T A > 0:

  A^T A = U^T D U = U^T D^(1/2) D^(1/2) U = G^T G,  with G = D^(1/2) U

• The expression A^T A = G^T G is called the Cholesky factorization of the
  symmetric positive definite matrix A^T A
A Least Squares Solution Algorithm
Step 1: Compute the lower triangular part of A^T A
Step 2: Obtain the Cholesky factorization A^T A = G^T G
Step 3: Compute b̂ = A^T b
Step 4: Solve G^T y = b̂ for y using forward substitution, and then
        Gx = y for x using backward substitution

Note: our standard LU factorization approach would work; we can just solve
it twice as fast by taking advantage of the matrix being symmetric
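A minimal sketch of these four steps in Python, using SciPy's Cholesky and triangular solvers; the function name and reuse of the earlier curve-fitting data are illustrative, not from the slides:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def least_squares_cholesky(A, b):
    """Solve min ||Ax - b||_2 via the normal equations and a Cholesky factorization."""
    AtA = A.T @ A                      # Step 1: form A^T A (dense here; sparse in practice)
    G = cholesky(AtA, lower=False)     # Step 2: A^T A = G^T G with G upper triangular
    b_hat = A.T @ b                    # Step 3: b_hat = A^T b
    y = solve_triangular(G.T, b_hat, lower=True)   # Step 4a: forward substitution
    x = solve_triangular(G, y, lower=False)        # Step 4b: backward substitution
    return x

# Reusing the curve-fitting example above
A = np.column_stack([np.ones(5), np.arange(5.0), np.arange(5.0) ** 2])
b = np.array([0.0, 2.0, 4.0, 5.0, 4.0])
print(least_squares_cholesky(A, b))    # approximately [-0.2  3.1 -0.5]
```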
Practical Considerations
• The two key problems that arise in practice with the triangularization
procedure are:
– First, while A may be sparse, A^T A is much less sparse and consequently requires more
computing resources for the solution
  • In particular, with A^T A second neighbors are now connected! Large networks are still sparse, just
not as sparse
– Second, A^T A may actually be numerically less well-conditioned than A

Loss of Sparsity Example
• Assume the B matrix for a network is

      [  1  -1   0   0 ]
      [ -1   2  -1   0 ]
  B = [  0  -1   2  -1 ]
      [  0   0  -1   1 ]

• Then B^T B is

          [  2  -3   1   0 ]
          [ -3   6  -4   1 ]
  B^T B = [  1  -4   6  -3 ]
          [  0   1  -3   2 ]

• Second neighbors are now connected!
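A small sketch (not from the slides) that reproduces this fill-in numerically; the sign pattern of B is assumed, reconstructed to be consistent with the B^T B result above:

```python
import numpy as np

B = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])

BtB = B.T @ B
print(BtB)
# Entries (1,3), (3,1), (2,4), (4,2) were zero in B but are nonzero in B^T B
print(np.count_nonzero(B), "nonzeros in B vs", np.count_nonzero(BtB), "in B^T B")
```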
Singular Value Decomposition
• Traditionally power system analysis has mostly been focused on the sparse
matrices associated with the electric grid; there was not much signal
analysis
• This is rapidly changing as the power industry gets more signals and needs
to extract information from them, with PMUs one example
• This data is often presented in the form of a matrix, for example with the
rows being sample points
• A key technique for extracting information from matrices is known as the
singular value decomposition

Matrix Singular Value Decomposition (SVD)
• The SVD is a factorization of a matrix that generalizes the
  eigendecomposition to any m by n matrix, producing

  Y = U Σ V^T

  (The original concept is more than 100 years old, but has found lots of recent applications)

  where Σ is a diagonal matrix of the singular values, and U and V are
  orthogonal matrices
• The singular values are non-negative real numbers that can be used to
  indicate the major components of a matrix (the gist is they provide a way
  to decrease the rank of a matrix)
• A key application is image compression
Aside: SVD Image Compression Example

Images can be represented with matrices. When an SVD is applied and only the largest singular values are retained, the image is compressed.

Image Source: www.math.utah.edu/~goller/F15_M2270/BradyMathews_SVDImage.pdf
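A minimal sketch of the idea (illustrative only, not the example from the referenced image): keep only the k largest singular values to form a low-rank approximation of a matrix.

```python
import numpy as np

def low_rank_approx(Y, k):
    """Return the best rank-k approximation of Y using the SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example: a random "image" matrix compressed to rank 5
Y = np.random.rand(100, 80)
Y5 = low_rank_approx(Y, 5)
print(np.linalg.matrix_rank(Y5), np.linalg.norm(Y - Y5) / np.linalg.norm(Y))
```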


An SVD Application, the Pseudoinverse of a Matrix

• The pseudoinverse of a matrix generalizes the concept of a matrix
  inverse to an m by n matrix, m ≥ n
  – Specifically this is a Moore-Penrose Matrix Inverse
• Notation for the pseudoinverse of A is A+
• Satisfies A A+ A = A
• If A is a square matrix, then A+ = A^(-1)
• Quite useful for solving the least squares problem since the least
  squares solution of Ax = b is

  x = A+ b

• Can be calculated using an SVD: if A = U Σ V^T, then A+ = V Σ+ U^T
Pseudoinverse Least Squares Matrix Example
• Assume we wish to fit a line (mx + b = y) to three data points: (1,1),
(2,4), (6,4)
• Two unknowns, m and b; hence x = [m b]T
• Setup in form of Ax = b

 1 1  1  1 1
 2 1  m   4  so A =  2 1
   b  
 6 1    4   6 1

29
Aside: Pseudoinverse Least Squares Matrix Example
• Doing an economy SVD (in an economy SVD the Σ matrix has dimensions
  of m by m if m < n, or n by n if n < m):

                [ -0.182  -0.765 ]
  A = U Σ V^T = [ -0.331  -0.543 ] [ 6.559    0   ] [ -0.976  -0.219 ]
                [ -0.926   0.345 ] [   0    0.988 ] [  0.219  -0.976 ]

• Computing the pseudoinverse:

                  [ -0.976   0.219 ] [ 0.152    0   ] [ -0.182  -0.331  -0.926 ]
  A+ = V Σ+ U^T = [ -0.219  -0.976 ] [   0    1.012 ] [ -0.765  -0.543   0.345 ]

                  [ -0.143  -0.071   0.214 ]
                = [  0.762   0.548  -0.310 ]
Least Squares Matrix Pseudoinverse Example, cont.
• Computing x = [m b]^T gives

             [ -0.143  -0.071   0.214 ] [ 1 ]   [ 0.429 ]
  x = A+ b = [  0.762   0.548  -0.310 ] [ 4 ] = [ 1.71  ]
                                        [ 4 ]

• With the pseudoinverse approach we immediately see the sensitivity of
  the elements of x to the elements of b
  – New values of m and b can be readily calculated if y changes
• Computationally the SVD is order mn^2 + n^3 (with n < m); often n is
  much less than m, so the cost tends to scale linearly with m
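The same numbers can be reproduced with NumPy's SVD-based pseudoinverse; a quick check, not part of the slides:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [6.0, 1.0]])
b = np.array([1.0, 4.0, 4.0])

A_plus = np.linalg.pinv(A)   # computed internally via the SVD
x = A_plus @ b
print(A_plus)                # approximately [[-0.143 -0.071  0.214], [ 0.762  0.548 -0.310]]
print(x)                     # approximately [0.429 1.714], i.e., the slope m and intercept b
```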
SVD and Principal Component Analysis (PCA)
• The previous image compression example demonstrates PCA, which
reduces dimensionality
– Extracting the principal components
• The principal components are associated with the largest singular values
– This helps to extract the key features of the data and removes redundancy
• PCA is a statistical method for reducing the dimensionality of a dataset
– One example of PCA is facial recognition; another is market research
• PCA is starting to be more widely used in power system analysis,
particularly when doing signal analysis
– In electrical engineering a signal is defined as any time-varying quantity, which
hopefully contains some information
Numerical Conditioning
• To understand the point on numerical ill-conditioning, we need to introduce
  some terminology
• We define the norm of a matrix B ∈ R^(m×n) to be

  ||B|| = max over x ≠ 0 of ( ||Bx|| / ||x|| ) = maximum stretching of the matrix B

• This is the maximum singular value of B
Numerical Conditioning Example
• Say we have the matrix

  B = [ 10    0  ]
      [  0   0.1 ]

• What value of x with a norm of 1 maximizes ||Bx||?
• What value of x with a norm of 1 minimizes ||Bx||?

  (Recall ||B|| = max over x ≠ 0 of ||Bx|| / ||x|| = maximum stretching of the matrix B)
Numerical Conditioning

• ||B||_2 = max over i of sqrt(λ_i), where λ_i is an eigenvalue of B^T B,
  i.e., λ_i is a root of the polynomial

  p(λ) = det( B^T B - λI )

  (Keep in mind the eigenvalues of a p.d. matrix are positive)

• In other words, the 2-norm of B is the square root of the
  largest eigenvalue of B^T B
Numerical Conditioning
• The condition number of a matrix B is defined as

  κ(B) = ||B|| ||B^(-1)|| = σ_max(B) / σ_min(B) = the max/min stretching ratio of the matrix B

• A well-conditioned matrix has a small value of κ(B), close to 1;
  the larger the value of κ(B), the more pronounced the ill-conditioning
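For the 2 by 2 example a couple of slides back, the condition number is 10/0.1 = 100; a quick check with NumPy (not in the slides):

```python
import numpy as np

B = np.array([[10.0, 0.0],
              [0.0, 0.1]])

sigma = np.linalg.svd(B, compute_uv=False)   # singular values, largest first
print(sigma)                                 # [10.   0.1]
print(sigma[0] / sigma[1])                   # 100.0
print(np.linalg.cond(B))                     # 100.0 (2-norm condition number)
```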
Power System State Estimation (SE)
• The need arises because in power system operations there is a desire to do
  “what if” studies based upon the actual “state” of the electric grid
  – An example is an online power flow or contingency analysis
• The overall goal of SE is to come up with a power flow model for the present
  "state" of the power system based on the actual system measurements
• SE assumes the topology and parameters of the transmission network are
  mostly known
• Measurements come from SCADA and, increasingly, PMUs
• An overview is given in ECEN 615; much more detail is provided in 614
  – Prof. Ali Abur has done a lot of work in state estimation; he was at TAMU from 1985
    to 2005, and is now at Northeastern University
Power System State Estimation
• Problem can be formulated in a nonlinear, weighted least squares
  form as

  min J(x) = Σ_{i=1}^{m} ( z_i - f_i(x) )^2 / σ_i^2

  where J(x) is the scalar cost function, x are the state variables
  (primarily bus voltage magnitudes and angles), z_i are the m
  measurements, f_i(x) relates the states to the measurements, and σ_i is
  the assumed standard deviation for each measurement
Assumed Error
• Hence the goal is to decrease the error between the measurements and the
  assumed model states x
• The σ_i term weights the various measurements, recognizing that they can
  have vastly different assumed errors

  min J(x) = Σ_{i=1}^{m} ( z_i - f_i(x) )^2 / σ_i^2

• Measurement error is assumed Gaussian (whether it is or not is another
  question); outliers (bad measurements) are often removed
State Estimation for Linear Functions
• First we’ll consider the linear problem, that is, where

  z^meas ≈ f(x) = Hx

• Let R be defined as the diagonal matrix of the variances (squares
  of the standard deviations) for each of the measurements

      [ σ_1^2    0    ...    0    ]
  R = [   0    σ_2^2  ...    0    ]
      [  ...    ...   ...   ...   ]
      [   0      0    ...  σ_m^2  ]
State Estimation for Linear Functions
• We then differentiate J(x) w.r.t. x to determine the value of x that
  minimizes this function

  J(x) = ( z^meas - Hx )^T R^(-1) ( z^meas - Hx )

  ∇J(x) = -2 H^T R^(-1) z^meas + 2 H^T R^(-1) H x

  At the minimum we have ∇J(x) = 0. So solving for x gives

  x = ( H^T R^(-1) H )^(-1) H^T R^(-1) z^meas
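A minimal sketch of this linear weighted least squares estimator in Python; the helper name and interface are illustrative, not from the book:

```python
import numpy as np

def linear_wls(H, z_meas, sigmas):
    """Solve x = (H^T R^-1 H)^-1 H^T R^-1 z for a linear measurement model z = Hx."""
    R_inv = np.diag(1.0 / np.asarray(sigmas) ** 2)   # R is diagonal, so invert elementwise
    gain = H.T @ R_inv @ H                           # the n by n "gain" matrix
    return np.linalg.solve(gain, H.T @ R_inv @ z_meas)
```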
Simple DC System Example
• Say we have a two bus power system that we are solving using the
  dc approximation. Say the line’s per unit reactance is j0.1. Say we
  have power measurements at both ends of the line. For simplicity
  assume R = I. We would then like to estimate the bus angles. Then

  z_1 = 2.2,  f_1(x) = (θ_1 - θ_2)/0.1,   z_2 = -2.0,  f_2(x) = (θ_2 - θ_1)/0.1

  x = [ θ_1 ] ,  H = [  10  -10 ] ,  H^T H = [  200  -200 ]
      [ θ_2 ]        [ -10   10 ]            [ -200   200 ]

  We have a problem since H^T H is singular.
  This is because of the lack of an angle reference.
Simple DC System Example, cont.

• Say we directly measure θ_1 (with a PMU) to be zero; set this as the
  third measurement. Then

  z_1 = 2.2,  f_1(x) = (θ_1 - θ_2)/0.1,   z_2 = -2.0,  f_2(x) = (θ_2 - θ_1)/0.1,   z_3 = 0,  f_3(x) = θ_1

  x = [ θ_1 ] ,  z = [  2.2 ] ,  H = [  10  -10 ] ,  H^T H = [  201  -200 ]
      [ θ_2 ]        [ -2.0 ]        [ -10   10 ]            [ -200   200 ]
                     [  0   ]        [   1    0 ]

  (Note that the angles are in radians)

  x = ( H^T R^(-1) H )^(-1) H^T R^(-1) z^meas

  x = [  201  -200 ]^(-1) [  10  -10   1 ] [  2.2 ]   [  0    ]
      [ -200   200 ]      [ -10   10   0 ] [ -2.0 ] = [ -0.21 ]
                                           [  0   ]
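Checking the numbers (a quick verification, not from the slides):

```python
import numpy as np

H = np.array([[10.0, -10.0],
              [-10.0, 10.0],
              [ 1.0,   0.0]])
z = np.array([2.2, -2.0, 0.0])
R_inv = np.eye(3)                    # R = I for this example

x = np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)
print(x)                             # approximately [0.  -0.21], angles in radians
```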
Nonlinear Formulation
• A regular ac power system is nonlinear, so we need to use an iterative
  solution approach. This is similar to the Newton power flow. Here assume
  m measurements and n state variables (usually bus voltage magnitudes and
  angles). Then the Jacobian is the H matrix:

                     [ ∂f_1/∂x_1  ...  ∂f_1/∂x_n ]
  H(x) = ∂f(x)/∂x =  [    ...     ...     ...    ]
                     [ ∂f_m/∂x_1  ...  ∂f_m/∂x_n ]
Measurement Example
• Assume we measure the real and reactive power flowing into one end of a
  transmission line; then the z_i - f_i(x) functions for these two are

  P_ij^meas - ( V_i^2 G_ij - V_i V_j ( G_ij cos(θ_i - θ_j) + B_ij sin(θ_i - θ_j) ) )

  Q_ij^meas - ( -V_i^2 ( B_ij + B_cap/2 ) - V_i V_j ( G_ij sin(θ_i - θ_j) - B_ij cos(θ_i - θ_j) ) )

  – Two measurements for four unknowns
• Other measurements, such as the flow at the other end, and voltage
  magnitudes, add redundancy
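A sketch of these two measurement functions as code, assuming the π-model line flow conventions written above; the function name and argument order are illustrative:

```python
import numpy as np

def line_flow_pq(Vi, Vj, theta_i, theta_j, Gij, Bij, Bcap):
    """Real and reactive power flowing into the from-end of a line (pi-model conventions)."""
    dtheta = theta_i - theta_j
    Pij = Vi**2 * Gij - Vi * Vj * (Gij * np.cos(dtheta) + Bij * np.sin(dtheta))
    Qij = -Vi**2 * (Bij + Bcap / 2.0) - Vi * Vj * (Gij * np.sin(dtheta) - Bij * np.cos(dtheta))
    return Pij, Qij
```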
SE Iterative Solution Algorithm
• We then make an initial guess of x, x^(0), and iterate, calculating Δx each
  iteration:

                                        [ z_1 - f_1(x) ]
  Δx = ( H^T R^(-1) H )^(-1) H^T R^(-1) [      ...     ]
                                        [ z_m - f_m(x) ]

  x^(k+1) = x^(k) + Δx

  This is exactly the least squares form developed earlier, with H^T R^(-1) H
  an n by n matrix. This could be solved with Gaussian elimination, but this
  isn't preferred because the problem is often ill-conditioned.

  Keep in mind that H is no longer constant, but varies as x changes, and is
  often ill-conditioned
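A minimal sketch of this iteration; the measurement function f, its Jacobian H (both callables), the tolerance, and the dense solve are placeholders the caller must supply, and in practice a sparse or orthogonal factorization would replace the dense solve:

```python
import numpy as np

def wls_state_estimate(f, H, z, sigmas, x0, tol=1e-6, max_iter=20):
    """Iterative WLS SE: x(k+1) = x(k) + (H^T R^-1 H)^-1 H^T R^-1 (z - f(x))."""
    R_inv = np.diag(1.0 / np.asarray(sigmas) ** 2)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Hx = H(x)                                      # Jacobian at the current state
        rhs = Hx.T @ R_inv @ (z - f(x))                # weighted measurement mismatch
        dx = np.linalg.solve(Hx.T @ R_inv @ Hx, rhs)   # dense solve for illustration only
        x = x + dx
        if np.max(np.abs(dx)) < tol:
            break
    return x
```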
Nonlinear SE Solution Algorithm (Book Figure 9.11)

[Figure 9.11 from the book: flowchart of the nonlinear SE solution algorithm]