Math Programming
Lecture Notes
CE 385D - McKinney
Water Resources Planning and Management
Department of Civil Engineering
The University of Texas at Austin
Section

1. Introduction
3. Constraints
5. Types of Solutions
6. Classical Programming
    6.1 Unconstrained Scalar Case
    6.2 Unconstrained Vector Case
    6.3 Constrained Vector Case - Single Constraint
    6.5 Constrained Vector Case - Multiple Constraints
    6.6 Nonlinear Programming and the Kuhn-Tucker Conditions
Exercises
References
Math Programming
1/10/2003
1. Introduction
An optimization problem (mathematical program) is to determine the values of a vector of decision variables

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$   (1.1)

that minimize (or maximize) an objective function

$$f(x)$$   (1.2)

The decision variables must also satisfy a set of constraints X which describe the system being optimized and any restrictions on the decision variables. The decision vector x is feasible if

$$x \in X$$   (1.3)

The feasible region X, an n-dimensional subset of $R^n$, is defined by the set of all feasible vectors. An optimal solution x* has the properties (for minimization):

(1) $x^* \in X$   (1.4)

(2) $f(x^*) \le f(x) \quad \forall x \in X$   (1.5)

i.e., the solution is feasible and attains a value of the objective function which is less than or equal to the objective function value resulting from any other feasible vector.
The general problem, then, is to

maximize $f(x)$, $x \in X$   (2.1)

that is, maximize the objective function f(x) by choice of the decision variables x while ensuring that the optimal decision variables satisfy all of the constraints or restrictions of the problem. The objective function of the math programming problem can be either a linear or nonlinear function of the decision variables.

Note that

maximize $f(x)$   (2.2)

is equivalent to

maximize $a + b f(x)$, $b > 0$, or
minimize $a + b f(x)$, $b < 0$   (2.3)

That is, optimizing is a linear operation: multiplying the objective by a positive scalar or adding a constant does not change the result, and maximizing the negative of a function is the same as minimizing the function.
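This equivalence is easy to check numerically. The sketch below uses a hypothetical concave function and a coarse grid, and confirms that the maximizer of f is unchanged by a positive affine transformation and coincides with the minimizer when the scale factor is negative:

```python
# Check that optimizing is invariant under affine transformation:
# argmax f(x) == argmax (a + b*f(x)) for b > 0,
# and         == argmin (a + b*f(x)) for b < 0.
# f is a hypothetical function used only for illustration.

def f(x):
    return -(x - 2.0) ** 2 + 7.0  # unique maximum at x = 2

grid = [i / 100.0 for i in range(-500, 501)]  # x in [-5, 5]

argmax_f = max(grid, key=f)
argmax_scaled = max(grid, key=lambda x: 10.0 + 3.0 * f(x))   # b = 3 > 0
argmin_negated = min(grid, key=lambda x: 10.0 - 3.0 * f(x))  # b = -3 < 0

assert argmax_f == argmax_scaled == argmin_negated == 2.0
```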
3. Constraints
The constraint set X of the math program can consist of combinations of:

(1) Linear equalities:

$$Ax = b$$   (3.1)

$$\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \ldots, m$$   (3.2)

where the $a_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, are the elements of the matrix A, and b is a vector of right-hand-side constants. For a review of matrix arithmetic and notation, linear equalities and inequalities, please see Section 7.1.

(2) Linear inequalities:

$$Ax \le b$$   (3.3)

$$\sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \ldots, m$$   (3.4)

(3) Nonlinear inequalities:

$$g(x) \le 0$$   (3.5)

$$g_j(x) \le 0, \quad j = 1, \ldots, r$$   (3.6)

where the functions g(x) are nonlinear functions of the decision variables.

(4) Nonlinear equalities:

$$h(x) = 0$$   (3.7)

$$h_i(x) = 0, \quad i = 1, \ldots, m$$   (3.8)

where the functions h(x) are nonlinear functions of the decision variables.
A linear programming problem has a linear objective function and linear constraints:

Maximize $cx$
subject to
$$Ax \le b, \quad x \ge 0$$   (4.1)

Example:

Maximize $2x_1 + 3x_2 + 5x_3$
subject to
$$x_1 + x_2 - x_3 \le 5$$
$$6x_1 + 7x_2 - 9x_3 = 5$$
$$|19x_1 - 7x_2 + 5x_3| \le 13$$
Of course this example has the difficulty of what to do with the absolute value. An inequality with an absolute value can be replaced by two inequalities, e.g.,

$$|g(x)| \le b \quad \Leftrightarrow \quad g(x) \le b \;\text{ and }\; g(x) \ge -b$$

So our example can be converted to:

Maximize $2x_1 + 3x_2 + 5x_3$
subject to
$$x_1 + x_2 - x_3 \le 5$$
$$6x_1 + 7x_2 - 9x_3 = 5$$
$$19x_1 - 7x_2 + 5x_3 \le 13$$
$$19x_1 - 7x_2 + 5x_3 \ge -13$$

Note: An equation can be replaced by two inequalities of opposite direction. For example, an equation

$$g(x) = b$$

can be replaced by

$$g(x) \le b \;\text{ and }\; g(x) \ge b$$

Often it is easier for programs to check an inequality condition than a strict equality condition.
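Once the absolute value is split into a pair of inequalities, the example is an ordinary linear program that a solver can handle. A minimal sketch with `scipy.optimize.linprog`, taking the constraints as $x_1 + x_2 - x_3 \le 5$, $6x_1 + 7x_2 - 9x_3 = 5$, and $-13 \le 19x_1 - 7x_2 + 5x_3 \le 13$ (the signs are an assumed reading of the original), with free variables since no nonnegativity restriction is stated:

```python
from scipy.optimize import linprog

# Maximize 2x1 + 3x2 + 5x3  <=>  minimize the negated objective.
c = [-2.0, -3.0, -5.0]

# Inequality constraints A_ub @ x <= b_ub:
#   x1 + x2 - x3       <=  5
#   19x1 - 7x2 + 5x3   <=  13   (upper half of the absolute value)
#  -19x1 + 7x2 - 5x3   <=  13   (lower half: 19x1 - 7x2 + 5x3 >= -13)
A_ub = [[1.0, 1.0, -1.0],
        [19.0, -7.0, 5.0],
        [-19.0, 7.0, -5.0]]
b_ub = [5.0, 13.0, 13.0]

# Equality constraint: 6x1 + 7x2 - 9x3 = 5
A_eq = [[6.0, 7.0, -9.0]]
b_eq = [5.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * 3)  # variables are free
assert res.success
print(res.x, -res.fun)  # optimal decision variables and objective value
```

Note that `linprog` assumes nonnegative variables by default, so the free-variable bounds must be given explicitly.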
A classical programming problem has a nonlinear objective function and equality constraints:

Maximize $f(x)$
subject to
$$h(x) = 0$$   (4.2)
Example:
Minimize
subject to
$$x_1 - 2x_2 = 0$$
A nonlinear programming problem has a nonlinear objective function with both equality and inequality constraints:

Maximize $f(x)$
subject to
$$h(x) = 0$$
$$g(x) \le 0$$   (4.3)

Example:

Maximize $\ln(x_1 + 1) + x_2$
subject to
$$2x_1 + x_2 \le 3$$
$$x_1 \ge 0, \quad x_2 \ge 0$$
5. Types of Solutions
Solutions to the general mathematical programming problem are classified as either global or local solutions depending upon certain characteristics of the solution. A solution x* is a global solution (for maximization) if it is:

1. Feasible; and
2. Yields a value of the objective function greater than or equal to that obtained by any other feasible vector, or

$$x^* \in X, \quad \text{and} \quad f(x^*) \ge f(x) \quad \forall x \in X$$   (5.1)

A solution x* is a local solution (for maximization) if it is:

1. Feasible; and
2. Yields a value of the objective function greater than or equal to that obtained by any feasible vector x sufficiently close to it, or

$$x^* \in X, \quad \text{and} \quad f(x^*) \ge f(x) \quad \forall x \in (X \cap N_\varepsilon(x^*))$$   (5.2)

In this case multiple optima may exist for the math program and we have only established that x* is the optimum within the neighborhood searched. Extensive investigation of the program to find additional optima may be necessary.

Sometimes we can establish the local or global nature of the solution to a math program. The following two theorems give some examples.
[Figure: a function f(x) with a global maximum, local maximum, global minimum, and local minimum labeled.]
6. Classical Programming
The classical programming problem is to

Maximize $f(x)$   (6.1)
subject to
$$h(x) = 0$$   (6.2)

6.1 Unconstrained Scalar Case

First consider the unconstrained problem with a single (scalar) decision variable:

$$\underset{x}{\text{Maximize}} \; f(x)$$

The necessary conditions for a maximum are

$$\frac{df(x)}{dx} = 0 \quad \text{(first-order conditions)}$$   (6.1.2)

and

$$\frac{d^2 f(x)}{dx^2} \le 0 \quad \text{(second-order conditions)}$$   (6.1.3)

The first-order conditions represent an equation which can be solved for x*, the optimal solution of the problem.
But what if the decision variable is constrained to be greater than or equal to zero (a nonnegativity restriction), e.g., $x \ge 0$? In this case there are two possibilities: either (1) the solution lies to the right of the origin, an interior solution where the slope of the function is zero; or (2) the solution lies on the boundary (at the origin), where x = 0 and the slope is nonpositive. That is,

$$\frac{df}{dx} \le 0 \;\text{ at } x = x^* \text{ if } x^* = 0; \qquad \frac{df}{dx} = 0 \;\text{ at } x = x^* \text{ if } x^* > 0$$   (6.1.4)

These two cases can be combined into the complementary conditions

$$\frac{df}{dx} \le 0, \quad \text{and} \quad x \frac{df}{dx} = 0 \;\text{ at } x = x^*$$   (6.1.5)
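For a concrete (hypothetical) concave objective, the first- and second-order conditions can be verified directly. Here $f(x) = -(x-3)^2 + 5$, so $df/dx = -2(x-3) = 0$ gives $x^* = 3$, and $d^2f/dx^2 = -2 \le 0$ confirms a maximum:

```python
# First- and second-order conditions for f(x) = -(x - 3)**2 + 5,
# a hypothetical concave function chosen for illustration.

def f(x):
    return -(x - 3.0) ** 2 + 5.0

def dfdx(x):
    return -2.0 * (x - 3.0)   # analytic first derivative

def d2fdx2(x):
    return -2.0               # analytic second derivative (constant)

x_star = 3.0                  # solution of dfdx(x) = 0
assert dfdx(x_star) == 0.0    # first-order condition
assert d2fdx2(x_star) <= 0.0  # second-order condition
# f at x* dominates nearby trial points:
assert all(f(x_star) >= f(x_star + h) for h in (-1.0, -0.1, 0.1, 1.0))
```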
6.2 Unconstrained Vector Case

Now consider the unconstrained problem with a vector of decision variables:

Maximize $f(x)$   (6.2.1)

The necessary conditions for a maximum are

$$\frac{\partial f(x)}{\partial x} = 0 \quad \text{(first-order conditions)}$$   (6.2.3)

i.e.,

$$\nabla f(x) = \begin{pmatrix} \partial f(x)/\partial x_1 \\ \vdots \\ \partial f(x)/\partial x_n \end{pmatrix} = 0$$   (6.2.4)

The first-order conditions represent n simultaneous equations which can be solved for x*, the optimal solution of the problem.
But what if the decision variables are constrained to be greater than or equal to zero (nonnegativity restrictions), e.g., $x \ge 0$? In this case there are two possibilities for each variable: either (1) the solution is interior, where the corresponding slope is zero; or (2) the solution lies on the boundary, where $x_j = 0$ and the slope is nonpositive. That is,

$$\frac{\partial f}{\partial x_j} \le 0 \;\text{ at } x = x^* \text{ if } x_j^* = 0; \qquad \frac{\partial f}{\partial x_j} = 0 \;\text{ at } x = x^* \text{ if } x_j^* > 0$$   (6.2.5)

or, compactly,

$$\frac{\partial f}{\partial x_j} \le 0, \quad \text{and} \quad x_j \frac{\partial f}{\partial x_j} = 0 \;\text{ at } x = x^*, \quad j = 1, \ldots, n$$   (6.2.6)
6.3 Constrained Vector Case - Single Constraint

Maximize $f(x)$
subject to
$$h(x) = 0$$   (6.3.1)

We can multiply the constraint by a variable or multiplier $\lambda$ and subtract the resulting expression from the objective function to form what is known as the Lagrangian function

$$L(x, \lambda) = f(x) - \lambda [h(x)]$$   (6.3.2)

and then simply apply the methods of the previous case (unconstrained vector case). Note that for a feasible vector the constraint must be satisfied, that is

$$h(x) = 0$$   (6.3.3)

and

$$L(x, \lambda) = f(x)$$   (6.3.4)

so we really have not changed the objective function as long as we remain feasible. The necessary (first-order) conditions are

$$\frac{\partial L}{\partial x} = \frac{\partial \{f(x) - \lambda [h(x)]\}}{\partial x} = \frac{\partial f}{\partial x} - \lambda \frac{\partial h}{\partial x} = 0, \qquad \frac{\partial L}{\partial \lambda} = -h(x) = 0$$   (6.3.5)

or, written out,

$$\frac{\partial f}{\partial x_1} - \lambda \frac{\partial h}{\partial x_1} = 0$$
$$\frac{\partial f}{\partial x_2} - \lambda \frac{\partial h}{\partial x_2} = 0$$
$$\vdots$$
$$\frac{\partial f}{\partial x_n} - \lambda \frac{\partial h}{\partial x_n} = 0$$
$$h(x) = 0$$   (6.3.6)
Example (adapted from Loucks et al., 1981, Section 2.6, pp. 23-28)

Consider a situation where there is a total quantity of water R to be allocated to a number of different uses. Let the quantity of water allocated to each use be denoted by $x_i$, $i = 1, \ldots, I$. The objective is to determine the quantity of water to allocate to each use such that the total net benefits of all uses are maximized. We will consider an example with three uses, I = 3.

[Figure: a reservoir with storage S releasing allocations $x_1$, $x_2$, $x_3$ to Users 1, 2, and 3, with net benefits $B_1$, $B_2$, $B_3$.]

The net benefits for each use are

$$B_i(x_i) = a_i x_i - b_i x_i^2, \quad i = 1, 2, 3$$   (6.4.1)

where $a_i$ and $b_i$ are given positive constants. These net-benefit (benefit minus cost) functions are of the form shown in Figure 6.4.2.
[Figure 6.4.2: concave net-benefit functions $B_1$, $B_2$, $B_3$ versus allocation $x_i$ (net economic benefit to user i).]

The total allocation plus any remaining water S must equal the available water R:

$$\sum_{i=1}^{3} x_i + S - R = 0$$   (6.4.2)

The Lagrangian function is

$$L(x, \lambda) = \sum_{i=1}^{3} (a_i x_i - b_i x_i^2) - \lambda \left( \sum_{i=1}^{3} x_i + S - R \right)$$   (6.4.3)
There are now four unknowns in the problem, $x_i$, $i = 1, 2, 3$, and $\lambda$. The solution of the problem is obtained by applying the first-order conditions, setting the first partial derivatives of the Lagrangian function with respect to each of the variables equal to zero:

$$\frac{\partial L}{\partial x_1} = a_1 - 2b_1 x_1 - \lambda = 0$$
$$\frac{\partial L}{\partial x_2} = a_2 - 2b_2 x_2 - \lambda = 0$$
$$\frac{\partial L}{\partial x_3} = a_3 - 2b_3 x_3 - \lambda = 0$$   (6.4.4)
$$\frac{\partial L}{\partial \lambda} = -(x_1 + x_2 + x_3 + S - R) = 0$$

These equations are the necessary conditions for a local maximum or minimum, ignoring the nonnegativity conditions. Since the objective function involves the maximization of a sum of concave functions (functions whose slopes are decreasing), any local optimum will also be the global maximum (by the Local-Global Theorem).
The optimal solution of this problem is found by solving for each $x_i$, $i = 1, 2, 3$, in terms of $\lambda$:

$$x_i = \frac{a_i - \lambda}{2 b_i}, \quad i = 1, 2, 3$$   (6.4.5)

Substituting these into the constraint

$$\sum_{i=1}^{3} x_i + S - R = 0$$   (6.4.6)

gives

$$\sum_{i=1}^{3} \frac{a_i - \lambda}{2 b_i} + S - R = 0$$   (6.4.7)

which can be solved for the multiplier:

$$\lambda = \frac{2 \left( \sum_{i=1}^{3} \dfrac{a_i}{2 b_i} + S - R \right)}{\sum_{i=1}^{3} \dfrac{1}{b_i}}$$   (6.4.8)

Hence, knowing R, S, $a_i$, and $b_i$, this last equation can be solved for $\lambda$. Substituting this value into the equations for the $x_i$, $i = 1, 2, 3$, we can solve for the optimal allocations, provided that all of the allocations are nonnegative.
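A short numerical sketch of these allocation formulas follows. The benefit coefficients $a_i$, $b_i$ and water quantities R and S below are hypothetical values chosen for illustration, not the figures from the original example:

```python
# Water allocation by Lagrange multipliers: x_i = (a_i - lam) / (2 b_i),
# with lam from Eq. (6.4.8). All coefficient values below are hypothetical.
a = [6.0, 7.0, 8.0]   # benefit coefficients a_i
b = [1.0, 1.5, 0.5]   # benefit coefficients b_i
R, S = 10.0, 2.0      # available water and remaining water

lam = 2.0 * (sum(ai / (2.0 * bi) for ai, bi in zip(a, b)) + S - R) \
      / sum(1.0 / bi for bi in b)
x = [(ai - lam) / (2.0 * bi) for ai, bi in zip(a, b)]

# The allocations exactly exhaust the available water: sum(x) + S = R
assert abs(sum(x) + S - R) < 1e-12
# Each first-order condition a_i - 2 b_i x_i - lam = 0 holds
assert all(abs(ai - 2.0 * bi * xi - lam) < 1e-12
           for ai, bi, xi in zip(a, b, x))
# All allocations are nonnegative, so the solution is valid as-is
assert min(x) >= 0.0
print(lam, x)
```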
6.5 Constrained Vector Case - Multiple Constraints

Maximize $f(x)$
subject to
$$h(x) = 0$$   (6.5.1)

where h is a vector of m constraint functions. The Lagrangian function is

$$L(x, \lambda) = f(x) - \lambda h(x) = f(x) - \sum_{i=1}^{m} \lambda_i h_i(x)$$   (6.5.2)

and then simply apply the methods of the previous case (unconstrained vector case). The necessary (first-order) conditions are

$$\nabla_x L(x, \lambda) = \nabla_x f(x) - \lambda \nabla_x h(x) = 0$$   (6.5.3)

or

$$\frac{\partial L}{\partial x} = \frac{\partial [f(x) - \lambda h(x)]}{\partial x} = \frac{\partial f}{\partial x} - \sum_{i=1}^{m} \lambda_i \frac{\partial h_i}{\partial x} = 0$$   (6.5.4)

and

$$\nabla_\lambda L(x, \lambda) = 0$$   (6.5.5)

or

$$h(x) = 0$$   (6.5.6)

The first-order conditions (Eq. 6.5.4 and Eq. 6.5.6) represent n + m simultaneous equations which must be solved for the optimal values of the vectors of decision variables and Lagrange multipliers, x* and λ*.
Example (after Haith, 1982, Example 4-2):

Solve the following optimization problem using Lagrange multipliers.

Maximize $x_1^2 + 2x_1 - x_2^2$
subject to
$$5x_1 + 2x_2 \le 10$$   (6.5.7)
$$x_1 \ge 0$$
$$x_2 \ge 0$$

The last three constraints must be turned into equalities in order to use classical programming to solve the problem. Introduce three new variables, $s_1$, $s_2$, and $s_3$:

$$5x_1 + 2x_2 + s_1^2 = 10$$
$$x_1 - s_2^2 = 0$$   (6.5.8)
$$x_2 - s_3^2 = 0$$

These slack variables (the difference between the left and right sides) are always introduced on the side of the inequality that the inequality sign points toward.

The Lagrangian function is

$$L(x, \lambda) = x_1^2 + 2x_1 - x_2^2 - \lambda_1 (5x_1 + 2x_2 + s_1^2 - 10) - \lambda_2 (x_1 - s_2^2) - \lambda_3 (x_2 - s_3^2)$$   (6.5.9)

and the first-order conditions are

$$\frac{\partial L}{\partial x_1} = 2x_1 + 2 - 5\lambda_1 - \lambda_2 = 0$$   (6.10a)
$$\frac{\partial L}{\partial x_2} = -2x_2 - 2\lambda_1 - \lambda_3 = 0$$   (6.10b)
$$\frac{\partial L}{\partial s_1} = -2\lambda_1 s_1 = 0$$   (6.10c)
$$\frac{\partial L}{\partial s_2} = 2\lambda_2 s_2 = 0$$   (6.10d)
$$\frac{\partial L}{\partial s_3} = 2\lambda_3 s_3 = 0$$   (6.10e)
$$\frac{\partial L}{\partial \lambda_1} = -(5x_1 + 2x_2 + s_1^2 - 10) = 0$$   (6.10f)
$$\frac{\partial L}{\partial \lambda_2} = -(x_1 - s_2^2) = 0$$   (6.10g)
$$\frac{\partial L}{\partial \lambda_3} = -(x_2 - s_3^2) = 0$$   (6.10h)

Equations 6.10c-e require that $\lambda_i$ or $s_i$ be equal to zero. There can be several solutions to the problem depending on which of the $\lambda_i$ or $s_i$ are equal to zero.
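Rather than enumerating the cases by hand, the solution of this example can be cross-checked with a numerical optimizer. The sketch below uses `scipy.optimize.minimize` with SLSQP on the negated objective; under the problem as stated here, the optimizer should land on the constraint boundary (this is a numerical check, not part of the classical-programming derivation):

```python
from scipy.optimize import minimize

# Maximize f(x) = x1**2 + 2*x1 - x2**2 subject to 5*x1 + 2*x2 <= 10,
# x1, x2 >= 0, by minimizing -f with SLSQP.
neg_f = lambda x: -(x[0] ** 2 + 2.0 * x[0] - x[1] ** 2)

res = minimize(
    neg_f,
    x0=[1.0, 1.0],                       # arbitrary feasible starting point
    method="SLSQP",
    bounds=[(0.0, None), (0.0, None)],   # nonnegativity restrictions
    # SciPy expects inequality constraints in the form fun(x) >= 0
    constraints=[{"type": "ineq",
                  "fun": lambda x: 10.0 - 5.0 * x[0] - 2.0 * x[1]}],
)
assert res.success
print(res.x, -res.fun)  # candidate optimum and objective value
```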
6.6 Nonlinear Programming and the Kuhn-Tucker Conditions

The general nonlinear programming problem is:

Maximize $f(x)$
subject to
$$h(x) = 0$$
$$g(x) \le 0$$   (6.6.1)

The Lagrangian function is

$$L(x, \lambda, u) = f(x) - \sum_{i=1}^{m} \lambda_i h_i(x) - \sum_{j=1}^{r} u_j g_j(x)$$   (6.6.2)

and then simply apply the methods of the previous case (unconstrained vector case). The necessary (first-order) conditions are the Kuhn-Tucker conditions:

$$\frac{\partial f}{\partial x_k} - \sum_{i=1}^{m} \lambda_i \frac{\partial h_i}{\partial x_k} - \sum_{j=1}^{r} u_j \frac{\partial g_j}{\partial x_k} \le 0, \qquad x_k^* \left[ \frac{\partial f}{\partial x_k} - \sum_{i=1}^{m} \lambda_i \frac{\partial h_i}{\partial x_k} - \sum_{j=1}^{r} u_j \frac{\partial g_j}{\partial x_k} \right] = 0, \quad k = 1, \ldots, n$$   (6.6.3)

$$g_j(x^*) \le 0 \quad \text{for } j = 1, \ldots, r$$   (6.6.4)

$$u_j g_j(x^*) = 0$$   (6.6.5)

$$x_k^* \ge 0, \; k = 1, \ldots, n; \qquad u_j \ge 0, \; j = 1, \ldots, r$$   (6.6.6)

together with feasibility of the equality constraints, $h(x^*) = 0$.
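The Kuhn-Tucker conditions can be verified numerically at a candidate point. The sketch below does this for the earlier example, maximize $x_1^2 + 2x_1 - x_2^2$ subject to $g(x) = 5x_1 + 2x_2 - 10 \le 0$ and $x \ge 0$ (no equality constraints, so m = 0); the candidate point $x^* = (2, 0)$, at which g is assumed binding, is used purely for illustration:

```python
# Kuhn-Tucker check for: maximize f(x) = x1**2 + 2*x1 - x2**2
# subject to g(x) = 5*x1 + 2*x2 - 10 <= 0, x >= 0.
# The candidate point below is an assumed solution used for illustration.
x1, x2 = 2.0, 0.0

df_dx1 = 2.0 * x1 + 2.0        # partial of f wrt x1 at the candidate
df_dx2 = -2.0 * x2             # partial of f wrt x2
dg_dx1, dg_dx2 = 5.0, 2.0      # partials of g

# Multiplier from the binding first-order condition df/dx1 - u*dg/dx1 = 0
u = df_dx1 / dg_dx1

g = 5.0 * x1 + 2.0 * x2 - 10.0
assert g <= 0.0 and abs(u * g) < 1e-12           # (6.6.4) and (6.6.5)
assert u >= 0.0                                  # (6.6.6)
assert abs(x1 * (df_dx1 - u * dg_dx1)) < 1e-12   # (6.6.3) for x1
assert df_dx2 - u * dg_dx2 <= 0.0                # (6.6.3) gradient condition
assert x2 * (df_dx2 - u * dg_dx2) == 0.0         # (6.6.3) for x2 (x2 = 0)
```

All conditions hold with a nonnegative multiplier, so the candidate point satisfies the Kuhn-Tucker conditions.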
Exercises
1. (after Mays and Tung, 1992, Exercise 3.4.5) Water is available at supply points 1, 2, and 3
in quantities 4, 8, and 12 thousand units, respectively. All of this water must be shipped to
destinations A, B, C, D, and E, which have requirements of 1, 2, 3, 8, and 10 thousand units,
respectively. The following table gives the cost of shipping one unit of water from the given
supply point to the given destination. Find the shipping schedule which minimizes the total cost
of transportation.
Destination
Source    A    B    C    D    E
1         7   10    5    4   12
2         3    2    0    9    1
3         8   13   11    6   14
2. (adapted from Mays and Tung, 1992, Exercise 3.1.1) Solve the following Linear Program
Maximize $2x_1 + 3x_2 + 5x_3$
subject to
$$x_1 + x_2 - x_3 \le 5$$
$$6x_1 + 7x_2 - 9x_3 = 5$$
$$|19x_1 - 7x_2 + 5x_3| \le 13$$
3. (adapted from Mays and Tung, 1992, Exercise 3.2.1) Consider the following Linear Program
Maximize 3 x1 + 5 x 2
subject to
$$x_1 \le 4$$
$$x_2 \le 6$$
$$3x_1 + 2x_2 \le 18$$
(a) Graph the feasible region for the problem.
(b) Solve the problem graphically.
(c) How much can the nonbinding constraints be reduced without changing the feasibility of
the optimal solution?
(d) What is the range of the objective function coefficient of x2 so that the optimal solution
remains feasible?
4. (after Haith, 1982, Example 5-1) 1000 ha of farmland surrounding a lake is available for two
crops. Each hectare of crop 1 loses 0.9 kg/yr of pesticide to the lake, and the corresponding loss
from crop 2 is 0.5 kg/yr. Total pesticide losses are not allowed to exceed 632.5 kg/yr. Crop
returns are $300 and $150/ha for crops 1 and 2, respectively. Costs for crops are estimated to be
$160 and $50/ha for crops 1 and 2, respectively.
(a) Determine the cropping combination that maximizes farmer profits subject to a constraint
on the pesticide losses into the lake.
(b) If crop returns decrease to $210/ha for crop 1, what is the optimal solution?
(c) If crop returns increase to $380/ha for crop 1, what is the optimal solution?
5. (after Haith, 1982, Exercise 5-1) A metal refining factory has a capacity of 10 × 10⁴ kg/week and produces waste at the rate of 3 kg/kg of product, contained in a wastewater at a concentration of 2 kg/m³. The factory's waste treatment plant operates at a constant efficiency of 0.85 and has a capacity of 8 × 10⁴ m³/week. Wastewater is discharged into a river, and the effluent standard is 100,000 kg/week. There is also an effluent charge of $1000/10⁴ kg discharged. Treatment costs are $1000/10⁴ m³, the product sales price is $10,000/10⁴ kg, and production costs are $6850/10⁴ kg.
(a) Construct a linear program that can be used to solve this wastewater problem. Solve the
model graphically.
(b) If the effluent charge is raised to $2000/10⁴ kg, how much will the waste discharge be
reduced?
6. (after Haith, 1982, Exercise 5-9) A standard of 1 kg/10³ m³ has been set as the maximum allowable concentration for a substance in a river. Three major dischargers of the substance are located along the river as shown in the figure. The river has a flow of 500,000 m³/day and an ambient concentration of the regulated substance of 0.2 kg/10³ m³ upstream of the first discharger. The three waste sources presently discharge 100, 1000, and 1600 kg/day of the regulated substance, resulting in violations of the standard in the river. The substance is not conserved in the river, but decays at a rate of K = 0.03 km⁻¹. Thus, if C₁ and C₂ are the concentrations of the substance immediately after the discharge points 1 and 2, respectively, the concentration at any point L km downstream of discharge 1 (L < 10) is C₁e^(−KL). Similarly, the concentration L km downstream of discharge 2 (L < 15) is C₂e^(−KL). The cost of removing the substance from the wastewater is $10X/1000 m³, where X is the fraction of the substance removed. Use LP to determine an optimal treatment program for the regulated substance.

[Figure: river profile with three discharge points; discharge 1 to discharge 2 is 10 km, and discharge 2 to discharge 3 is 15 km. River flow is 500,000 m³/day. Labeled flows and concentrations: 300 × 10³ m³/day and 0.2 kg/10³ m³; 50 × 10³ m³/day and 20 kg/10³ m³; 200 × 10³ m³/day and 8 kg/10³ m³.]
7. (after Haith, 1982, Example 4-1) Solve the following optimization problem using Lagrange multipliers.

9. (after Haith, 1982, Exercise 4-2) Solve the following optimization problem using Lagrange multipliers (classical programming):

Maximize $4e^{-x_1} - x_2^2$
subject to
$$6x_1 - x_2 = 6$$
$$x_1 \ge 0$$
10. (after Willis, 2002) A waste storage facility consists of a right circular cylinder of radius 5
units and a conical cap. The volume of the storage facility is V. Determine H, the height of the
storage facility, and h, the height of the conical cap, such that the total surface area is minimized.
References
Haith, D.A., Environmental Systems Optimization, John Wiley & Sons, New York, 1982.
Loucks, D.P., et al., Water Resource Systems Planning and Analysis, Prentice-Hall, Englewood Cliffs, 1981.
Mays, L.W., and Y.-K. Tung, Hydrosystems Engineering and Management, McGraw-Hill, 1992.

Mathematical Programming and Optimization Texts

Bradley, S.P., Hax, A.C., and Magnanti, T.L., Applied Mathematical Programming, Addison-Wesley, Reading, 1977.
Fletcher, Practical Methods of Optimization, ...
Gill, P.E., W. Murray, and M.H. Wright, Practical Optimization, Academic Press, London, 1981.
Hadley, G., Linear Programming, Addison-Wesley, Reading, 1962.
Hadley, G., Nonlinear and Dynamic Programming, Addison-Wesley, Reading, 1962.
Hillier, F.S., and G.J. Lieberman, Introduction to Operations Research, McGraw-Hill, Inc., New York, 1990.
Intriligator, M.D., Mathematical Optimization and Economic Theory, Prentice-Hall, Inc., Englewood Cliffs, 1971.
Luenberger, D.G., Linear and Nonlinear Programming, Addison-Wesley, New York, 1984.
McCormick, G.P., Nonlinear Programming: Theory, Algorithms, and Applications, John Wiley and Sons, 1983.
Taha, H.A., Operations Research: An Introduction, Macmillan, New York, 1987.
Wagner, H.M., Principles of Operations Research, Prentice-Hall, Inc., Englewood Cliffs, 1975.
An important tool in many areas of scientific and engineering analysis and computation is matrix
theory or linear algebra. A wide variety of problems lead ultimately to the need to solve a linear
system of equations Ax = b. There are two general approaches to the solution of linear systems.
A.1.2 Matrix Notation

A matrix is an array of real numbers. Consider an (m x n) matrix A with m rows and n columns:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & & & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$   (A.1.2.1)
The horizontal elements of the matrix are the rows and the vertical elements are the columns.
The first subscript of an element designates the row, and the second subscript designates the
column. A row matrix (or row vector) is a matrix with one row, i.e., the dimension m = 1. For
example
$$r = (r_1 \;\; r_2 \;\; r_3 \;\; \cdots \;\; r_n)$$   (A.1.2.2)

A column matrix (or column vector) is a matrix with one column, i.e., the dimension n = 1. For example,

$$c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}$$   (A.1.2.3)

When the row and column dimensions of a matrix are equal (m = n), the matrix is called square:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$   (A.1.2.4)
The transpose $A^T$ of an (m x n) matrix A is the (n x m) matrix obtained by interchanging the rows and columns of A:

$$A^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & & a_{m2} \\ \vdots & & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}$$   (A.1.2.5)

A diagonal matrix is a square matrix where elements off the main diagonal are all zero:

$$A = \begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{pmatrix}$$   (A.1.2.7)

An identity matrix is a diagonal matrix where all the diagonal elements are ones:

$$I = \begin{pmatrix} 1 & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{pmatrix}$$   (A.1.2.8)

An upper triangular matrix is one where all the elements below the main diagonal are zero:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{pmatrix}$$   (A.1.2.9)

A lower triangular matrix is one where all the elements above the main diagonal are zero:

$$A = \begin{pmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$   (A.1.2.10)
Two (m x n) matrices A and B are equal if and only if each of their elements are equal. That is
A = B if and only if aij = bij for i = 1,...,m and j = 1,...,n
(A.1.3.1)
The addition of vectors and matrices is allowed whenever the dimensions are the same. The sum of two (m x 1) column vectors a and b is

$$a + b = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_m + b_m \end{pmatrix}$$   (A.1.3.2)

Example:

Let $u = (1, 3, 2, 4)$ and $v = (3, 5, -1, -2)$. Then

$$u + v = (1 + 3, \; 3 + 5, \; 2 - 1, \; 4 - 2) = (4, 8, 1, 2)$$
Similarly, the sum of two (m x n) matrices A and B is formed element by element:

$$A + B = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ b_{21} & \cdots & b_{2n} \\ \vdots & \ddots & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix} = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & \cdots & a_{2n} + b_{2n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{pmatrix}$$   (A.1.3.3)

The negative of a matrix is formed by changing the sign of each element:

$$-A = \begin{pmatrix} -a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & -a_{22} & \cdots & -a_{2n} \\ \vdots & & \ddots & \vdots \\ -a_{m1} & -a_{m2} & \cdots & -a_{mn} \end{pmatrix}$$   (A.1.3.4)
The product of two matrices A and B is defined only if the number of columns of A is equal to the number of rows of B. If A is (n x p) and B is (p x m), the product is an (n x m) matrix C:

$$C = AB = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{np} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & \ddots & \vdots \\ b_{p1} & \cdots & b_{pm} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + \cdots + a_{1p}b_{p1} & \cdots & a_{11}b_{1m} + \cdots + a_{1p}b_{pm} \\ \vdots & \ddots & \vdots \\ a_{n1}b_{11} + \cdots + a_{np}b_{p1} & \cdots & a_{n1}b_{1m} + \cdots + a_{np}b_{pm} \end{pmatrix}$$   (A.1.3.5)

The ij element of the matrix C is given by

$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$$   (A.1.3.6)

That is, the $c_{ij}$ element is obtained by adding the products of the individual elements of the i-th row of the first matrix by the j-th column of the second matrix (i.e., row-by-column). The following shows an easy way to check if two matrices are compatible for multiplication and what the dimensions of the resulting matrix will be:

$$C_{n \times m} = A_{n \times p} B_{p \times m}$$   (A.1.3.7)
Example:

Let $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)^T$. Then

$$a \, b = (a_1, a_2, \ldots, a_n) \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

Example:

Let A be an (n x n) matrix and b an (n x 1) column vector. Then

$$Ab = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} a_{11}b_1 + a_{12}b_2 + \cdots + a_{1n}b_n \\ a_{21}b_1 + a_{22}b_2 + \cdots + a_{2n}b_n \\ \vdots \\ a_{n1}b_1 + a_{n2}b_2 + \cdots + a_{nn}b_n \end{pmatrix}$$
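The row-by-column rule and the dimension check can be illustrated with NumPy (the matrices here are arbitrary numerical examples):

```python
import numpy as np

# A is (2 x 3), B is (3 x 2): the columns of A match the rows of B,
# so C = AB is defined and has shape (2 x 2).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[7.0, 8.0],
              [9.0, 10.0],
              [11.0, 12.0]])

C = A @ B
assert C.shape == (2, 2)

# c_ij = sum_k a_ik * b_kj (row-by-column), e.g. the (1,1) element:
assert C[0, 0] == 1.0 * 7.0 + 2.0 * 9.0 + 3.0 * 11.0   # = 58
# Multiplication in the other order, BA, is (3 x 3) -- not equal to AB.
assert (B @ A).shape == (3, 3)
```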
Matrix division is not a defined operation. The identity matrix has the property that IA=A and
AI = A. If A is an (n x n) square matrix and there is a matrix X with the property that
AX = I
(A.1.3.8)
where I is the identity matrix, then the matrix X is defined to be the inverse of A and is denoted
A-1. That is
AA-1 = I and A-1A = I
(A.1.3.9)
For a (2 x 2) matrix the inverse can be computed directly:

$$A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}$$   (A.1.3.10)

Example:

If $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, then

$$A^{-1} = \frac{1}{2(2) - 1(1)} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix}$$
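The (2 x 2) formula can be checked against NumPy, using the same matrix as the example above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Formula (A.1.3.10): A^{-1} = 1/(a11*a22 - a12*a21) * [[a22, -a12], [-a21, a11]]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv_formula = (1.0 / det) * np.array([[A[1, 1], -A[0, 1]],
                                        [-A[1, 0], A[0, 0]]])

assert np.allclose(A_inv_formula, np.linalg.inv(A))   # matches NumPy's inverse
assert np.allclose(A @ A_inv_formula, np.eye(2))       # A A^{-1} = I
```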
A.1.4 Systems of Linear Equations

A system of n linear equations in n unknowns can be written compactly as

$$Ax = b$$   (A.1.4.1)

or

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$   (A.1.4.2)

Performing the matrix multiplication and writing each equation out separately, we have

$$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1$$
$$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2$$
$$\vdots$$
$$a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n = b_n$$   (A.1.4.3a)

or

$$\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, \ldots, n$$   (A.1.4.3b)

A formal way to obtain a solution using matrix algebra is to multiply each side of the equation by the inverse of A to yield

$$A^{-1} A x = A^{-1} b$$   (A.1.4.4)

or, since $A^{-1} A = I$,

$$x = A^{-1} b$$   (A.1.4.5)

Thus, we have obtained the solution to the system of equations. Unfortunately, this is not a very efficient way of solving the system of equations. We will discuss more efficient ways in the following sections.
Example: Consider the following two equations in two unknowns:

$$3x_1 + 2x_2 = 18$$
$$-x_1 + 2x_2 = 2$$
Solving the first equation for $x_2$ gives

$$x_2 = -\frac{3}{2} x_1 + 9$$

which is a straight line with an intercept of 9 and a slope of (-3/2). Now, solve the second equation for $x_2$:

$$x_2 = \frac{1}{2} x_1 + 1$$

which is also a straight line, but with an intercept of 1 and a slope of (1/2). These lines are plotted in the following figure. The solution is the intersection of the two lines at $x_1 = 4$ and $x_2 = 3$.

[Figure: the lines $3x_1 + 2x_2 = 18$ and $-x_1 + 2x_2 = 2$ plotted in the $(x_1, x_2)$ plane, intersecting at the solution $x_1 = 4$, $x_2 = 3$.]
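The same system can be solved numerically; as noted above, forming the inverse explicitly is not efficient, so `numpy.linalg.solve` is the usual choice:

```python
import numpy as np

#  3x1 + 2x2 = 18
# -x1 + 2x2 = 2
A = np.array([[3.0, 2.0],
              [-1.0, 2.0]])
b = np.array([18.0, 2.0])

x = np.linalg.solve(A, b)          # preferred: no explicit inverse formed
assert np.allclose(x, [4.0, 3.0])  # matches the graphical solution

x_via_inverse = np.linalg.inv(A) @ b   # works, but less efficient and stable
assert np.allclose(x_via_inverse, x)
```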
In general, a single linear equation

$$\sum_{j=1}^{n} a_{ij} x_j = b_i$$   (A.1.4.6)

represents a hyperplane in an n-dimensional Euclidean space ($R^n$), and the system of equations $Ax = b$ represents m hyperplanes. The solution of the system of equations is the intersection of all of the m hyperplanes, and can be

- the empty set (no solution)
- a point (unique solution)
- an infinite set of points, e.g., a line or a hyperplane (multiple solutions)
A.1.5 Systems of Linear Inequalities

Similarly, a system of linear inequalities can be written as

$$Ax \le b$$   (A.1.5.1)

or

$$\sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \ldots, m$$   (A.1.5.2)

A single linear inequality

$$\sum_{j=1}^{n} a_{ij} x_j \le b_i$$   (A.1.5.3)

represents a half-space in $R^n$, and the system of inequalities $Ax \le b$ represents the intersection of m half-spaces, which is a polyhedral convex set or, if bounded, a polyhedron.
A.2 Calculus
A.2.1 Functions
A real-valued function of n variables assigns a scalar value to each point x:

$$y = f(x) = f(x_1, \ldots, x_n)$$   (A.2.1.1)

where

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in R^n \; (\text{column vector}); \qquad y \in R^1 \; (\text{scalar})$$   (A.2.1.2)

An example is the linear function

$$y = f(x) = c\,x = \sum_{i=1}^{n} c_i x_i = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$   (A.2.1.3)
A.2.2 Sets

The distance between two points x and y in $R^n$ is

$$d(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$$   (A.2.2.1)

A neighborhood around a point x in $R^n$ is defined as the set of all points y less than some distance $\varepsilon$ from the point x, or

$$N_\varepsilon(x) = \{ y \in R^n : d(x, y) < \varepsilon \}$$   (A.2.2.2)
A closed set is a set which contains all of the points on its boundary, for example a closed
interval on the real line (R1). In a bounded set, the distance between two points contained in the
set is finite. A compact set is closed and bounded, examples are any finite interval [ a, b] on the
real line or any bounded sphere in R3.
A set S is a convex set if for any two points x and y in the set, the point

$$z = a x + (1 - a) y$$   (A.2.2.3)

is also in the set for all a, where $0 \le a \le 1$. That is, all weighted averages of two points in the set are also points in the set. For example, all points on a line segment joining two points in a convex set are also in the set. Straight lines, hyperplanes, and closed half-spaces are all convex sets. Figure 2 below illustrates a convex and a non-convex set. A real-valued function f(x) defined on a convex set S is a convex function if, given any two points x and y in S,

$$f[a x + (1 - a) y] \le a f(x) + (1 - a) f(y)$$   (A.2.2.4)

for all a, where $0 \le a \le 1$. Figure 3 illustrates the fact that the line segment joining two points on the graph of a convex function does not lie below the function. Figure 4 shows general examples of convex and non-convex functions. An example of a convex function is a parabola which opens upward. Linear functions (lines, planes, hyperplanes, half-spaces) are both convex and concave.
[Figure 2: a convex set and a non-convex set.]

[Figure 3: a convex function f(x); the chord $a f(x) + (1 - a) f(y)$ lies on or above $f(a x + (1 - a) y)$.]

[Figure 4: examples of convex and concave functions.]
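The defining inequality (A.2.2.4) is easy to test numerically. The sketch below checks it over a grid of weights for the convex function $f(t) = t^2$, and shows it failing for a concave example (both functions are illustrative choices):

```python
# Check the convexity inequality f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y).

def chord_above(f, x, y, steps=11):
    """True if the inequality holds for all sampled weights a in [0, 1]."""
    for i in range(steps):
        a = i / (steps - 1)
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-12:
            return False
    return True

convex = lambda t: t ** 2         # parabola opening upward: convex
concave = lambda t: -t ** 2       # opens downward: not convex

assert chord_above(convex, -2.0, 3.0)        # inequality holds everywhere
assert not chord_above(concave, -2.0, 3.0)   # violated for interior weights
```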
A.2.3 Derivatives

The derivative of a function of a single variable is

$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$   (A.2.3.1)

For a function of several variables, the partial derivative with respect to $x_i$ is

$$\frac{\partial f}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, \ldots, x_i + \Delta x_i, \ldots, x_n) - f(x_1, \ldots, x_n)}{\Delta x_i}$$   (A.2.3.2)

That is, to find the partial derivative of a multivariable function with respect to one independent variable $x_i$, regard all other independent variables as fixed and find the usual derivative with respect to $x_i$. The partial derivative of a function of several variables f(x) with respect to a particular component of x, $x_i$, evaluated at a point $x^o$ is

$$\left. \frac{\partial f}{\partial x_i} \right|_{x^o} = \left. \frac{\partial f(x)}{\partial x_i} \right|_{x = x^o}$$   (A.2.3.3)

The partial derivative of f(x) with respect to the vector x is a row vector of partial derivatives of the function, or the gradient vector

$$\nabla f(x) = \frac{\partial f}{\partial x} = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$$   (A.2.3.4)
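A numerical sketch of these definitions: central finite differences approximate each partial derivative one variable at a time, and for a quadratic test function they match the analytic gradient closely (the function here is a hypothetical example):

```python
# Gradient of f(x) = x1**2 + 3*x1*x2 by central finite differences.

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_fd(f, x, h=1e-6):
    """Row vector of partials (del f / del x_i), one variable at a time."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h      # perturb only x_i upward
        xm = list(x); xm[i] -= h      # perturb only x_i downward
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

x0 = [1.0, 2.0]
analytic = [2.0 * x0[0] + 3.0 * x0[1], 3.0 * x0[0]]   # (2x1 + 3x2, 3x1)
numeric = grad_fd(f, x0)
assert all(abs(a - n) < 1e-5 for a, n in zip(analytic, numeric))
```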
A vector field A can be expressed in rectangular or cylindrical coordinates as

$$A = A_x \, i + A_y \, j + A_z \, k = A_r \, r + A_\theta \, \theta + A_z \, k$$   (A.3.1.1)

where (i, j, k) and (r, θ, k) are unit vectors in the (x, y, z) or (r, θ, z) directions, respectively.

The gradient operator, del (from the Greek nabla), or $\nabla$, is defined in rectangular coordinates as the vector

$$\nabla(\,) = \frac{\partial(\,)}{\partial x} i + \frac{\partial(\,)}{\partial y} j + \frac{\partial(\,)}{\partial z} k$$   (A.3.2.1)

The major operators (Gradient, del or $\nabla(\,)$; Divergence, div or $\nabla \cdot (\,)$; and Laplacian, del dot del or $\nabla \cdot \nabla(\,) = \nabla^2(\,)$) can be defined in the rectangular and cylindrical coordinate systems as:

Gradient:

(Rectangular) $\nabla(\,) = \dfrac{\partial(\,)}{\partial x} i + \dfrac{\partial(\,)}{\partial y} j + \dfrac{\partial(\,)}{\partial z} k$   (A.3.2.2)

(Cylindrical) $\nabla(\,) = \dfrac{\partial(\,)}{\partial r} r + \dfrac{1}{r}\dfrac{\partial(\,)}{\partial \theta} \theta + \dfrac{\partial(\,)}{\partial z} k$   (A.3.2.3)

Divergence:

(Rectangular) $\nabla \cdot A = \dfrac{\partial A_x}{\partial x} + \dfrac{\partial A_y}{\partial y} + \dfrac{\partial A_z}{\partial z}$   (A.3.2.4)
Proof:

$$\nabla \cdot A = \left( \frac{\partial(\,)}{\partial x} i + \frac{\partial(\,)}{\partial y} j + \frac{\partial(\,)}{\partial z} k \right) \cdot \left( A_x i + A_y j + A_z k \right)$$

$$= \frac{\partial A_x}{\partial x} (i \cdot i) + \frac{\partial A_y}{\partial x} (i \cdot j) + \frac{\partial A_z}{\partial x} (i \cdot k) + \frac{\partial A_x}{\partial y} (j \cdot i) + \frac{\partial A_y}{\partial y} (j \cdot j) + \frac{\partial A_z}{\partial y} (j \cdot k) + \frac{\partial A_x}{\partial z} (k \cdot i) + \frac{\partial A_y}{\partial z} (k \cdot j) + \frac{\partial A_z}{\partial z} (k \cdot k)$$

$$= \frac{\partial A_x}{\partial x} (1) + \frac{\partial A_y}{\partial x} (0) + \frac{\partial A_z}{\partial x} (0) + \frac{\partial A_x}{\partial y} (0) + \frac{\partial A_y}{\partial y} (1) + \frac{\partial A_z}{\partial y} (0) + \frac{\partial A_x}{\partial z} (0) + \frac{\partial A_y}{\partial z} (0) + \frac{\partial A_z}{\partial z} (1)$$

$$= \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}$$
(Cylindrical) $\nabla \cdot A = \dfrac{1}{r}\dfrac{\partial (r A_r)}{\partial r} + \dfrac{1}{r}\dfrac{\partial A_\theta}{\partial \theta} + \dfrac{\partial A_z}{\partial z}$   (A.3.2.5)

Laplacian:

(Rectangular) $\nabla \cdot \nabla(\,) = \nabla^2(\,) = \dfrac{\partial^2(\,)}{\partial x^2} + \dfrac{\partial^2(\,)}{\partial y^2} + \dfrac{\partial^2(\,)}{\partial z^2}$   (A.3.2.6)

e.g., $\nabla \cdot \nabla A = \nabla^2 A = \dfrac{\partial^2 A_x}{\partial x^2} + \dfrac{\partial^2 A_y}{\partial y^2} + \dfrac{\partial^2 A_z}{\partial z^2}$

(Cylindrical) $\nabla \cdot \nabla(\,) = \nabla^2(\,) = \dfrac{1}{r}\dfrac{\partial}{\partial r}\!\left(r \dfrac{\partial(\,)}{\partial r}\right) + \dfrac{1}{r^2}\dfrac{\partial^2(\,)}{\partial \theta^2} + \dfrac{\partial^2(\,)}{\partial z^2}$   (A.3.2.7)

e.g., $\nabla \cdot \nabla A = \nabla^2 A = \dfrac{1}{r}\dfrac{\partial}{\partial r}\!\left(r \dfrac{\partial A_r}{\partial r}\right) + \dfrac{1}{r^2}\dfrac{\partial^2 A_\theta}{\partial \theta^2} + \dfrac{\partial^2 A_z}{\partial z^2}$
In groundwater flow the flux is proportional to the gradient of the hydraulic head h (Darcy's law),

$$q = -K \nabla h$$   (A.3.3.1)

where K is the hydraulic conductivity. When the coordinate axes are aligned with the principal directions of anisotropy, K is a diagonal matrix:

$$K = \begin{pmatrix} K_x & 0 & 0 \\ 0 & K_y & 0 \\ 0 & 0 & K_z \end{pmatrix}$$   (A.3.3.2)

The term $K \nabla h$ is the product of the matrix K with the vector $\nabla h$, or (using row-by-column multiplication)

$$K \nabla h = \begin{pmatrix} K_x & 0 & 0 \\ 0 & K_y & 0 \\ 0 & 0 & K_z \end{pmatrix} \begin{pmatrix} \partial h / \partial x \\ \partial h / \partial y \\ \partial h / \partial z \end{pmatrix} = \begin{pmatrix} K_x \, \partial h / \partial x \\ K_y \, \partial h / \partial y \\ K_z \, \partial h / \partial z \end{pmatrix} = K_x \frac{\partial h}{\partial x} i + K_y \frac{\partial h}{\partial y} j + K_z \frac{\partial h}{\partial z} k$$   (A.3.3.3)

The divergence of $K \nabla h$ is then

$$\nabla \cdot (K \nabla h) = \left( \frac{\partial(\,)}{\partial x} i + \frac{\partial(\,)}{\partial y} j + \frac{\partial(\,)}{\partial z} k \right) \cdot \left( K_x \frac{\partial h}{\partial x} i + K_y \frac{\partial h}{\partial y} j + K_z \frac{\partial h}{\partial z} k \right) = \frac{\partial}{\partial x}\!\left( K_x \frac{\partial h}{\partial x} \right) + \frac{\partial}{\partial y}\!\left( K_y \frac{\partial h}{\partial y} \right) + \frac{\partial}{\partial z}\!\left( K_z \frac{\partial h}{\partial z} \right)$$   (A.3.3.4)