
MEMORANDUM

RM-5766-SA
DECEMBER 1968

SOME CURVE-FITTING FUNDAMENTALS


R. L. Petruschell

This research is supported by the Department of Defense, under
Contract DAHC15 67 C 0150, monitored by the Assistant Secretary
of Defense (Systems Analysis). Views and conclusions expressed
herein should not be interpreted as representing the official
opinion or policy of SA.

DISTRIBUTION STATEMENT
This document has been approved for public release and sale; its distribution is unlimited.

This study is presented as a competent treatment of the subject, worthy of
publication. The Rand Corporation vouches for the quality of the research, without
necessarily endorsing the opinions and conclusions of the authors.

Published by The RAND Corporation

PREFACE

This Memorandum was written for the Office of the Assistant Sec-

retary of Defense (OASD), Systems Analysis. It supplements material

presented in the proposed OASD handbook, An Introduction to Military

Hardware Cost Analysis.

The cost analyst draws heavily on formal statistics in general and

regression analysis in particular when developing estimating relation-

ships. There are numerous times when, because of data limitations, he

must instead rely on mechanical curve fitting and the development of

empirical equations.

The material presented here is intended to provide the practicing

cost analyst with a basic knowledge of the mechanics of curve fitting

and of the properties of the equations he uses. For the most part, the

choice of material reflects an attempt to answer those questions which

years of experience have shown to be most common and troublesome.

Similar information can be found in other sources, but to the best

of the author's knowledge does not exist in any single source. The

integration of analytic geometry with curve-fitting methods and the

selection of the material itself are the unique features of this pre-

sentation. The mathematical discussions are purposely intuitive. They

are intended to be understandable by persons having, at best, limited

mathematical training.

Special-purpose functional forms and curve-fitting methods have

not been included. Only those forms and methods that have already

proven to be of general use to the cost analyst are described here.



Although the computational schemes presented in the Memorandum

are for the most part suitable for the desk calculator, high-speed

digital computers are widely available today, and should be used when

possible.

SUMMARY

Much of the difficulty cost analysts have with curve fitting re-

sults from an inadequate grounding in the analytic geometry of the em-

pirical equations with which they work. This Memorandum attempts to

provide a concise but relatively thorough discussion of this subject

while at the same time demonstrating selected methods for mechan-

ical curve fitting.

The material is presented in three parts. Section I discusses

the properties of the straight line, the exponential, the power func-

tion, and the parabola. Included in the discussion of the exponential

are the laws of exponents and hence logarithms. Emphasis is on pro-

viding insights into the impact of the parameter values on the form of

the resultant curves. Graphical illustrations are used extensively.

Section II presents different methods of using these curves to de-

scribe the relationship between two variables. It discusses the method

of selected points, the method of averages, and the method of least

squares, making considerable use of scatter diagrams. It describes a

number of measures of goodness of fit including the standard deviation,

the coefficient of variation, and an average percent deviation. Through-

out this section, computational procedures are carried out in complete

detail.

The discussion of curve fitting is continued in Section III, where

cases with more than two variables are considered. By using the method

of successive approximations, the initial discussion attempts to con-

vey the idea of a net relationship between two variables, eliminating

the influence of any others, and thus to clarify the meaning of the coeffi-

cients in the multivariate linear equation. The method of least squares

is shown to produce the same results as did the method of successive

approximations and with significantly less computational effort. The

discussion turns next to the nonlinear case. Each of the functional

forms described earlier--the exponential, the power function, and the

parabola--is used to describe a nonlinear relationship between three

variables. Although the method of successive approximations may be

used for fitting nonlinear relationships, only the method

of least squares is described.

The decision to discuss the analytics and the curve-fitting methods

in separate sections of the Memorandum was purely arbitrary. For many

purposes, the user of this material will want to combine his readings

in the first section with his readings in subsequent sections. The

parallel nature of the presentations in each section was designed to

facilitate this.

CONTENTS

PREFACE

SUMMARY

Section
I. CURVE FITTING AND EMPIRICAL EQUATIONS
Introduction
Some Basic Analytic Geometry
Straight Line
Parabola
Exponential
Power Function

II. FITTING CURVES TO TWO-VARIABLE RELATIONSHIPS
The Straight Line
The Method of Selected Points
The Method of Averages
The Method of Least Squares
Summary
The Parabola
Parabola Form 1
Parabola Form 2
The Exponential
The Power Function

III. THREE-VARIABLE CURVE FITTING
The Linear Case
The Nonlinear Case

APPENDIX
A. Derivation of the Normal Equations for a Least-squares Fit of a Straight Line, a Parabola, and a Three-variable Linear Equation
B. Derivation of the Formula for Calculating the Constant a

BIBLIOGRAPHY

TABLES

1. Using the Method of Selected Points to Fit a Straight Line: Data and Results
2. Using the Method of Averages to Fit a Straight Line: Data and Results
3. Using the Method of Least Squares to Fit a Straight Line: [illegible]
4. Using the Method of Selected Points to Fit the Parabola Form 1: Data and Results
5. Using the Method of Averages to Fit the Parabola Form 1: Worksheet
6. Using the Method of Least Squares to Fit the Parabola Form 1: Worksheet
7. Using the Method of Least Squares to Fit the Parabola Form 2: Worksheet
8. Using the Method of Least Squares to Fit the Exponential: Worksheet
9. Using the Method of Least Squares to Fit the Exponential with the Constant a: Worksheet
10. Using the Method of Least Squares to Fit the Power Function: Worksheet
11. Using the Method of Least Squares to Fit the Power Function with the Constant a: Worksheet
12. Data on Selected Report
13. Using Successive Approximations and the Method of Averages to Fit a Three-variable Linear Equation: Worksheet
14. Using the Method of Least Squares to Fit a Three-variable Linear Equation: Worksheet
15. Results of Fitting a Three-variable Linear Relationship
16. Results of Fitting a Three-variable Power Function Relationship
17. Results of Fitting a Three-variable Exponential Relationship
18. Results of Fitting a Three-variable Parabolic Relationship

FIGURES

1. Straight line with known slope drawn through point $(x_1, y_1)$
2. Straight line drawn through two points, $(x_1, y_1)$ and $(x_2, y_2)$
3. Straight line with y intercepting at a and passing through point $(x_2, y_2)$
4. Straight line showing x and y intercepts
5. Straight line parallel to x or y axis
6. A parabola
7. Parabola with vertex at the origin, opening to the right
8. Parabola with vertex at the origin, symmetrical to the y axis and opening upward
9. Relationship between parabola and two sets of coordinate axes
10. Points used to illustrate fitting the parabola
11. Parabola passing through three points
12. Implications of the use of parabolas for extrapolation
13. Negative and positive exponential curves
14. The effect of the values assigned to a on the level of the exponential
15. Curves illustrating the relationship between [illegible] and their logarithmic transformations
16. Estimating the slope of the expression $2(2^x)$ at points [illegible]
17. Exponential fitted through two points
18. Power function with positive exponent [illegible]
19. Power function $y = ax^b$, with positive exponent and $0 < b < 1$
20. Power function $y = ax^b$, with negative exponent and [illegible]
21. Power function $ax^b$ with negative exponent showing effect of [illegible]
22. Power function $y = ax^b$ on logarithmic coordinates
23. Straight line fitted using the method of selected points
24. Straight line fitted using the method of averages
25. Straight line fitted using the method of least squares
26. Results of fitting a straight line to the same data using three alternate methods
27. Parabola form 1
28. Parabola form 2
29. Parabola form 1 fitted using the method of selected points
30. Parabola form 1 fitted using the method of averages
31. Parabola form 1 fitted using the method of least squares
32. Fitting a parabola form 1 using three alternate methods
33. Parabola form 2 fitted using the method of least squares
34. The exponential fitted using the method of least squares, and the equivalent semi-log form
35. Determining the constant a using the method of least squares
36. Power function fitted using the method of least squares
37. Power function with constant a, fitted using the method of least squares
38. Using successive approximations to achieve continuously improved relationships
39. Least-squares solution to the three-variable problem showing one way to graph a three-variable equation
40. Scatter diagram: $X_1$ vs $X_2$
41. Scatter diagram: $X_1$ vs $X_2$ with contours showing equal values of $X_3$
42. Scatter diagram: $X_1$ vs $X_3$ with contours showing equal values of $X_2$
43. Results of fitting a three-variable linear relationship
44. Results of fitting a three-variable power function relationship
45. Results of fitting a three-variable exponential relationship
46. Results of fitting a three-variable parabolic relationship
B-1. Determining the value of the constant a

I. CURVE FITTING AND EMPIRICAL EQUATIONS

INTRODUCTION

An effective cost analysis capability cannot exist without the

systematic collection and analysis of data on past, present, and pro-

jected programs. The analysis must result in the development of esti-

mating relationships which can be used as a basis for estimating the

resource impact of future proposals. These relationships typically

relate resource requirements to the physical, performance, or opera-

tional characteristics of the system, and are, in essence, formal

statements of the way one or more variables relate to each other. At

times a simple factor such as cost per mile is all that is needed.

Frequently, however, a more complex relationship is required such as

the one between the weight and cost of an aircraft and the manhours

required to maintain it. The necessary relationships are usually ex-

pressed as mathematical equations, as curves drawn on coordinate paper,

or both. In either case, the methods of curve fitting are essential

to the development process.

Just what do we mean by curve fitting? Suppose we plot a set of

corresponding values of two variables on coordinate paper. The prob-

lem of curve fitting is that of finding the equation of a curve that

passes through (or near) these points in the graph so as to indicate

their general trend. An equation determined in this way is called an

empirical equation between two variables, and the process of finding

it is called curve fitting. While curve fitting as such deals

primarily with relationships between two variables, certain of the



basic methods can be used to establish empirical equations among

three or more variables. In cost analysis it is often useful to show

the relationship between two variables in the form of a curve even

when there is no mathematical expression possible; the methods of

curve fitting can be used to establish such curves.

One of the interesting things about curve fitting is that there

are so many different ways to do it, and a review of the literature

on the subject would lead one to believe that there are as many

methods as there are authors and problems. In fact, we may reason-

ably conclude that curve fitting is more of an art than a science.

Fortunately, however, many of the methods are useful in solving

only a limited number of unique problems, and for that reason are not

of interest to us here. It is the intent of this Memorandum to ex-

plain curve fitting in a general sense and to present only those

methods that experience has shown to be of general use to the cost

analyst.

SOME BASIC ANALYTIC GEOMETRY

This section discusses the mathematical properties of some func-

tional forms, the general shape of the curves portrayed by each, and

the relationship between the shape of the curve, its location, and

the values of the equation constants. Since our greatest concern at

this point is to develop the equation describing a particular rela-

tionship, we will present the techniques for calculating the equation

parameters, both constants and coefficients.


The different functional forms that will be treated are summarized

below:

$y = a + bx$, straight line,

$y = a + bx + cx^2$, parabola,

$y = ab^x$, exponential,

$y = ax^b$, power function.

These suffice for most cost analysis problems with two variables. Although

there are other forms that are frequently used, they generally can be re-

lated to those given above through appropriate scale transformations.

Straight Line

The straight line is certainly the simplest functional form to

deal with. It is completely defined by knowing any two points on

the line. The main feature of any straight line is the slope or

tilt of the line. If the line rises reading from left to right as

in Fig. 1, from point $(x_1, y_1)$ to $P(x, y)$, the slope is said to be positive;

if the line falls reading from left to right, the slope is said to be

negative. The actual value of the slope (b) is the ratio of a change

in y to a related change in x, that is, the ratio of the length of

the vertical dashed line to the length of the horizontal dashed line

in Fig. 1.

If we are given any point on a line, say P(x,y) as in Fig. 1, and

the slope (b), we would be able to deduce the equation of the line.

Fig. 1--Straight line with known slope drawn through point $(x_1, y_1)$

This may be expressed symbolically as

\[ b = \frac{y - y_1}{x - x_1} \]

or

\[ (y - y_1) = b(x - x_1), \]

which is known as the Point-Slope form of the equation of a straight

line. If we are given one point and the slope, we would immediately

be able to substitute in the above and have the equation of the line.

A slight modification of this results when, as in Fig. 2, the

slope is not known directly, but two points on the line are given.

Then the slope can be calculated as follows:

\[ b = \frac{y_2 - y_1}{x_2 - x_1}, \]

and this value may in turn be substituted into the Point-Slope formula.

Fig. 2--Straight line drawn through two points, $(x_1, y_1)$ and $(x_2, y_2)$

The modified form of the Point-Slope formula for the equation of a
straight line is

\[ (y - y_1) = \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1). \]

This particular form is probably used more than any other in fitting
straight lines.
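
The modified Point-Slope formula can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` and the sample points are assumptions, not part of the Memorandum, and exact fractions are used so that integer inputs give exact results.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (a, b) for y = a + b*x through two points with integer coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is infinite")
    b = Fraction(y2 - y1, x2 - x1)   # slope from the two given points
    a = y1 - b * x1                  # y intercept, from (y - y1) = b(x - x1) at x = 0
    return a, b

a, b = line_through((1, 3), (4, 9))  # hypothetical sample points
print(a, b)  # 1 2
```

The returned pair is already in Slope-Intercept form, so the same helper covers both ways of writing the line.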

There are other instances when, as in Fig. 3, the slope and the
intercept are known. The intercept, more properly called the y
intercept, is the point where the line crosses the y axis. This point is
identified in Fig. 3 as $P(0, a)$. Because the coordinates of the
intercept are as good as the coordinates of any other point, we may use
them to write the equation of the line. The Point-Slope formula for
a straight line is used and the result is

(y - a) = b(x - 0)

which simplifies to y = a + bx where both the intercept (a) and the

slope (b) are immediately recognizable. This is known as the Slope-

Intercept formula for the equation of a straight line.

Fig. 3--Straight line with y intercepting at a and passing through point $(x_2, y_2)$

The remaining case is where both the x and the y intercepts are
known. As in Fig. 4, a is the value of y when x is equal to 0, and c
is the value of x when y is equal to 0. Notice that, in this case,
the line slopes downward from left to right so that we would expect
the slope to be negative. Writing the equation for calculating the
slope using the modified Point-Slope formula, we have

\[ (y - a) = \frac{a - 0}{0 - c}\,(x - 0), \]

\[ y = a - \frac{a}{c}\,x. \]

Fig. 4--Straight line showing x and y intercepts

Notice that the y intercept is equal to a and the slope is equal
to -a divided by c. This form of the formula for a straight line is
called the Intercept form. Another form of this equation which
results from a slight rearrangement is

\[ \frac{x}{c} + \frac{y}{a} = 1. \]

In this form the coordinates of the two intercepts are immediately
recognizable.
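
As a small check on the Intercept form, the sketch below converts a pair of intercepts to a slope and tests whether a point satisfies x/c + y/a = 1. The function names and the sample intercepts (a = 6, c = 3) are hypothetical choices made for illustration.

```python
def slope_from_intercepts(a, c):
    """Slope of the line with y intercept a and x intercept c (both nonzero)."""
    if a == 0 or c == 0:
        raise ValueError("the Intercept form needs nonzero intercepts")
    return -a / c

def on_line(x, y, a, c, tol=1e-12):
    """True when (x, y) satisfies x/c + y/a = 1 within a small tolerance."""
    return abs(x / c + y / a - 1) < tol

a, c = 6, 3                              # hypothetical intercepts
print(slope_from_intercepts(a, c))       # -2.0
print(on_line(0, a, a, c), on_line(c, 0, a, c), on_line(1, 4, a, c))
```

Both intercepts and the interior point (1, 4) satisfy the equation, and the negative slope agrees with a line that falls from left to right.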

The next figure, Fig. 5, shows two special cases of the straight
line, the line parallel to the x axis and the line parallel to the
y axis. In the first case the slope, b, is 0 and in the second it is
infinite. The equations for these two lines are quite simple. In
the first case where the line is parallel to the x axis, the equation
is

\[ y = a, \]

where a is the constant distance from the x axis. In the second case
where the line is parallel to the y axis, the equation is

\[ x = c, \]

where c is the constant distance of the line from the y axis.

Fig. 5--Straight line parallel to x or y axis

Any of the formulas presented above may be used to write the
equation of a straight line. The choice will depend on the particular
kind of information available at the time, for there are times
when each is useful. The most generally useful, however, is the
modified Point-Slope formula.

Parabola

The parabola is not as commonly used as the straight line, but
has sufficient application to make it worthy of treatment here. The
parabola is defined as the curve described by points equidistant from
a fixed point and a fixed line. Figure 6 shows such a curve. The fixed
point F is known as the focus of the parabola and the fixed line
as the directrix. Of course the equation of such a curve depends on
its location with respect to the coordinate axes. For the moment,
however, we will position the curve as shown in Fig. 7. The vertex
is at the origin of the coordinate axes and the line of symmetry of
the curve is the x axis. Referring to the definition, we find that
when the value of y is 0 (which is the case at the origin), the
directrix is the same distance to the left of the origin as the focus
is to the right. Further, assuming the distance from the directrix
to the focus on the line of symmetry is p, the coordinates of the focus
are by definition (p/2, 0), and similarly the equation of the directrix is

\[ x = -p/2. \]

Fig. 6--A parabola

Letting P be any point on the parabola and setting the distances
FP and PL equal to each other, according to the definition, we can
derive the equation for parabolas symmetrical to the x axis, the
vertex at the origin, and opening outward to the right. Using the
standard distance formula* for determining the length of the two
lines FP and PL, and setting one equal to the other, we obtain

\[ FP = \sqrt{(y - 0)^2 + (x - p/2)^2}, \]

\[ PL = \sqrt{(x + p/2)^2}, \]

\[ \sqrt{(y - 0)^2 + (x - p/2)^2} = \sqrt{(x + p/2)^2}. \]

Fig. 7--Parabola with vertex at the origin, opening to the right

When both sides of this expression are squared and the result expanded,

we have
\[ y^2 + x^2 - px + \frac{p^2}{4} = x^2 + px + \frac{p^2}{4}, \]

*The distance between any two points on rectangular coordinates
may be calculated by using the following formula:

\[ d = \sqrt{(y_1 - y_2)^2 + (x_1 - x_2)^2}, \]

where d = the required distance,
$(x_1, y_1)$ = the coordinates of the first point,
$(x_2, y_2)$ = the coordinates of the second point.

which simplifies to

\[ y^2 = 2px, \]

and is the equation of the parabola shown in Fig. 7. If the parabola

were as pictured in Fig. 8, the equation could be obtained by using

the same method:

\[ FP = \sqrt{(y - p/2)^2 + (x)^2}, \]

\[ PL = \sqrt{(y + p/2)^2}, \]

\[ \sqrt{(y - p/2)^2 + (x)^2} = \sqrt{(y + p/2)^2}, \]

\[ y^2 - py + \frac{p^2}{4} + x^2 = y^2 + py + \frac{p^2}{4}, \]

\[ x^2 = 2py, \]

which is the equation for a parabola with its vertex at the origin,

symmetrical to the y axis, and opening upward. Notice that the ninety-

degree rotation, as was made between Fig. 7 and Fig. 8, caused the x

and the y terms to be interchanged. Otherwise the two equations are

identical.
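
The focus-directrix definition can be checked numerically. The sketch below, which assumes an arbitrary value p = 3 and a few sample x values, measures how far a point is from satisfying FP = PL for the parabola opening to the right; the function name is an illustrative choice.

```python
import math

def focus_directrix_gap(x, y, p):
    """|FP - PL| for focus F = (p/2, 0) and directrix x = -p/2."""
    fp = math.hypot(x - p / 2, y)   # distance from P(x, y) to the focus
    pl = abs(x + p / 2)             # perpendicular distance from P to the directrix
    return abs(fp - pl)

p = 3.0                             # assumed focus-to-directrix distance
for x in (0.0, 0.5, 2.0, 10.0):
    y = math.sqrt(2 * p * x)        # a point on y^2 = 2px
    print(x, focus_directrix_gap(x, y, p) < 1e-9)
```

Interchanging the roles of x and y in the same function gives the corresponding check for the upward-opening form, which is the ninety-degree rotation described above.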

When the vertex of the parabola is shifted away from the origin,

as in Fig. 9, the equation will again be altered. To show how, we

regard the problem as one of shifting the intersection of the axes of
the coordinate system from the point (h,k) to the point (0,0). To
make the translation we set

\[ x = x' + h \quad\text{or}\quad x' = x - h, \]

\[ y = y' + k \quad\text{or}\quad y' = y - k, \]

where x and y refer to the original axes; x' and y' refer to the axes
whose center coincides with the vertex of the parabola; and h and k
are the coordinates of the origin of the x', y' axes measured from
the x, y axes.

Fig. 8--Parabola with vertex at the origin, symmetrical to the y axis and opening upward

Fig. 9--Relationship between parabola and two sets of coordinate axes

When we substitute (x - h) for x' and (y - k) for y' in the
equation $y'^2 = 2px'$, we have

\[ (y - k)^2 = 2p(x - h), \]

which when expanded yields

\[ y^2 - 2ky + k^2 = 2px - 2ph, \]

\[ y^2 - 2px - 2ky + 2ph + k^2 = 0. \]

Because in all cases h, k, and p will be constants, the equation may

be written as follows:

\[ y^2 + Dx + Ey + F = 0, \]

where $D = -2p$,

$E = -2k$,

$F = 2ph + k^2$.

This is the standard form of the equation for all parabolas symmetrical

to a line parallel to the x axis.

If instead of $y'^2 = 2px'$ we had started with $x'^2 = 2py'$ we
would arrive at the standard form of the equation as follows:
Substituting (y - k) and (x - h) for y' and x' respectively gives us

\[ (x - h)^2 = 2p(y - k), \]

which when expanded yields

\[ x^2 - 2xh + h^2 - 2py + 2pk = 0. \]

After substituting as above, we have

\[ x^2 + D'y + E'x + F' = 0, \]

where $D' = -2p$,

$E' = -2h$,

$F' = 2pk + h^2$.

This is the standard form of the equation for all parabolas symmetrical

to a line parallel to the y axis.

If we take each of the two standard forms in turn, shift the

terms and divide through appropriately, we arrive at the following

equations:

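\[ x = -\frac{1}{D}\,y^2 - \frac{E}{D}\,y - \frac{F}{D}, \]

\[ y = -\frac{1}{D'}\,x^2 - \frac{E'}{D'}\,x - \frac{F'}{D'}. \]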

As each of the coefficients in the above expressions is a constant,

we can make further substitutions and obtain either

\[ Ay^2 + By + C = x, \]

or

\[ Ax^2 + Bx + C = y, \]

where $A = -1/D$ or $-1/D'$,

$B = -E/D$ or $-E'/D'$,

$C = -F/D$ or $-F'/D'$.

These are the forms of the parabola that are most commonly used

in curve fitting. Since there are three coefficients, or unknowns,

at least three points must be known to define the curve. Given three

points on a parabola, the equation may be obtained by using the coordi-

nates of each of the points to obtain an equation of the above form

and then solving the three equations simultaneously for A, B, and C.

To illustrate, we are given the three points (0,2), (3,4), and
(4,12). Plotting these points as in Fig. 10 leads us to believe a
parabola opening upward and symmetrical to a line parallel to the
y axis would be the correct form to fit. The standard form of the
equation for this type of parabola is

\[ y = Ax^2 + Bx + C. \]

Fig. 10--Points used to illustrate fitting the parabola

Substituting each of the three points in this expression allows us to

write the following three equations. Notice that the x coordinate

must be squared to make certain of the substitutions:

2 = OA + OB + C,

4 = 9A + 3B + C,

12 = 16A + 4B + C.

It is obvious from the first equation that C is equal to 2. Using

this knowledge to adjust the two remaining equations will reduce the

problem significantly. In this case, the two remaining equations

are

2 = 9A + 3B,

10 = 16A + 4B.

There are a number of ways to solve simultaneous equations.
Probably the simplest for only two equations is the determinant
method. As the number of variables and the number of equations get larger,
however, other methods are preferred. In fact, when four or more
equations are involved, it is probably best to look for computer programs
to do the job. The determinant method is particularly well adapted
to the desk calculator, but not particularly well suited for
illustrative purposes. Here we will divide by the leading coefficients
and eliminate variables by subtraction.

Dividing the first equation by 9, the second by 16, and subtracting
the first equation from the second, A is eliminated. These steps

follow:
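
\[ A + \frac{1}{3}B = \frac{2}{9} \]

and

\[ A + \frac{1}{4}B = \frac{10}{16}. \]

Subtracting the first from the second, we have

\[ \left(\frac{1}{4} - \frac{1}{3}\right)B = \frac{10}{16} - \frac{2}{9}. \]

The necessary simplifications and other arithmetic having been performed,

\[ B = -\frac{29}{6}. \]

We next substitute the value of B into the first of the two variable
equations and calculate A as follows:

\[ A = \frac{2}{9} + \frac{87}{54} = \frac{108 + 783}{486} = \frac{11}{6}. \]

The required coefficients are now seen to be

\[ A = \frac{11}{6}, \qquad B = -\frac{29}{6}, \qquad C = 2. \]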

It is usually good practice to substitute all of these coefficients
into one of the original equations to test the correctness of the
arithmetic. Substituting in the second equation we have

\[ 4 = \frac{11}{6}(9) - \frac{29}{6}(3) + 2. \]

The required arithmetic shows us that the values of the coefficients
calculated are in fact correct. The equation that we have been looking
for is therefore

\[ y = \frac{11}{6}\,x^2 - \frac{29}{6}\,x + 2. \]
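
The same three-point fit can be reproduced by machine. The sketch below is one possible implementation, not the Memorandum's desk-calculator procedure: it sets up the three equations for A, B, and C and solves them by elimination with exact fractions. The function name `parabola_through` is an assumption made for illustration.

```python
from fractions import Fraction

def parabola_through(points):
    """Return (A, B, C) of y = A*x^2 + B*x + C through three points with distinct x."""
    # One augmented row [x^2, x, 1 | y] per point.
    rows = [[Fraction(x) ** 2, Fraction(x), Fraction(1), Fraction(y)] for x, y in points]
    n = 3
    for i in range(n):
        # Choose a row with a nonzero entry in column i and move it into place.
        pivot = next(r for r in range(i, n) if rows[r][i] != 0)
        rows[i], rows[pivot] = rows[pivot], rows[i]
        # Divide by the leading coefficient, then eliminate by subtraction.
        rows[i] = [v / rows[i][i] for v in rows[i]]
        for r in range(n):
            if r != i:
                factor = rows[r][i]
                rows[r] = [vr - factor * vi for vr, vi in zip(rows[r], rows[i])]
    return tuple(row[3] for row in rows)

A, B, C = parabola_through([(0, 2), (3, 4), (4, 12)])
print(A, B, C)  # 11/6 -29/6 2
```

Exact fractions are used so that the result can be compared directly with the hand calculation.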

Solving this equation for y, given a range of values of x, and plotting
them, allows us to draw the curve shown in Fig. 11. Contrary to our
expectation, this form of parabola is not a good representation of the
relationship implied by the three points. This example illustrates
an inherent difficulty associated with using the parabola. If we had
not examined the characteristics of this curve over the relevant values
of x, the fact that y is negative between roughly x = 1/2 and x = 2 would not
have been noticed and could have led to absurd cost estimates.

Fig. 11--Parabola passing through three points
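
A quick way to catch this kind of problem is to compute where a fitted parabola crosses the x axis. The sketch below assumes an upward-opening parabola and uses the coefficients of the worked example; the function name is hypothetical.

```python
import math

def negative_interval(A, B, C):
    """For y = A*x**2 + B*x + C with A > 0, return the (low, high) range of x
    where y < 0, or None if y never goes negative."""
    disc = B * B - 4 * A * C
    if A <= 0 or disc <= 0:
        return None
    root = math.sqrt(disc)
    return ((-B - root) / (2 * A), (-B + root) / (2 * A))

lo, hi = negative_interval(11 / 6, -29 / 6, 2.0)   # coefficients from the example
print(f"y < 0 for {lo:.2f} < x < {hi:.2f}")        # y < 0 for 0.51 < x < 2.12
```

A check like this costs little and flags negative estimates before the curve is used for cost work.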

In curve fitting we are concerned primarily with the best
representation of the data at hand. In cost estimating we are typically
concerned with extrapolation beyond the range of the existing data.
When we choose a parabola to represent a relationship between two sets
of data, we generally use only a limited segment of the entire curve.
Figure 12 illustrates how this fact can lead to trouble. The boxed-in
segments of the curve show the part of the curve used to describe the
data. Examination of the curves outside the limits of the various
boxes shown in Figs. 12a, 12b, and 12c indicates the kind of trouble
one can get into by using this type of curve for making extrapolations.

There are times when the best parabolic function to represent a
set of data is of the form

\[ x = Ay^2 + By + C. \]

Since it is conventional for y to be the dependent variable, this
equation causes some difficulty. One way this difficulty can be
overcome is by fitting the curve and then solving the resultant equation
for y using the quadratic formula. First the equation must be
rewritten as follows:

\[ Ay^2 + By + (C - x) = 0. \]

Fig. 12--Implications of the use of parabolas for extrapolation: (12b) Parabola opening to the right; (12c) Another parabola opening downward

Then, using the quadratic formula,

\[ y = \frac{-B \pm \sqrt{B^2 - 4(A)(C - x)}}{2A}. \]

Use of this formula will probably result in two solutions because the
square root of a number can be either positive or negative. Each
equation must be evaluated to determine which is appropriate.
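
The two candidate solutions can be computed as follows. This is a minimal sketch assuming the curve has already been fitted in the form x = Ay^2 + By + C; the function name and the sample coefficients are illustrative only.

```python
import math

def solve_for_y(x, A, B, C):
    """Both solutions y of x = A*y**2 + B*y + C (A != 0); empty tuple if none are real."""
    disc = B * B - 4 * A * (C - x)
    if disc < 0:
        return ()
    root = math.sqrt(disc)
    return ((-B + root) / (2 * A), (-B - root) / (2 * A))

print(solve_for_y(9, 1, 0, 0))    # (3.0, -3.0)
print(solve_for_y(-1, 1, 0, 0))   # ()
```

The caller still has to decide which of the two branches describes the data, for example by keeping only the branch on which the fitted points lie.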


Exponential

The general form of the exponential equation is $y = ab^x$.* Graphs
of two exponential equations that differ from each other only with
respect to the value of b are shown in Fig. 13. In each case a has
been set equal to 1. As will be shown, only the level of the
exponential is affected by the value of a.

A graph similar to that shown in Fig. 13a results whenever b is
greater than 1, and a graph similar to that in Fig. 13b if b is between
1 and 0. If b is equal to 1 the exponential equation becomes

\[ y = a, \]

for 1 raised to any power is equal to 1. If b is 0 there is no
equation, for 0 raised to any power is 0, and consequently

\[ y = 0. \]

When b is negative (less than 0), the exponential is discontinuous
and, for that reason, of no value to us for curve fitting.

*In this text, the function with the independent variable x as the
exponent is called the exponential, while the function $y = ax^b$, where
the exponent is a constant, is called the power function.

Fig. 13--Negative and positive exponential curves: (13a) Exponential with b > 1; (13b) Exponential with 0 < b < 1

For our purposes, the fact that the exponential curve rises from
left to right when b is greater than 1 and from right to left when b
is between 1 and 0 is the relevant characteristic. Also notice that
both curves pass through the point y = 1, x = 0.

The influence of a is illustrated in Fig. 14. Larger values tend
to raise the curve while lower values cause a downward shift. When x
is equal to 0, $b^0 = 1$, and the exponential becomes

\[ y = a. \]

Consequently, a may be thought of as the y intercept.
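
These properties are easy to confirm by evaluating the function directly. The sketch below uses arbitrary sample values of a and b chosen for illustration; only the form y = ab^x comes from the text.

```python
def exponential(x, a, b):
    """Evaluate y = a * b**x for b > 0."""
    return a * b ** x

# At x = 0 the value equals a, whatever b is (the y intercept).
print(exponential(0, 2, 2), exponential(0, 2, 0.5))                   # 2 2.0

# b > 1 rises from left to right; 0 < b < 1 falls from left to right.
print([exponential(x, 1, 2) for x in (-1, 0, 1, 2)])                  # [0.5, 1, 2, 4]
print([exponential(x, 1, 0.5) for x in (-1, 0, 1, 2)])                # [2.0, 1.0, 0.5, 0.25]
```

Doubling a doubles every value on the curve, which is the upward shift in level described above.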

Fig. 14--The effect of the values assigned to a on the level of the exponential: (14a) Exponential with b > 1; (14b) Exponential with 0 < b < 1

Since facility with the exponential requires an understanding of
exponents and logarithms, we will digress temporarily to review these
topics. The system of exponents is based entirely on five basic laws
and four definitions. The first definition states that the expression
$a^x$, where $x$ is an exponent and $a$ is greater than 0, is the product of
$a$ multiplied by itself $x$ times:

$$a^3 = a \times a \times a,$$

etc.

The first law of exponents states that the product of $a^m$ and $a^n$ is
$a^{m+n}$, which incidentally follows directly from the initial definition.
To illustrate:

$$a^2 a^3 = a^{2+3} = a^5,$$

because

$$a^2 = a \times a, \qquad a^3 = a \times a \times a,$$

$$(a \times a)(a \times a \times a) = (a \times a \times a \times a \times a) = a^5.$$

Each of the other laws can be similarly derived and it would be a
worthwhile exercise for the reader to do so. All five laws are
summarized below:

  I.   $a^m a^n = a^{m+n}$

  II.  $a^m / a^n = a^{m-n}$

  III. $(a^m)^n = a^{mn}$

  IV.  $(ab)^n = a^n b^n$

  V.   $(a/b)^n = a^n / b^n$

Three additional definitions complete the system: $a^0$ is defined as 1,
$a^{-n}$ is defined as $1/a^n$, and $a^{1/n}$ is defined as the nth root of $a$.* The
root is positive if $a$ is positive, and negative if $a$ is negative and
$n$ is odd. This system not only gives meaning to the expression $a^x$
when $a$ is greater than 0 and $x$ is any rational number, but also provides
the inputs essential to a discussion of logarithms.

The logarithm of a number is the power to which a base number must
be raised to equal the original number; it can be more conveniently
expressed as

$$y = a^x,$$

where $x$ is the logarithm of $y$ to the base $a$. In the language of
logarithms we would write

$$x = \log_a y.$$

The logarithm $x$ is also an exponent. From this and our earlier
discussion of exponents, we conclude, and rightly so, that any rational
number greater than 0 (other than 1) can be the base of a system of
logarithms. In actual practice, however, 10 and the constant $e$** are
most commonly used. When the base is 10, the logarithms are called
common (log) logarithms, and when the base is $e$ they are called
natural (ln), or Napierian logarithms. We shall follow the general
practice of using the abbreviation log where 10 is the base and ln
where $e$ is the base. Tables of each are readily available.

* It should also be pointed out that such definitions seem logical
from the law of division. That is,

$$1 = \frac{a^n}{a^n} = a^{n-n} = a^0 \quad \text{and} \quad \frac{1}{a^n} = \frac{a^0}{a^n} = a^{0-n} = a^{-n}.$$

** The constant $e$ is the limit of the expression $(1 + v)^{1/v}$ as $v$
approaches 0; the limit is equal to 2.7183 to five significant figures.
It is one of the most important limits in calculus.

Any rational number greater than 0 can be expressed in terms of
its logarithm and consequently in terms of 10 or $e$. Expressing a
relationship in terms of $e$ leads to simplification both of form and
of required computations. Suppose, for example, we have a number,
$y$, which we wish to express in terms of $e$. We would only have to find
$\ln y$ in a table of natural logarithms to write

$$\ln y = x,$$

or in exponential form,

$$y = e^x.$$

Figure 15(a) shows us that these two equations have exactly the same
graph, as do the equations

$$\ln x = y$$

and

$$x = e^y.$$

Fig. 15(b) provides similar information for reciprocal relationships.
Interchanging the $x$ and $y$ terms does, however, cause an exchange
of coordinate axes.

To express the exponential $y = 16.5^x$ in terms of $e$ we treat the
number 16.5 as $e^r$ and write

$$16.5 = e^r$$

and

$$r = \ln 16.5.$$

From a table of natural logarithms we find that $\ln 16.5$ is approximately
2.80 and we write either

$$\ln 16.5 = 2.80,$$

or

$$16.5 = e^{2.80}.$$

Substituting in the original exponential equation and applying the
third law of exponents we obtain

$$y = (e^{2.80})^x,$$

and

$$y = e^{2.80x}.$$

[Fig. 15--Curves illustrating the relationship between $y = e^x$, $x = e^y$, $y = 1/e^x$, and $x = 1/e^y$ and their logarithmic transformations. Panels: (15a) $y = e^x$ or $x = \ln y$, and $x = e^y$ or $y = \ln x$; (15b) the reciprocal relationships.]
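The conversion of an exponential to base $e$ can be checked numerically. The following is a minimal sketch, not part of the original memorandum; the helper name `to_base_e` and the test value of `x` are illustrative assumptions, and `math.log` stands in for the table of natural logarithms.

```python
import math

def to_base_e(b):
    """Return r such that b == e**r, i.e. r = ln b (b must be greater than 0)."""
    return math.log(b)

r = to_base_e(16.5)        # natural logarithm of 16.5, as read from a table
x = 1.5                    # arbitrary test value of x (an assumption)
direct = 16.5 ** x         # y = 16.5^x evaluated directly
via_e = math.exp(r * x)    # y = e^(r x), using the third law of exponents

print(round(r, 4), round(direct, 4), round(via_e, 4))
```

Both evaluations agree for any `x`, which is all the third law of exponents asserts here.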

When the exponential has the form $y = e^x$, the slope of the curve
at any point is equal to the value of the expression at that point.
When the exponential is not expressed as a function of $e$, the slope
is proportional to, but not equal to, the value of the expression at
the point; i.e., slope $= ky$.

For example, Fig. 16 shows the graph of the expression $y = 2(2^x)$,
which is an exponential of the form $y = ab^x$. Since this is not
written in terms of $e$, we would expect the slope at any point to be
proportional to the value of the expression at that point. We can check
this--at least approximately--by estimating the slope of the curve
at two points and comparing the results with the value of the function
at those points. To do this we establish the equation

$$S_x = k y_x,$$

where $S_x$ = the estimated slope at point $x$,

      $y_x$ = the value of the function $y = 2(2^x)$ at point $x$,

      $k$ = the constant of proportionality.

[Fig. 16--Estimating the slope of the expression $y = 2(2^x)$ at points $P_1$ and $P_2$]

Two points, $P_1$ and $P_2$, have been selected, one at either end of
the curve. It is obvious that the slopes at these two points are
different. Let us assume that the curve extended an equal distance from
each $P$ in either direction (shown on the graph as the hypotenuse of the
indicated triangles) is a straight line. Remembering the discussion of
the straight line, we see that, having made this assumption, the
coordinates of the vertexes of the triangles provide sufficient information
to estimate the slope of the curve. The coordinates of the vertexes
of the upper triangle are*

$$x_2 = 2.00, \quad y_2 = 8.000,$$

$$x_1 = 1.60, \quad y_1 = 6.063.$$

* The $y$ coordinates were obtained by substituting the $x$ coordinates
in the expression $y = 2(2^x)$; values to three decimal places were
obtained by solving, a procedure that improves the agreement between the
estimated slopes.

The formula for calculating the slope of a line given two points is

$$S = \frac{y_2 - y_1}{x_2 - x_1},$$

and on substitution

$$S = \frac{8.000 - 6.063}{2.00 - 1.60} = 4.843.$$

The $x$ coordinate of point $P_1$ is the average of $x_1$ and $x_2$ (1.80).
When this point, $x$, is substituted in the expression $y = 2(2^x)$, the
resulting value of $y$ is 6.964. Returning to our proportionality
statement

$$S_x = k y_x,$$

we substitute appropriately and get

$$4.843 = 6.964k,$$

$$k = 0.695,$$

indicating that the slope of the curve can be evaluated at any point
by multiplying the value of the function at that point by 0.695. To
check this we use point $P_2$ in exactly the way we did $P_1$ and derive
another estimate of the slope and the value of $k$.

For the lower triangle

$$x_2 = -0.4, \quad y_2 = 1.516,$$

$$x_1 = -0.8, \quad y_1 = 1.149;$$

therefore the slope

$$S = \frac{1.516 - 1.149}{-0.4 + 0.8},$$

$$S = 0.918.$$

The value of the function at $x$ (-0.6) is 1.320; therefore

$$0.918 = 1.320k,$$

$$k = 0.695;$$

and we are satisfied that the required constant of proportionality does
exist. The value of the expression at each of the two points was
calculated to four significant figures using the expression itself. Values
were not read from the curve. It is also interesting to note that for
the illustrative expression of the form $y = ab^x$, the value of $b$ was 2.
Further, the natural logarithm of 2 is equal to 0.6931, which is quite
close to the value of $k$ estimated above. The fact that our results
were no closer to the theoretical value of $k$ is due largely to the
assumption that the curve was linear over the relevant range.
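The two slope estimates can be reproduced with a few lines of arithmetic. This is a minimal sketch under the same straight-line assumption used in the text; the function names `f` and `estimate_k` are illustrative and do not come from the memorandum.

```python
import math

def f(x, a=2.0, b=2.0):
    """The illustrative exponential y = a * b**x."""
    return a * b ** x

def estimate_k(x1, x2):
    """Chord slope between x1 and x2 divided by the function value at the midpoint."""
    slope = (f(x2) - f(x1)) / (x2 - x1)
    midpoint = (x1 + x2) / 2.0
    return slope / f(midpoint)

k_upper = estimate_k(1.6, 2.0)     # upper triangle, around P1
k_lower = estimate_k(-0.8, -0.4)   # lower triangle, around P2
k_exact = math.log(2.0)            # theoretical constant, ln b

print(round(k_upper, 3), round(k_lower, 3), round(k_exact, 4))
```

Because both triangles span the same interval width, the two estimates coincide; narrowing the interval would move them closer to ln 2.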
The constant of proportionality can be proved analytically to be
exactly equal to the natural logarithm of $b$. Further, in an exponential
of the form $y = e^x$, $b$ is equal to $e$ and the natural logarithm of $e$ is
equal to 1. Thus for these kinds of exponential expressions it is
obvious that $k$ must also be equal to 1.

Let us turn now from the digression to our main discussion. It
has probably already been noticed that the exponential expressed in
logarithmic form is a straight line function; i.e., the expression
$y = ab^x$ is equivalent to the expression $\ln y = \ln a + x \ln b$; notice
$\ln a$ and $\ln b$ are constants. This fact greatly facilitates fitting
with exponential expressions.


If we are given the exponential $y = e^{a+bx}$ and we want to put it
into logarithmic form, we take the logs of each side. Although the
logarithm of $y$ presents no problem, the logarithmic form of $e^{a+bx}$ may
appear to. When we remember the laws of exponents and the fact that
logarithms are in fact exponents, we find that $a + bx$ is the logarithm
of $y$ to the base $e$, and as such becomes the natural form of the
right-hand side of the equation

$$\ln y = a + bx.$$

The exponential expression is therefore linear when stated in
terms of the logs of one of its members. To illustrate this, take
the following expression

$$\ln y = a + bx.$$

We can recognize this as a semi-log straight line. If we were to
convert to exponential form we would have

$$y = e^{a+bx}.$$

If we also examine the etuation

we find that this is another dorm of the exponential which can be con-

verted Lo log form as Follo ws: First, we divide K th sides of the

equation by wh ich gives us

I . ..

h.'e can also express a- a p,,.... of e. :e Look up the natural Log

of -, which, for lack of a better name, we will call -. We can now

write

or, again according to the third law of exponents,

The expression may he further simplified by letting 2), be represented

by the constant, " This produces

Now converting to logarithmic form we have

In ( /.,) = ,, +
which once again is a linear expression when the quotient $y/a$ is
given in terms of logarithms.

The equation of an exponential passing through two points may be
determined quite simply. We need only set up the required functional
form in terms of logarithms, then proceed as with a straight line.
The following example illustrates this method.

Given the two points, (1, 7) and (4, 1) on $x$, $y$ coordinates, we choose
an expression of the form $y = ab^x$ as the appropriate general exponential
to fit. The next step is to restate this expression in terms of
logarithms as follows:

$$\ln y = \ln a + x \ln b.$$

With the coordinates of the two points, $(x_1, y_1)$ and $(x_2, y_2)$, we
may write the two equations:

$$\ln y_1 = \ln a + x_1 \ln b,$$

$$\ln y_2 = \ln a + x_2 \ln b.$$

Taking the logarithms of $y_1$ and $y_2$ and substituting the logs of
the $y$s and the $x$s in the above equations results in two equations with
two unknowns that may be solved simultaneously:

$$1.9459 = \ln a + 1 \ln b,$$

$$0.0000 = \ln a + 4 \ln b.$$

Subtracting the second equation from the first leaves

$$1.9459 = -3 \ln b,$$

$$\ln b = -0.6486.$$
This value can in turn be substituted into the first equation above
with the result that

$$1.9459 = \ln a - 0.6486,$$

$$\ln a = 2.5945,$$

and with this we can write the required expression as follows:

$$\ln y = 2.59 - 0.649x.$$

The expression has been evaluated, and the results plotted in Fig. 17
pass exactly through the two points as required. We can simplify the
expression by converting it to exponential form. To do this we must
have the numbers represented by $\ln a$ and $\ln b$. Looking in a table of
natural logarithms we find that

$$\ln a = 2.59 = \ln 13.4,$$

$$\ln b = -0.649 = \ln 0.523,$$

and we may write

$$y = 13.4(0.523)^x.$$

Notice that 13.4, the value of $a$ in this expression, is in fact the
$y$ intercept. We can simplify still further by converting the
expression to one in terms of $e$. To do this we think of 13.4 as $e^r$ and
0.523 as $e^s$; thus,

$$13.4 = e^r,$$

$$0.523 = e^s,$$

and we find the value of both $r$ and $s$ by again consulting a table of
natural logarithms. Another way to write the last two expressions is

$$r = \ln 13.4$$

and

$$s = \ln 0.523.$$

[Fig. 17--Exponential fitted through two points: the curve $y = 13.4(0.523)^x$, or $\ln y = 2.59 - 0.649x$, through (1, 7) and (4, 1)]

We really did not need to use the table again because the lns of these
values are already available from our calculations above:

$$s = \ln 0.523 = -0.649,$$

$$r = \ln 13.4 = 2.59.$$

Now if we substitute $e^r$ and $e^s$ in the equation $y = 13.4(0.523)^x$, we
have

$$y = (e^{2.59})(e^{-0.649})^x.$$

Using the first and third laws of exponents we convert this
expression to

$$y = e^{(2.59 - 0.649x)}.$$

Rewriting this expression in logarithmic form results in

$$\ln y = 2.59 - 0.649x,$$

that is, the same expression that we had initially.
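The two-point exponential fit worked above can be sketched in a few lines. The function name `fit_exponential` is an illustrative assumption; the method is the one described in the text, namely a straight line through the points $(x, \ln y)$.

```python
import math

def fit_exponential(p1, p2):
    """Return (a, b) of y = a * b**x through two points with positive y values."""
    (x1, y1), (x2, y2) = p1, p2
    ln_b = (math.log(y2) - math.log(y1)) / (x2 - x1)   # slope of the semi-log line
    ln_a = math.log(y1) - x1 * ln_b                     # intercept of the semi-log line
    return math.exp(ln_a), math.exp(ln_b)

a, b = fit_exponential((1, 7), (4, 1))
print(round(a, 1), round(b, 3))
```

Evaluating `a * b**x` at x = 1 and x = 4 returns the original ordinates 7 and 1, confirming that the curve passes through both points.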

Power Function

The power function is one of the most commonly used mathematical

expressions in cost analysis because in many cases it adequately

describes the phenomenon of decreasing costs of successive units of

production. The general equation of this function is

$$y = ax^b.$$

To avoid confusing the power function with the exponential, which
looks somewhat similar, we must observe the placement of the variable,
$x$. In the power function, the variable $x$ is raised to the power $b$,
while in the exponential, a constant is raised to the variable power
$x$ as below:

$$y = b^x.$$

The characteristics of the power function can best be illustrated
by initially setting $a$ equal to 1, because $a$ affects primarily the level
of the curve and has little influence on its shape. Having done this
we are left with the equation

$$y = x^b.$$

For certain values of $b$, the function is not continuous for negative
values of $x$. Therefore, we will restrict the variable $x$ to values
greater than 0; the exponent $b$ can assume any value, positive,
negative, or 0. However, when $b$ is 0, the equation becomes

$$y = 1,$$

for any value raised to the 0 power is equal to 1. When $b$ is positive
and varies from 0, the family of curves shown in Fig. 18 results.

The smallest value assigned to $b$ in Fig. 18 is 0.2. Had 0.0 been
used, the result would have been a straight line parallel to the $x$
axis and passing through $y = 1$ as above. When $b$ is between 0 and 1
the curves generated are concave downward. When $b = 1$ a straight line
($y = x$) results, because any number raised to the first power is the
number itself. As values greater than 1 are assigned to $b$, the curves
become concave upwards. The curves pass through the point $x = 1$,
$y = 1$ for all values of $b$, because 1 raised to any power always equals 1.



When $x$ is greater than 1 the curves rotate upwards as the value
of $b$ increases. When $x$ is between 1 and 0, however, the situation is
not the same, as shown by Fig. 19. At the upper end all of the curves
go through the point $x = 1$, $y = 1$ as before, but at the lower end they
all tend towards the point $x = 0$, $y = 0$. When $b$ is made smaller the
curves become higher, the reverse of what happened when $x$ was greater
than 1.* When $b$ is greater than 1 the curves are concave upwards; when
$b$ is less than 1 the curves are concave downwards. As before, when
$b$ is equal to 1 the curve is the straight line $y = x$.

* This is true because a decimal raised to a power greater than
1 gives a smaller number. Also, a decimal raised to a positive power
less than 1 gives a larger number than itself.

When the exponent $b$ is negative, the family of curves shown in
Fig. 20 is generated. Regardless of the value of $b$ when it is negative,
the curves are concave upwards. As before, however, each of the curves
passes through the point $x = 1$, $y = 1$, and for values of $x$ greater
than 1 the curves with the lower absolute values of $b$ lie above those
with higher absolute values of $b$. When $x$ is less than 1, however, the
reverse is true. When $b$ is equal to -1 we do not have a straight line,
as was the case when $b$ was equal to +1; in this case the resulting
equation is

$$y = \frac{1}{x},$$

which is a reciprocal or a form of hyperbola.

Figure 21 illustrates the effect of including $a$ in the equation.
When $a$ is increased from 1 the curves shift upwards by direct
multiplication. When $a$ is reduced from 1 to 0 the curves shift similarly but
in a downward direction.
In most cost analysis applications only that part of the curve
where $x \geq 1$ is of interest. Since we are generally concerned with
depicting a cost-quantity relationship with $y$ the cost and $x$ the
quantity, we are not concerned with the cost of less than one unit. But
both because the curve might be useful for other applications and
because of its special behavior, we should become familiar with its
more general characteristics.

The power function also is linear when expressed in terms of
logarithms. Returning to the general equation

$$y = ax^b$$

and taking the logarithms of both sides, we have

$$\log y = \log a + b \log x,$$

which is quite clearly a linear expression in logarithms. Since
curve-fitting techniques are simpler when handling linear relationships,
it is common to make this transformation before fitting the
power function. Figure 22 shows the complete family of power
functions plotted using logarithmic coordinates.

When two lines (power functions) are parallel on logarithmic
coordinates they differ from each other by a constant ratio or
percentage, which is contrary to the case where two parallel lines on
arithmetic coordinates differ by a constant number. To demonstrate
analytically, assume that we have two curves, one 50 percent higher
than the other. The equations for these curves are

It shou Id a lso be pointeJ out t ha-t h i% curvv ls r.~t~In~


a negative ;-an~d .1r~ ac: e to n!nt.in a > Sit 'vt
direct ion.
[Fig. 22--Power function $y = x^b$ on logarithmic coordinates]

$$y_1 = ax^b,$$

$$y_2 = 1.5ax^b,$$

and in logarithmic form

$$\log y_1 = \log a + b \log x,$$

$$\log y_2 = \log 1.5 + \log a + b \log x.$$

In the second equation let the constant terms $\log 1.5 + \log a$ be set
equal to $\log a'$. We then have

$$\log y_1 = \log a + b \log x,$$

$$\log y_2 = \log a' + b \log x.$$

The only difference between the two occurs in the constant terms $\log a$
and $\log a'$. When these two equations are plotted on logarithmic grids
there will be two parallel lines with the spacing between them equal to
$\log a' - \log a$ or $\log 1.5$.

Fitting the power function through two points can be accomplished
by transforming both variables into logarithms and proceeding as with a
linear case. To review the method, see the discussion of the straight
line.*

* p. 3ff.
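As a sketch of this log-log procedure: the memorandum gives no numerical example at this point, so the two points below, (1, 100) and (2, 80), are hypothetical values chosen only for illustration, and `fit_power` is an assumed helper name.

```python
import math

def fit_power(p1, p2):
    """Return (a, b) of y = a * x**b through two points with positive coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    # Slope of the straight line through (log x, log y)
    b = (math.log10(y2) - math.log10(y1)) / (math.log10(x2) - math.log10(x1))
    # Intercept log a, converted back from logarithms
    a = 10 ** (math.log10(y1) - b * math.log10(x1))
    return a, b

# Hypothetical data: unit 1 costs 100, unit 2 costs 80
a, b = fit_power((1, 100), (2, 80))
print(round(a, 2), round(b, 4))
```

With these hypothetical points the exponent is negative, which corresponds to the decreasing cost-quantity curves discussed for the power function.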
FITTING CURVES TO TWO-VARIABLE RELATIONSHIPS

THE STRAIGHT LINE

Although there are many methods of fitting straight lines, three
are usually sufficient for fitting curves to two-variable relationships:
the method of selected points, the method of averages, and the method
of least squares. Of course one can always draw the curves freehand,
but even so the equation of the line must be determined by using one
of the other methods.

The Method of Selected Points

When it is apparent that data plotted on rectangular coordinates
can be described by a straight line, the equation of that line can be
found using the method of selected points. With the use of a straight
edge, a line is drawn through the points such that the points are
uniformly distributed around the line. Two points are then read from the
line near the extremities and substituted in the equation

$$y = a + bx.$$

The two equations are then solved simultaneously for $a$ and $b$.

In the example shown in Fig. 23 the two points selected were
$P_1$ (4, 8.3) and $P_2$ (26, 24). The two equations were therefore

$$8.3 = a + 4b,$$

$$24.0 = a + 26b.$$

When the first is subtracted from the second, the result is

$$15.7 = 22b,$$

$$b = 0.714.$$

[Fig. 23--Straight line fitted using the method of selected points: freehand curve $y = 5.44 + 0.714x$ through $P_1$ (4, 8.3) and $P_2$ (26, 24)]

When $b$ is substituted in the first equation

$$8.3 = a + (4)(0.714),$$

$$a = 5.44.$$

The equation of the desired line is

$$y = 5.44 + 0.714x.$$
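The simultaneous solution for two selected points reduces to a slope and an intercept. A minimal sketch follows; the helper name `line_through` is an assumption, not the memorandum's.

```python
def line_through(p1, p2):
    """Return (a, b) of y = a + b*x passing through two selected points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # subtract one equation from the other
    a = y1 - b * x1             # substitute b back into the first equation
    return a, b

a, b = line_through((4, 8.3), (26, 24.0))
print(round(a, 2), round(b, 3))
```

Carrying full precision for $b$ gives an intercept a few thousandths away from the 5.44 obtained with the rounded slope 0.714.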

When values of $x$ taken from the original data are substituted
in this equation, values of $y$ corresponding to each of the original
values can be calculated. One way of showing how well the line fits
the data is to compare the calculated values with the original values
using a percent deviation for each data point as in Table 1. When
computing the percent deviation, it is usual to base the percentages
on the observed values.* For example, the percent deviation for the
first data point is

$$(8.00 - 6.87)\,\frac{100}{8.00} = 14.1.$$

Table 1

USING THE METHOD OF SELECTED POINTS TO
FIT A STRAIGHT LINE: DATA AND RESULTS

   y      x     y (calc.)    Percent Deviation

   8      2       6.87            14.1
   6      4       8.30           -38.3
  12      6       9.72            19.0
  10     10      12.58           -25.8
  14     12      14.01            -0.1
  18     16      16.86             6.3
  20     22      21.15            -5.8
  28     24      22.58            19.4
  26     26      24.00             7.7
  22     30      26.86           -22.1

                                  15.9 av

After the percent deviations are calculated for each data point,

an average percent deviation can be calculated by adding each devia-

tion, disregarding the sign, and dividing the total by the number of

data points.
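The percent deviations and their average can be recomputed directly from the data and the fitted line. This is a minimal sketch; the list and function names are illustrative assumptions, and the coefficients are the rounded values 5.44 and 0.714 from the worked example.

```python
# Observed data from Table 1
y_obs = [8, 6, 12, 10, 14, 18, 20, 28, 26, 22]
x_obs = [2, 4, 6, 10, 12, 16, 22, 24, 26, 30]

def percent_deviations(a, b, xs, ys):
    """Percent deviation of each point, based on the observed y value."""
    return [100.0 * (y - (a + b * x)) / y for x, y in zip(xs, ys)]

devs = percent_deviations(5.44, 0.714, x_obs, y_obs)
# Average percent deviation: signs are disregarded before averaging
avg_abs = sum(abs(d) for d in devs) / len(devs)

print([round(d, 1) for d in devs], round(avg_abs, 1))
```

Small differences from the tabulated figures arise because the table rounds the calculated $y$ values to two decimals before forming each percentage.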

Notice that the placement of the line in the example was quite
arbitrary; this is one of the weaknesses of the method of selected
points. The same equation could have been arrived at in a slightly
different manner. Had the line been extended to the left until it
intercepted the $y$ axis, we could have read the value of $a$ directly
from the graph and calculated the slope as follows:

$$b = (24 - 8.3)/(26 - 4) = 0.714.$$

Still another method using the two points given and the modified point
slope formula would be

$$y - y_1 = \left(\frac{y_2 - y_1}{x_2 - x_1}\right)(x - x_1),$$

$$y - 8.3 = \left(\frac{24 - 8.3}{26 - 4}\right)(x - 4),$$

$$y = 5.44 + 0.714x.$$

* $$\text{Percent deviation} = \frac{y\ \text{observed} - y\ \text{calculated}}{y\ \text{observed}} \times 100.$$

The Method of Averages

To use the method of averages the data must first be arrayed in
ascending order according to one of the variables. Second, the numbers
are divided so that two approximately equal groups are formed. If the
number of data points is even, there should be an equal number in each;
if odd, the extra point will have to be placed in one group or the
other. The average value of each of the variables is calculated for
each group and substituted into the equation

$$y = a + bx.$$

Two equations result as before; these are solved simultaneously for
$a$ and $b$.
In the example shown in Fig. 24 and Table 2, the data are ordered
according to the variable $x$. As there are an even number of data
points (10), 5 are to be assigned to each group. When average values
for both $x$ and $y$ are calculated for each group, they define the two
points (6.8, 10) and (23.6, 22.8), which are then plotted and a straight
line connecting them drawn. The equation of the line is obtained by
substituting the coordinates of the average points into the equation
$y = a + bx$ as before and solving the two equations simultaneously for
$a$ and $b$.

$$10 = a + 6.8b,$$

$$22.8 = a + 23.6b.$$

[Fig. 24--Straight line fitted using the method of averages: curve $y = 4.82 + 0.762x$ drawn through $P_1$ (6.8, 10) and $P_2$ (23.6, 22.8)]


When the first is subtracted from the second, the result is

$$12.8 = 16.8b,$$

$$b = 0.762;$$

and substituting $b$ in the first equation,

$$10 = a + (6.8)(0.762),$$

$$a = 4.82.$$

Therefore,

$$y = 4.82 + 0.762x.$$

Percent and average percent deviations as calculated are shown
in Table 2.

Table 2

USING THE METHOD OF AVERAGES TO FIT A
STRAIGHT LINE: DATA AND RESULTS

    y         x       y (calc.)    Percent Deviation

    8         2         6.34            20.7
    6         4         7.87           -31.2
   12         6         9.39            21.7
   10        10        12.44           -24.4
   14        12        13.96             0.3

   10 av      6.8 av     --              --

   18        16        17.01             5.5
   20        22        21.58            -7.9
   28        24        23.11            17.5
   26        26        24.63             5.3
   22        30        27.68           -25.8

   22.8 av   23.6 av     --             16.0 av

A slightly different version of the method of averages is
sometimes used. Instead of calculating average values as was done before,
the data are used to establish ten separate linear equations as follows:

   8 = a +  2b          18 = a + 16b
   6 = a +  4b          20 = a + 22b
  12 = a +  6b          28 = a + 24b
  10 = a + 10b          26 = a + 26b
  14 = a + 12b          22 = a + 30b
  -------------        ---------------
  50 = 5a + 34b        114 = 5a + 118b

These groupings are preserved and the equations appearing in each
group are added to obtain the required two equations, which are solved
simultaneously for $a$ and $b$:

$$50 = 5a + 34b,$$

$$114 = 5a + 118b.$$

Subtracting the first from the second,

$$64 = 84b,$$

$$b = 0.762;$$

and substituting $b$ in the first equation

$$50 = 5a + (34)(0.762),$$

$$a = 4.82.$$

The same solution results from either averaging or adding. Although
the method of averages is simple to use and does give a reproducible
solution, it does not ensure that a best fitting straight line will
be chosen.
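The grouping and averaging steps can be sketched as follows. The function name is an assumption, and so is the choice to place the extra point of an odd-sized data set in the second group; the text leaves that placement to the analyst.

```python
def method_of_averages(xs, ys):
    """Fit y = a + b*x through the average points of the lower and upper halves."""
    pts = sorted(zip(xs, ys))            # array the data in ascending order of x
    half = len(pts) // 2                 # assumption: an odd extra point joins group 2
    groups = (pts[:half], pts[half:])
    means = []
    for g in groups:
        mean_x = sum(p[0] for p in g) / len(g)
        mean_y = sum(p[1] for p in g) / len(g)
        means.append((mean_x, mean_y))
    (x1, y1), (x2, y2) = means
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b, means

x_obs = [2, 4, 6, 10, 12, 16, 22, 24, 26, 30]
y_obs = [8, 6, 12, 10, 14, 18, 20, 28, 26, 22]
a, b, means = method_of_averages(x_obs, y_obs)
print(means, round(a, 2), round(b, 3))
```

Summing the equations within each group instead of averaging them divides both sides by the same group size, so it leads to the same pair of coefficients.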

The Method of Least Squares

The method of least squares is probably the most widely used
method of obtaining empirical equations. The term "least squares"
reflects the criterion used to determine the desired equation. The
line is chosen such that the sum of the squared deviations of the
points from the line is minimized. The way this criterion is used
in the formula for calculating a least-squares line is worked out in
full in Appendix A. Briefly it is as follows:

We are seeking an equation of a straight line

$$y = a + bx$$

such that the sum of the squared distances of the data from the line
will be minimal.*

The sum of the squared deviations ($d$) is expressed mathematically
as

$$\sum d^2 = \sum (y - a - bx)^2.$$

To obtain values of $a$ and $b$ so that this expression can be minimized,
it is necessary to take partial derivatives with respect to $a$
and $b$ and to equate them to 0.** These partial derivatives are

$$\frac{\partial \sum d^2}{\partial a} = -2\sum y + 2Na + 2b\sum x = 0,$$

$$\frac{\partial \sum d^2}{\partial b} = -2\sum xy + 2a\sum x + 2b\sum x^2 = 0;$$

and with the obvious simplifications made, the resultant two equations

$$\sum y = Na + b\sum x,$$

$$\sum xy = a\sum x + b\sum x^2$$

are called the normal equations for fitting a least-squares straight
line. Each of the values other than $a$ and $b$ can readily be determined
from the data, leaving two equations with two unknowns that can be
solved simultaneously.

* Where the equation is of the form $y = a + bx$, the distances are
measured parallel to the $y$ axis. Conversely, where the equation is of
the form $x = a + by$, the distances are measured parallel to the $x$ axis.

** This is a method of calculus included here only as a matter of
interest. Its comprehension is not essential to anything that follows
in this text.

In applying this method it is convenient to array the data as in
the first two columns of Table 3. (The ordering of the values is not
essential, although it tends to make checking the calculations easier.)
The next step is to square each entry in the column headed $x$ and enter
the product in the column headed $x^2$. The entries in the column headed
$xy$ are the products of the $x$ and $y$ values for each point. After these
calculations have been made, each column is totaled as shown. $N$ is
the number of data points; the remaining values required by the normal
equations are the totals of the appropriate columns.

Table 3

USING THE METHOD OF LEAST SQUARES TO FIT
A STRAIGHT LINE: WORKSHEET

                                                       Percent
    y      x      x^2      xy    y (calc.)     d      Deviation     d^2

    8      2        4      16      7.07      0.93       11.7       0.865
    6      4       16      24      8.49     -2.49      -41.5       6.200
   12      6       36      72      9.90      2.10       17.5       4.410
   10     10      100     100     12.73     -2.73      -27.3       7.453
   14     12      144     168     14.14     -0.14       -1.0       0.020
   18     16      256     288     16.97      1.03        5.7       1.061
   20     22      484     440     21.21     -1.21       -6.1       1.464
   28     24      576     672     22.63      5.37       19.2      28.837
   26     26      676     676     24.04      1.96        7.5       3.842
   22     30      900     660     26.87     -4.87      -22.1      23.717

  164    152    3,192   3,116                           16.0 av.  77.869

When these totals are substituted in the normal equations, we have

$$164 = 10a + 152b,$$

$$3116 = 152a + 3192b,$$

and when these are solved simultaneously,

$$a = 5.66,$$

$$b = 0.707.$$

The equation is therefore

$$y = 5.66 + 0.707x.$$
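The worksheet totals and the solution of the normal equations can be reproduced as follows. This is a minimal sketch; the function name is an assumption, and the closed-form expressions are simply the two normal equations solved by elimination.

```python
def least_squares_line(xs, ys):
    """Solve the two normal equations for a and b in y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminate a between  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_xx
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b, (sum_x, sum_y, sum_xx, sum_xy)

x_obs = [2, 4, 6, 10, 12, 16, 22, 24, 26, 30]
y_obs = [8, 6, 12, 10, 14, 18, 20, 28, 26, 22]
a, b, totals = least_squares_line(x_obs, y_obs)
print(totals, round(a, 2), round(b, 3))
```

The tuple of totals corresponds to the column sums of the worksheet, so it offers a quick check on hand calculations.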

Figure 25 shows the data and the straight line described by this
equation. The average percent deviation is calculated as before.
Another measure of goodness of fit that is often used, particularly in
conjunction with least squares, is the standard error of the estimate
of $y$, $S_y$.* This measure is obtained by squaring each of the deviations,
adding the results, dividing the total by $N - 2$ (where $N$ is the number
of data points), and taking the square root of the result, e.g.,

$$S_y = \pm\sqrt{\frac{\sum d^2}{N - 2}}.$$

* The standard error of the estimate of $y$ allows a heavier penalty
for extreme data points than does the average percent deviation method,
yet it is more difficult to interpret.

[Fig. 25--Straight line fitted using the method of least squares: least-squares fit $y = 5.66 + 0.707x$]

In our example

$$S_y = \pm\sqrt{\frac{77.869}{8}},$$

$$S_y = \pm 3.120.$$

Since the standard error of the estimate of $y$ is expressed in the
same units as the variable $y$, it is often better, when making comparisons,
to convert to a non-dimensional number. This is usually done by dividing
$S_y$ by the arithmetic mean of $y$, $\bar{y}$:

$$V = (S_y/\bar{y})100,$$

or

$$V = \frac{3.12}{16.4}\,100 = 19.02\%,$$

where $V$ is called the coefficient of variation.
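Both measures can be computed from the residuals of the fitted line. The sketch below uses the rounded coefficients 5.66 and 0.707 and divides by $N - 2$, as in the worked example; the function name is an illustrative assumption.

```python
import math

def standard_error(a, b, xs, ys):
    """Standard error of the estimate of y for the line y = a + b*x (divisor N - 2)."""
    sum_d2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sum_d2 / (len(xs) - 2))

x_obs = [2, 4, 6, 10, 12, 16, 22, 24, 26, 30]
y_obs = [8, 6, 12, 10, 14, 18, 20, 28, 26, 22]

s_y = standard_error(5.66, 0.707, x_obs, y_obs)
mean_y = sum(y_obs) / len(y_obs)
v = 100.0 * s_y / mean_y      # coefficient of variation, in percent

print(round(s_y, 2), round(mean_y, 1), round(v, 1))
```

Only the magnitude of the standard error is computed here; the plus-or-minus sign in the text indicates that it is applied on either side of the line.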

Summary

Of the three methods for fitting straight lines the method of

selected points is the easiest to use, but when the required line is

not obvious from the data, the choice is strictly a matter of judg-

ment and the results are not easily reproducible. The method of aver-

ages provides a relatively simple and yet definitive way of choosing

an equation (although not necessarily the best fitting line). The

only place where uncertainty enters the picture is in the placement

of the odd data point. The method of least squares requires more in

the way of calculation but provides a completely unambiguous way of

selecting and fitting the line; for that and other reasons it is the

most widely used method.

An average percent deviation can be used to show how well the line

fits the data. It is easy to calculate and to interpret. The standard

error of the estimate of y, while requiring more calculations to be made,

is more widely used because of its implications in drawing statistical

inferences. The coefficient of variation, a non-dimensional term based

on the standard error, is useful especially for making comparisons.

Although the results of applying the three different methods to

the same data, as displayed in Fig. 26, are strikingly similar, the

user cannot expect that this would always be the case.

THE PARABOLA

The same three methods used to fit the straight line may be used
to fit the parabola. To fit the parabola, both the method of selected
points and the method of averages rely even more on the judgment of
the cost analyst than was the case with respect to the straight line.
For this and other reasons, the method of least squares is preferred.
Measuring goodness of fit is the same as it was for the straight line
and will not be discussed again. There are two forms of the parabola
which may be used by the cost analyst. The first is

$$y = a + bx + cx^2,$$

which is illustrated in Fig. 27. The second is

$$x = a + by + cy^2$$

or, in terms of $y$,

$$y = \frac{-b \pm \sqrt{b^2 - 4c(a - x)}}{2c},$$

which is illustrated in Fig. 28.

[Fig. 27--Parabola form 1: $y = a + bx + cx^2$]

[Fig. 28--Parabola form 2: $y = \dfrac{-b \pm \sqrt{b^2 - 4c(a - x)}}{2c}$]

In the first case, the line of symmetry of the parabola is parallel
to the $y$ axis and in the second it is parallel to the $x$ axis. Each form
will be examined separately. Form 2 requires $y$ to be the squared term.
In order to accomplish this, the equation of the first form is written
interchanging $y$ and $x$ as follows:

$$x = a + by + cy^2,$$

and solving for $y$ using the quadratic formula

$$cy^2 + by + (a - x) = 0,$$

$$y = \frac{-b \pm \sqrt{b^2 - 4c(a - x)}}{2c}.$$

Parabola Form 1

The Method of Selected Points. When it is clear that the
relationship described by the data can be represented by a segment of a
parabola, the method of selected points may be used. First, draw a
freehand curve roughly the shape of a parabola through the data, and
read three points from the curve, two at the extremities and one
somewhere in the middle. If there is a relatively sharp maximum or
minimum point (vertex), it would be best to read the third point from the
area of the vertex. Then substitute the three points in the equation

$$y = a + bx + cx^2$$

and solve the resulting three equations simultaneously for $a$, $b$, and
$c$. Notice that $x$ appears twice in each equation, once as it is read
from the curve and once squared.

In the example shown in Fig. 29, numbers for which are given in Table 4, we draw the freehand curve indicated by the solid line and read from the curve the points (18, 17), (12, 10) and (1, 2).

Table 4

USING THE METHOD OF SELECTED POINTS TO FIT THE
PARABOLA FORM 1: DATA AND RESULTS

   y      x     y (calc)    Percent Deviation
   3      1       2.00          -33.3
   2      3       2.99           50.0
   4      4       3.56          -11.0
   5      8       6.37           27.4
   7     10       8.08           15.4
  10     11       9.01           -9.9
  12     14      12.13            1.1
  15     17      15.70            4.7
  18     18      17.00           -5.6
  17     19      18.35            7.9
  --     --        --            16.6 av

The required three equations are


$$17 = a + 18b + 324c,$$

$$10 = a + 12b + 144c,$$

$$2 = a + b + c.$$

When we subtract the second from the first and the third from the second, we get

$$7 = 6b + 180c,$$

$$8 = 11b + 143c.$$

[Fig. 29--Parabola form 1 fitted using the method of selected points. The plot shows the data points, the freehand curve with the selected points (18, 17), (12, 10), and (1, 2), and the fitted curve $y = 1.582 + 0.391x + 0.0259x^2$; axes x and y, each 0 to 20.]

To eliminate b, we multiply the first equation by 11 and the second by 6, then subtract the second from the first:

$$29 = 1122c,$$

$$c = 0.0259.$$

Substituting c in the first equation,

$$7 = 6b + (180)(0.0259),$$

$$b = 0.391.$$

Next, we substitute both b and c in the third of the original equations, and calculate a as follows:

$$2 = a + b + c,$$

$$a = 2 - 0.391 - 0.0259,$$

$$a = 1.582.^{*}$$

The required equation is therefore

$$y = 1.582 + 0.391x + 0.0259x^2.$$

The graph of this equation is shown by the dashed line in Fig. 29.

As with the straight line, the chances are small that two analysts independently using this method would arrive at the same result. The only argument in favor of it is that it is relatively simple to use.
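The elimination above can be checked mechanically. The following is a minimal sketch, not part of the original Memorandum: the helper names `solve_linear_system` and `parabola_through_points` are hypothetical, and the three points are the ones read from the freehand curve in the example. Because the text rounds intermediate values, the last printed digit may differ slightly from the figures quoted above.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def parabola_through_points(points):
    """Return (a, b, c) so that y = a + b*x + c*x**2 passes through three (x, y) points."""
    matrix = [[1.0, x, x * x] for x, _ in points]
    rhs = [y for _, y in points]
    return tuple(solve_linear_system(matrix, rhs))


# Points read from the freehand curve in the example (x, y).
selected_points = [(18, 17), (12, 10), (1, 2)]
a, b, c = parabola_through_points(selected_points)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.4f}")
```

A different analyst reading different points from the curve would only need to change `selected_points`, which makes the sensitivity of the method to the choice of points easy to explore.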


*The discrepancies in the arithmetic result from the fact that more decimal places than those shown were used in making the actual calculations.

The Method of Averages. Three points similar to those obtained in the method of selected points by arbitrarily choosing them from a freehand curve may be obtained by averaging. In this method, we array the data in ascending order of one of the variables, form three groups of approximately equal size, and calculate the average values of both x and y for each group. We substitute the average points in the equation

$$y = a + bx + cx^2$$

and solve the resulting three equations simultaneously for a, b, and c.

Table 5 illustrates the procedure for the example case. The 10 numbers are arrayed in ascending order according to the value of x. Three groups are formed by assigning 3 points to the first and last groups and 4 points to the middle one. (The assignment of the odd point is arbitrary.)

Table 5

USING THE METHOD OF AVERAGES TO FIT THE
PARABOLA FORM 1: WORKSHEET

     y          x        y (calc)    Percent Deviation
     3          1          2.34           22.0
     2          3          3.15          -57.5
     4          4          3.64            9.0
  3 av       2.67 av

     5          8          6.20          -24.0
     7         10          7.83          -11.9
    10         11          8.73           12.7
    12         14         11.78            1.8
  8.5 av    10.75 av

    15         17         15.36           -2.4
    18         18         16.67            7.4
    17         19         18.04           -6.1
 16.67 av      18 av        --            15.5 av

By averaging, we obtain three points, (18, 16.67), (10.75, 8.5), and (2.67, 3), and plot them as shown in Fig. 30.

[Fig. 30--Parabola form 1 fitted using the method of averages. The plot shows the data points, the average points (18, 16.67), (10.75, 8.5), and (2.67, 3), and the fitted curve $y = 2.018 + 0.290x + 0.0291x^2$.]

We substitute these same points in the equation

$$y = a + bx + cx^2,$$

which results in the three following equations:

$$3.00 = a + 2.67b + 7.129c,$$

$$8.50 = a + 10.75b + 115.563c,$$

$$16.67 = a + 18.00b + 324.000c.$$

When these equations are solved simultaneously, the result is

$$a = 2.018,$$

$$b = 0.290,$$

$$c = 0.0291.$$

The equation of the required parabola is therefore

$$y = 2.018 + 0.290x + 0.0291x^2.$$

This method is less arbitrary than the method of selected points, but some ambiguity does exist because of having to assign any odd data point to one of the three groups. Further, averaging may prevent us from choosing a point near the vertex of the curve.
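The grouping and averaging steps can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the (x, y) pairs are taken from Table 5, and the 3-4-3 grouping is the arbitrary assignment used in the example. The code carries full precision for the averages, so the coefficients agree with the text only to about three significant figures.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def group_averages(points, group_sizes):
    """Sort (x, y) points by x and return the average point of each consecutive group."""
    ordered = sorted(points)
    averages, start = [], 0
    for size in group_sizes:
        group = ordered[start:start + size]
        mean_x = sum(x for x, _ in group) / size
        mean_y = sum(y for _, y in group) / size
        averages.append((mean_x, mean_y))
        start += size
    return averages


# (x, y) pairs from Table 5.
data = [(1, 3), (3, 2), (4, 4), (8, 5), (10, 7),
        (11, 10), (14, 12), (17, 15), (18, 18), (19, 17)]

# Three groups of 3, 4, and 3 points; the odd point goes to the middle group.
average_points = group_averages(data, (3, 4, 3))
matrix = [[1.0, x, x * x] for x, _ in average_points]
a, b, c = solve_linear_system(matrix, [y for _, y in average_points])
print(average_points)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.4f}")
```

Changing the group sizes to (4, 3, 3) or (3, 3, 4) shows how much the assignment of the odd point moves the fitted curve.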

The Method of Least Squares. To fit a parabola using the method of least squares, we must solve three normal equations simultaneously for the values of the coefficients a, b, and c in the equation

$$y = a + bx + cx^2.$$

The normal equations are

$$\sum y = Na + b\sum x + c\sum x^2,$$

$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3,$$

$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4.$$

The least-squares criterion and details of the derivation of these equations, which are described fully in the Appendix, are similar to those for the straight line. As in previous discussions, all of the information required to solve the equations can be obtained from the data.

In a worksheet similar to the one shown in Table 6, we array the data as in the first two columns, making entries in the other columns after performing the calculations indicated by the headings. N is the number of data points and the column totals provide the other necessary inputs to the normal equations.

Table 6

USING THE METHOD OF LEAST SQUARES TO FIT
THE PARABOLA FORM 1: WORKSHEET

   y     x    x^2     x^3       x^4      xy     x^2 y    y (calc)   Percent Deviation
   3     1      1       1         1       3        3       2.28          24.0
   2     3      9      27        81       6       18       3.04         -52.0
   4     4     16      64       256      16       64       3.51          12.0
   5     8     64     512     4,096      40      320       6.03         -20.6
   7    10    100   1,000    10,000      70      700       7.66          -9.4
  10    11    121   1,331    14,641     110    1,210       8.58          14.2
  12    14    196   2,744    38,416     168    2,352      11.69           2.6
  15    17    289   4,913    83,521     255    4,335      15.37          -2.5
  18    18    324   5,832   104,976     324    5,832      16.72           7.1
  17    19    361   6,859   130,321     323    6,137      18.13          -6.6
  93   105  1,481  23,283   386,309   1,315   20,971        --           15.1 av

The equations to be solved for the example shown in Fig. 31 and Table 6 are:

$$93 = 10a + 105b + 1481c,$$

$$1315 = 105a + 1481b + 23283c,$$

$$20971 = 1481a + 23283b + 386309c.$$

Therefore

$$a = 1.993,$$

$$b = 0.254,$$

$$c = 0.0314.$$

[Fig. 31--Parabola form 1 fitted using the method of least squares. The plot shows the data points and the fitted curve $y = 1.993 + 0.254x + 0.0314x^2$.]

Thus the desired equation is

$$y = 1.993 + 0.254x + 0.0314x^2.$$

The method of selected points, the method of averages, and the method of least squares can each be used to fit a parabola. However, because the method of least squares results in an unambiguous solution, it is usually preferred.
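As a hedged illustration of the worksheet procedure, the sketch below accumulates the column totals of Table 6 and solves the three normal equations. The function names are assumptions introduced here, not part of the Memorandum; the data are the (x, y) pairs of Table 6.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_parabola_least_squares(points):
    """Fit y = a + b*x + c*x**2 by solving the three normal equations."""
    n = len(points)

    def sum_x(power):
        return sum(x ** power for x, _ in points)

    def sum_xy(power):
        return sum((x ** power) * y for x, y in points)

    # Rows correspond to the normal equations for sum(y), sum(xy), sum(x^2 y).
    matrix = [[n, sum_x(1), sum_x(2)],
              [sum_x(1), sum_x(2), sum_x(3)],
              [sum_x(2), sum_x(3), sum_x(4)]]
    rhs = [sum_xy(0), sum_xy(1), sum_xy(2)]
    return tuple(solve_linear_system(matrix, rhs))


# (x, y) pairs from Table 6.
data = [(1, 3), (3, 2), (4, 4), (8, 5), (10, 7),
        (11, 10), (14, 12), (17, 15), (18, 18), (19, 17)]
a, b, c = fit_parabola_least_squares(data)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.4f}")
```

Unlike the two preceding methods, nothing here depends on the analyst's judgment: the same data always yield the same coefficients.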

Figure 32 shows a comparison of the results obtained by using each of these methods to fit a parabola to the example data.

[Fig. 32--Fitting a parabola form 1 using three alternate methods. The plot overlays the curves obtained by selected points, averages, and least squares on the data points; axes x and y.]

Parabola Form 2

The equation for this class of parabolas is

$$x = a + by + cy^2.$$

The only difference between this and the equation for the parabola form 1 is that x and y have been interchanged. In fitting this form we invert the relationship between x and y in the data and proceed as before until a, b, and c have been calculated. At that point the equation in y as above must be solved for y using the quadratic formula. The desired result will typically be one of the two possible solutions; the appropriate one can best be determined by experimentation. While all three curve fitting methods can be used here also, we shall only illustrate the method of least squares.

The Method of Least Squares. The normal equations necessary to fit this kind of parabola are the same as for form 1 but with x and y interchanged as

$$\sum x = Na + b\sum y + c\sum y^2,$$

$$\sum xy = a\sum y + b\sum y^2 + c\sum y^3,$$

$$\sum xy^2 = a\sum y^2 + b\sum y^3 + c\sum y^4.$$

When the appropriate values are calculated as shown in Table 7 and substituted in the above equations, we have

$$92 = 10a + 110b + 1530c,$$

$$1370 = 110a + 1530b + 23690c,$$

$$22076 = 1530a + 23690b + 387858c.$$

Table 7

USING THE METHOD OF LEAST SQUARES TO FIT
THE PARABOLA FORM 2: WORKSHEET

   y     x    y^2     y^3       y^4      xy     xy^2    y (calc)   Percent Deviation
   2     1      4       8        16       2        4      0.76          62.0
   4     2     16      64       256       8       32      3.99           0.3
   6     3     36     216     1,296      18      108      5.80           3.3
   8     5     64     512     4,096      40      320      8.40          -5.0
  10     9    100   1,000    10,000      90      900     12.13         -21.3
  12     6    144   1,728    20,736      72      864      9.46          21.2
  15    12    225   3,375    50,625     180    2,700     14.33           4.5
  16    16    256   4,096    65,536     256    4,096     16.84          -5.3
  18    18    324   5,832   104,976     324    5,832     17.97           0.2
  19    20    361   6,859   130,321     380    7,220     19.04          -0.2
 110    92  1,530  23,690   387,858   1,370   22,076       --           12.3 av

The solution is

$$a = 0.913,$$

$$b = 0.0783,$$

$$c = 0.0485;$$

and the equation sought is

$$x = 0.913 + 0.0783y + 0.0485y^2.$$

Since our objective is to use x to estimate y, we must solve this equation for y. We can do this by writing it as a quadratic equation in y:

$$0.0485y^2 + 0.0783y + (0.913 - x) = 0,$$

and using the quadratic formula we obtain

$$y = \frac{-0.0783 \pm \sqrt{0.0783^2 - 4(0.0485)(0.913 - x)}}{2(0.0485)},$$

which simplifies to

$$y = \frac{-0.0783 \pm \sqrt{-0.171 + 0.194x}}{0.0971}.$$

We can see on inspection that the solution in which the square-root term is negative is not useful; the correct equation is

$$y = \frac{-0.0783 + \sqrt{-0.171 + 0.194x}}{0.0971}.$$

This equation has been graphed in Fig. 33. The reason for selecting the form 2 parabola is that as larger and larger values of x are used, the value of y continues to increase--at a decreasing rate, however. Such would not have been true had a form 1 parabola been used. This problem is discussed in the section of this Memorandum on analytic geometry.*
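The form 2 procedure (fit x on y, then invert with the quadratic formula) can be sketched as below. This is a sketch under stated assumptions: the function names are hypothetical, the (x, y) pairs are those of Table 7, and the positive square root is taken, as the text selects by inspection. The coefficients are carried at full precision, so they match the printed values only to about two or three significant figures.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_parabola_form2(points):
    """Fit x = a + b*y + c*y**2 by least squares; points are (x, y) pairs."""
    n = len(points)

    def sum_y(power):
        return sum(y ** power for _, y in points)

    def sum_xy(power):
        return sum(x * y ** power for x, y in points)

    matrix = [[n, sum_y(1), sum_y(2)],
              [sum_y(1), sum_y(2), sum_y(3)],
              [sum_y(2), sum_y(3), sum_y(4)]]
    return tuple(solve_linear_system(matrix, [sum_xy(0), sum_xy(1), sum_xy(2)]))


def estimate_y(x, a, b, c):
    """Invert x = a + b*y + c*y**2 for y, taking the positive square root."""
    discriminant = b * b - 4.0 * c * (a - x)
    if discriminant < 0:
        raise ValueError("x lies outside the range covered by the fitted parabola")
    return (-b + math.sqrt(discriminant)) / (2.0 * c)


# (x, y) pairs from Table 7.
data = [(1, 2), (2, 4), (3, 6), (5, 8), (9, 10),
        (6, 12), (12, 15), (16, 16), (18, 18), (20, 19)]
a, b, c = fit_parabola_form2(data)
print(f"a = {a:.3f}, b = {b:.4f}, c = {c:.4f}")
print(f"estimated y at x = 20: {estimate_y(20, a, b, c):.2f}")
```

The explicit check on the discriminant makes visible a limitation the worked example does not state: the inverted equation has no real solution for sufficiently small x.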

*See pp. 19-22.

THE EXPONENTIAL

In its simplest form, the equation of the exponential is

$$y = e^x$$

or

$$y = 10^x,$$

depending on whether base e or 10 is preferred. (Recall the earlier discussion of the advantages of each.)

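[Fig. 33: plot of the fitted form 2 parabola, $y = \dfrac{-0.0783 + \sqrt{-0.171 + 0.194x}}{0.0971}$; axes x and y, each 0 to 20.]

For our purposes, a more useful form of the exponential is

$$y = e^{a+bx} \quad \text{or} \quad y = 10^{a+bx},$$

[several words illegible in the source] greater or less than 1, depending on the value of a, and [illegible] greater or less than x, depending on the value of b. For illustrative purposes, we will work exclusively with

$$y = 10^{a+bx},$$

although those who prefer to use base e may do so, since the procedures are the same.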

Unfortunately there is no direct least-squares solution for fitting a curve of this type. There are iterative methods that can be used to approximate a least-squares solution, but they require a large computer to be of practical use.*

The usual method is to transform the exponential into a linear equation by taking the logarithms of each side as follows:

$$\log y = a + bx.$$

We then substitute the logarithms of the y values for the actual values and employ the least-squares normal equations for fitting a straight line. It should be noted, however, that this method does not yield the same least-squares solution for a and b as the exponential form does. The criterion of least squares is applied to the logarithms, not to the actual values of y, which results in minimization of the relative rather than the absolute deviations.** The fitted line is also higher than would be the case had the least-squares criterion been applied to the actual values. A similar phenomenon occurs when the method of averages is used.

*C. A. Graver and H. E. Boren, Jr., [title illegible in the source], The RAND Corporation, RM-4879-PR, July 1967; in this Memorandum, the term "exponential" applies to the power form used in this text.

**Ibid. This approach is fine when one wants to minimize relative rather than absolute differences. One could argue that such is the case for most cost-analysis problems.

To illustrate the least-squares method we apply it to the data in Table 8. Notice that the first step is to obtain the logarithms of the y values. From that point on, the calculations required are as indicated in the column headings. The normal equations are the same as for the linear case with log y substituted for y:

$$\sum \log y = Na + b\sum x,$$

$$\sum x \log y = a\sum x + b\sum x^2.$$

Table 8

USING THE METHOD OF LEAST SQUARES TO FIT
THE EXPONENTIAL: WORKSHEET
(semi-log form)

   y    log y     x    x^2   x log y   log y (calc)   y (calc)   Percent Deviation
  30   1.4771     1      1    1.477       1.387         24.4          18.7
  19   1.2788     2      4    2.558       1.280         19.0           0.0
  15   1.1761     3      9    3.528       1.172         14.9           0.7
  10   1.0000     4     16    4.000       1.065         11.6         -16.1
   9   0.9542     5     25    4.771       0.958          9.1          -1.1
   6   0.7782     6     36    4.669       0.851          7.1       [illegible]
   5   0.6990     7     49    4.893       0.744          5.5         -10.0
   4   0.6021     8     64    4.816       0.636          4.3          -7.5
   4   0.6021     9     81    5.419       0.529          3.4          15.0
   3   0.4771    10    100    4.771       0.422          2.6          13.3
 105   9.0447    55    385   40.902         --            --          10.1 av

Substituting the appropriate values from Table 8 yields

$$9.0446 = 10a + 55b,$$

$$40.902 = 55a + 385b,$$

which when solved result in

$$a = 1.494,$$

$$b = -0.1072.$$

The desired equation is therefore either

$$\log y = 1.494 - 0.1072x,$$

or

$$y = 10^{1.494 - 0.1072x}.$$

The graph of this solution is shown in Fig. 34. Because only the left-

hand member of the equation is expressed in logarithms, this solution

is often called the semi-log form.
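The semi-log procedure amounts to an ordinary straight-line fit applied to (x, log y). The sketch below is illustrative only; `fit_line` and `fit_semilog` are hypothetical names, and the data are the x and y columns of Table 8. Note that, as the text cautions, the squared deviations being minimized are those of log y, not of y itself.

```python
import math


def fit_line(us, vs):
    """Least-squares straight line v = intercept + slope*u from the two normal equations."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    intercept = (sum_v - slope * sum_u) / n
    return intercept, slope


def fit_semilog(xs, ys):
    """Fit log10(y) = a + b*x, i.e. y = 10**(a + b*x)."""
    return fit_line(xs, [math.log10(y) for y in ys])


# x and y columns of Table 8.
xs = list(range(1, 11))
ys = [30, 19, 15, 10, 9, 6, 5, 4, 4, 3]

a, b = fit_semilog(xs, ys)
print(f"log y = {a:.3f} + ({b:.4f})x")
print(f"estimate at x = 1: {10 ** (a + b * 1):.1f}")
```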


When the log transformation of y is not entirely sufficient to straighten out the data, adding or subtracting a constant from the value of y may help. The equation that results when the constant, here written c, is used is

$$y - c = 10^{a+bx}$$

or

$$y = 10^{a+bx} + c,$$

and in semi-log form:

$$\log (y - c) = a + bx.$$

The value of the constant can be found by trial and error, but is more conveniently estimated using the following procedure. The data are plotted as in Fig. 35(a) and a freehand curve is drawn. Three points are selected such that two lie at the extremities of the curve and the third lies halfway between. If the first two points have coordinates $(x_1, y_1)$ and $(x_2, y_2)$, then $x_3$ will be equal to $(x_1 + x_2)/2$. The y coordinates for each of these points are read from the curve and substituted in the equation

$$c = \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2y_3}$$

and c is estimated. See Appendix B for the derivation of this formula. To illustrate, the three points read from the curve in Fig. 35a are

$$P_1 = (x_1, y_1) = (1, 29),$$

$$P_2 = (x_2, y_2) = (10, 3),$$

$$P_3 = (x_3, y_3) = (5.5, 7.2),$$

and

$$c = \frac{(29)(3) - (7.2)^2}{29 + 3 - (2)(7.2)},$$

$$c = 2.0.$$

The value of c is subtracted from each value of y in the data and the logs of (y - c) are determined. The two steps are shown in Table 9. From that point on, the steps are the same as used in the semi-log or exponential case. When the appropriate values are calculated and the normal equations solved, the results are

$$a = 1.559,$$

$$b = -0.1522,$$

$$c = 2.00,$$

and the estimating equation is

$$\log (y - 2.00) = 1.559 - 0.1522x$$

or

$$y = 10^{1.559 - 0.1522x} + 2.00.$$

Table 9

USING THE METHOD OF LEAST SQUARES TO FIT THE EXPONENTIAL
WITH THE CONSTANT c: WORKSHEET

   y   (y-c)  log(y-c)    x    x^2   x log(y-c)   log(y-c) (calc)   (y-c) (calc)   y (calc)   Percent Deviation
  30    28     1.4472     1      1     1.447          1.407             25.5          27.5           8.3
  19    17     1.2304     2      4     2.461          1.255             18.0          20.0          -5.3
  15    13     1.1139     3      9     3.342          1.103             12.7          14.7           2.0
  10     8     0.9031     4     16     3.612          0.950              8.9          10.9          -9.0
   9     7     0.8451     5     25     4.225          0.798              6.3           8.3           7.8
   6     4     0.6021     6     36     3.612          0.646              4.4           6.4          -6.7
   5     3     0.4771     7     49     3.340          0.494              3.1           5.1          -2.0
   4     2     0.3010     8     64     2.408          0.342              2.2           4.2          -5.0
   4     2     0.3010     9     81     2.709          0.189              1.5           3.5          12.5
   3     1     0.0000    10    100     0.000          0.037              1.1           3.1          -3.3
 105           7.2209    55    385    27.156            --                --            --          6.19 av

The results are shown plotted on arithmetic grids in Fig. 35a, and (y - c) and y are plotted on semi-logarithmic grids in Fig. 35b. The extent to which the addition of the constant c improved the situation can be seen by comparing the average deviation of 6.19 calculated using the constant c with an average deviation of 10.2 calculated in the straight semi-log example. The same data were used in both cases.
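The two-stage procedure (estimate the constant from three curve readings, then fit the semi-log line to the shifted data) can be sketched as follows. The function names are hypothetical and the symbol c for the added constant is an assumption of this sketch; the curve readings (29, 3, and 7.2) and the data are those of the worked example, and the estimate is rounded to one decimal as in the text.

```python
import math


def estimate_constant(y1, y2, y3):
    """Constant estimated from the two end readings (y1, y2) and the midpoint reading y3."""
    return (y1 * y2 - y3 ** 2) / (y1 + y2 - 2.0 * y3)


def fit_line(us, vs):
    """Least-squares straight line v = intercept + slope*u."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    return (sum_v - slope * sum_u) / n, slope


# Readings from the freehand curve at x = 1, x = 10, and the midpoint x = 5.5.
c = round(estimate_constant(29, 3, 7.2), 1)

# x and y columns of Table 9; every y must exceed c for the logarithm to exist.
xs = list(range(1, 11))
ys = [30, 19, 15, 10, 9, 6, 5, 4, 4, 3]
a, b = fit_line(xs, [math.log10(y - c) for y in ys])

print(f"c = {c}")
print(f"log(y - {c}) = {a:.3f} + ({b:.4f})x")
```

The requirement that every observed y exceed c is implicit in the worksheet; a constant estimated too high would leave the logarithm undefined for the smallest observations.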

THE POWER FUNCTION

The general equation of the power function is

$$y = ax^b.$$

As was true of the exponential, there is no direct least-squares solution for fitting the power function. Iterative methods can be used to achieve quite close approximations, but require such extensive calculation that they are only practical when a computer is available.* The usual practice is to transform the power function by taking the logs of both sides as

$$\log y = \log a + b \log x.$$

The result is a linear equation in terms of the logarithms of both x and y. When this transformation is reflected in the data by substituting the log of y for y and the log of x for x, the appropriate values for the example shown in Table 10 and Fig. 36a may be calculated and used in the normal equations for a straight line as follows:

$$\sum \log y = N \log a + b\sum \log x,$$

$$\sum (\log x)(\log y) = \log a \sum \log x + b\sum (\log x)^2,$$

$$9.6245 = 10 \log a + 10.8786b,$$

$$8.5378 = 10.8786 \log a + 14.3460b,$$

$$\log a = 1.7994,$$

$$b = -0.7694.$$

*Graver and Boren, p. 75.

We must recognize that, as before, the result is a least-squares fit in terms of the logs rather than the actual values of y. The line will be placed such that the relative, not the absolute, deviations have been minimized.

Table 10

USING THE METHOD OF LEAST SQUARES TO FIT
THE POWER FUNCTION: WORKSHEET

   y    log y     x    log x   (log x)^2   log x log y   log y (calc)   y (calc)   Percent Deviation
  50   1.6990     2   0.3010    0.0906       0.5114        1.5678        36.98          26.0
  25   1.3979     3   0.4771    0.2276       0.6669        1.4324        27.07          -8.3
  20   1.3010     5   0.6990    0.4886       0.9094        1.2617        18.27           8.7
  13   1.1139     6   0.7782    0.6056       0.8668        1.2007        15.88         -22.2
  10   1.0000    10   1.0000    1.0000       1.0000        1.0300        10.72          -7.2
   6   0.7782    15   1.1761    1.3832       0.9152        0.8946         7.84         -30.7
   6   0.7782    20   1.3010    1.6926       1.0124        0.7984         6.29          -4.8
   4   0.6021    40   1.6021    2.5667       0.9646        0.5668         3.69           7.8
   3   0.4771    50   1.6990    2.8866       0.8106        0.4923         3.11          -3.7
   3   0.4771    70   1.8451    3.4044       0.8803        0.3798         2.40          20.0
 140   9.6245   221  10.8786   14.3459       8.5377          --            --          13.94 av

The estimating equation expressed in logarithmic form is

$$\log y = 1.7994 - 0.7694 \log x.$$

The same equation expressed as a power function is

$$y = 63.01x^{-0.7694}.$$
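The log-log transformation can be sketched in the same way as the semi-log case, with both variables replaced by their logarithms. The names `fit_line` and `fit_power_function` are hypothetical; the data are the x and y columns of Table 10.

```python
import math


def fit_line(us, vs):
    """Least-squares straight line v = intercept + slope*u."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    return (sum_v - slope * sum_u) / n, slope


def fit_power_function(xs, ys):
    """Fit log10(y) = log10(a) + b*log10(x); return (a, b) for y = a * x**b."""
    log_a, b = fit_line([math.log10(x) for x in xs], [math.log10(y) for y in ys])
    return 10 ** log_a, b


# x and y columns of Table 10.
xs = [2, 3, 5, 6, 10, 15, 20, 40, 50, 70]
ys = [50, 25, 20, 13, 10, 6, 6, 4, 3, 3]

a, b = fit_power_function(xs, ys)
print(f"y = {a:.2f} * x^({b:.4f})")
```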

As was the case with the exponential, the addition of a constant to the equation

$$y = ax^b,$$

as in

$$y = ax^b + c,$$

can often help in using the power function to describe a relationship that is slightly curvilinear in terms of the logs of both x and y.

[Fig. 36--Power function fitted using the method of least squares: (a) arithmetic scales, $y = 63.01x^{-0.7694}$; (b) logarithmic scales, $\log y = 1.7994 - 0.7694 \log x$.]

In log form the equation including the constant c is

$$\log (y - c) = \log a + b \log x.$$

Although the constant c can be determined here by trial and error, it is more conveniently estimated by much the same formula as for the exponential case:*

$$c = \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2y_3}.$$

In this case, it is easier to plot the data on logarithmic coordinate paper, and to draw the smooth curve as before. We select three points falling on the curve, two at the extremities and one in between such that its x coordinate is the geometric mean of the x coordinates of the other two points, as

$$x_3 = \sqrt{x_1 x_2}.$$

The entire procedure is illustrated in Table 11 and by Figs. 36(b), 37(a) and 37(b). The extent to which the addition of the constant c improved the result can be seen by comparing the average deviations.

*See Appendix B.

The calculation of c is as follows:

$$P_1 = (x_1, y_1) = (2, 46),$$

$$P_2 = (x_2, y_2) = (50, 3.2),$$

$$x_3 = \sqrt{x_1 x_2} = \sqrt{100} = 10,$$

$$P_3 = (x_3, y_3) = (10, 9.5),$$

$$c = \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2y_3} = \frac{(3.2)(46) - (9.5)(9.5)}{3.2 + 46 - 2(9.5)},$$

$$c = 1.89.$$

The normal equations and their solution are given below:

$$\sum \log (y - c) = N \log a + b\sum \log x,$$

$$\sum (\log x)[\log (y - c)] = \log a \sum \log x + b\sum (\log x)^2,$$

$$7.9013 = 10 \log a + 10.8785b,$$

$$5.9598 = 10.8785 \log a + 14.3457b,$$

$$\log a = 1.9318,$$

$$b = -1.0494.$$

The equation is*

$$\log (y - 1.89) = 1.9318 - 1.0494 \log x,$$

or as a power function,

$$y = 85.35x^{-1.0494} + 1.89.$$

*See example, p. 78.
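A minimal sketch of the same calculation follows, under the assumption (suggested by the matching column totals) that the data are the x and y values of Table 10. The function names are hypothetical; the curve readings 46, 3.2, and 9.5 are those given in the example, and the constant is rounded to two decimals as in the text.

```python
import math


def estimate_constant(y1, y2, y3):
    """Constant from the two end readings (y1, y2) and the reading y3 at the geometric-mean x."""
    return (y1 * y2 - y3 ** 2) / (y1 + y2 - 2.0 * y3)


def fit_line(us, vs):
    """Least-squares straight line v = intercept + slope*u."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uu = sum(u * u for u in us)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    return (sum_v - slope * sum_u) / n, slope


# The middle reading is taken at the geometric mean of the end x values.
x1, x2 = 2, 50
x3 = math.sqrt(x1 * x2)
c = round(estimate_constant(46, 3.2, 9.5), 2)

# Assumed data: the x and y columns of Table 10.
xs = [2, 3, 5, 6, 10, 15, 20, 40, 50, 70]
ys = [50, 25, 20, 13, 10, 6, 6, 4, 3, 3]
log_a, b = fit_line([math.log10(x) for x in xs],
                    [math.log10(y - c) for y in ys])

print(f"x3 = {x3:.0f}, c = {c}")
print(f"log(y - {c}) = {log_a:.4f} + ({b:.4f}) log x")
```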
[Table 11: worksheet for fitting the power function with the constant c; the table is not legible in the source.]

[Fig. 37--Power function with constant c fitted using the method of least squares: (a) logarithmic scale, $\log (y - 1.89) = 1.9318 - 1.0494 \log x$; (b) arithmetic scale.]

III. THREE-VARIABLE CURVE FITTING

THE LINEAR CASE

An empirical equation used to describe a three-variable linear relationship has the general form

$$X_1 = a + b_2 X_2 + b_3 X_3,$$

which is a simple extension of the two-variable linear (straight line) equation previously discussed. To be consistent with the three-variable equation, we will write the two-variable relationship as:*

$$X_1 = a + bX_2.$$

We have already learned that the constant term, a, was the value of $X_1$ when $X_2$ was equal to 0. We further learned that b was called the slope of the straight line, and that, depending on whether b was positive or negative, the value of b determined the extent to which $X_1$ would be increased or decreased with increases in $X_2$.

A three-variable relationship may be thought of as two two-variable relationships interacting with each other. For example,

$$X_1 = a + bX_2$$

and

$$X_1 = a + bX_3$$

are two separate two-variable linear relationships, the first describing the impact of $X_2$ on the value of $X_1$ and the second the impact of $X_3$. In the first relationship the extent to which $X_3$ influences the value of $X_1$ is not accounted for, nor, in the second, is the extent to which $X_2$ influences the value of $X_1$. What we really need is a relationship between $X_1$ and $X_2$ and between $X_1$ and $X_3$ where in each case the effect of the other independent variable on $X_1$ has been eliminated. Assuming that it is possible to obtain these, we write

$$X_1 = a_{1.23} + b_{12.3} X_2$$

and

$$X_1 = a_{1.23} + b_{13.2} X_3,$$

where subscripts indicate the variable whose effect has been eliminated. In the equation above, a is identified by the subscript 1.23, indicating that a is the value of $X_1$ once the effects of $X_2$ and $X_3$ have been eliminated. Since a is a constant, the relationship is a simple one; when $X_2$ and $X_3$ are eliminated from consideration, $X_1$ is in fact equal to $a_{1.23}$. In this equation the slope b is subscripted 12.3, indicating that $b_{12.3}$ is the net slope of the relationship between $X_1$ and $X_2$ independent of the impact of $X_3$. The numbers to the left of the decimal point in the subscript identify which two variables are being related; those to the right identify the variables whose effects have been eliminated. The subscripts used follow a logical pattern, and in fact this scheme of subscripting is often extended to four or more variable relationships.

*When writing multi-variable equations, it is conventional to use the subscripted X's as above rather than y and x as has been done previously.

Now, given that in each of the two straight line relationships shown above we have pure relationships (net relationships) between $X_1$ and each of the independent variables, and given that the two independent variables completely determine the value of $X_1$, it is proper to combine them to write

$$X_1 = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3;$$

this is the three-variable linear relationship with which we began. The coefficients of $X_2$ and $X_3$, $b_{12.3}$ and $b_{13.2}$, are frequently referred to as net regression coefficients and are in fact the slopes of the two separate straight lines described above. Each describes the impact of its accompanying variable on the dependent variable $X_1$. The constant $a_{1.23}$ is simply interpreted as the value of $X_1$ when both $X_2$ and $X_3$ are equal to 0.

To explore the idea of a net regression coefficient further and,

at the same time, to illustrate one way that this type of relationship

can be fitted to actual data, we will use the following example. In

this case, we will begin with the answer and use a curve-fitting tech-

nique to see how closely we can reproduce it.

Assume that we are going to publish a technical report and we

are concerned about the cost consequences of including various com-

binations of illustrations and plain printed pages. We contact a

number of prospective printers and find that, on the average, for each report printed, there are three charges: a fixed charge of $1.00; a charge of $0.10 per illustration; and $0.04 per printed page. The charges may be more concisely stated in the following three-variable linear relationship:

C = $1.00 + $0.10I + $0.04P,

where C = the cost per report,

I = the number of illustrations per report,

P = the number of printed pages per report.

At this point, we arbitrarily select a number of possibilities, choosing some with differing numbers of printed pages and a fixed number of illustrations and others with varying numbers of illustrations and the same number of printed pages. Further, for each combination chosen, we use the above cost equation to determine what it would cost to print the particular report. We select twelve reports as shown in Table 12, each with a different combination of illustrations and printed pages, and determine the printing cost of each.

Table 12

DATA ON SELECTED REPORTS

 Report    No. of Illustrations    No. of Printed Pages    Cost to Print per Copy ($)
  No.             (I)                      (P)                        (C)
   1               1                       18                        1.82
   2               2                        4                        1.36
   3               2                       10                        1.60
   4               2                       20                        2.00
   5               3                       15                        1.90
   6               4                       13                        1.92
   7               5                        7                        1.78
   8               5                       16                        2.14
   9               6                        6                        1.84
  10               6                        2                        1.68
  11               7                        1                        1.74
  12               7                        7                        1.98

Taking Report No. 3 as an example, we can see that the cost of $1.60 is arrived at as follows:

Fixed charge .................... $1.00

Illustrations (2) @ $0.10 .......   .20

Printed pages (10) @ $0.04 ......   .40

Total ........................... $1.60
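The cost column of Table 12 follows directly from the charge schedule. As a small illustrative sketch (the function name is hypothetical, and amounts are held in whole cents to avoid floating-point rounding), the table can be regenerated as:

```python
def report_cost_cents(illustrations, printed_pages):
    """Cost per report in cents: $1.00 fixed, $0.10 per illustration, $0.04 per page."""
    return 100 + 10 * illustrations + 4 * printed_pages


# (report number, illustrations I, printed pages P) as listed in Table 12.
reports = [(1, 1, 18), (2, 2, 4), (3, 2, 10), (4, 2, 20), (5, 3, 15), (6, 4, 13),
           (7, 5, 7), (8, 5, 16), (9, 6, 6), (10, 6, 2), (11, 7, 1), (12, 7, 7)]

for number, illustrations, pages in reports:
    cents = report_cost_cents(illustrations, pages)
    print(f"Report {number:2d}: I = {illustrations}, P = {pages:2d}, C = ${cents / 100:.2f}")
```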

Let us now assume that, instead of having the equation which

allowed us to calculate the costs above, we have only the data contained

in Table 12 and we wish to find the equation. In such an example

(which is unlike the usual case) we will assume that we know the price

to be influenced only by the two variables, number of illustrations

(I) and number of printed pages (P).

As has been our practice in the past ir ittacking such problems,

we begin by constructing scatter diagrams, but, because it is difficult

(although possible) to construct three-dimensional scatter diagrams,

we will he content with the more usual two-dimensional diagrams. In

doing this, let us think in terms of the two two-variable straight

lines discussed earlier. We u plotting the cost (C) against

the number of illustrations (I) on one graph avd the cost (C) against

the number cf printed pages (P) on the other. The first two diagrams

(a and b) in Fig. 38 show the results. As we should have expected,

in neither case do we see a clearly defined relationship. Any rela-

tionship that might exist between cost and the number of illustra-

tions is obviously distorted by the fact that reports with the same

number of illustrations have different numbers of printed pages.

W. A. Spurr and C. P. Bonini, Statisticaz Anazlysis for Business


Decisions, Ricnard D. Irwin, Inc., Homewood, Illinois, 1967, p. 592.
96-

For example, there are three reports each with two illustrations,

but one has four, one has ten, and one twenty printed pages. The

number of illustrations similarly distorts the relationship between

cost and number of printed pages shown in Fig. 38b. Even with all of

the di3tortion pr-sent, it is possible to see a general upward trend

in Fig. 38b. As the number of printed pages increases there is a com-

mensurate increase in the cost. Dur curve-fitting technique will be

to capitalize on this by fitting a straight line to the data plotted

in Fig. 38b and to use the results to improve the relationship between

cost and the number of illustrations. For simplicity we will use the

method of averages to fit the straight line and the point-slope formula

to ,,rite the required equation. The details of these and other re-

quired computations are shown in Table 13. When using the method of

averages, the data are first ordered according to the value of the in-

dependent variable (see Columns a, b, c, d) of Table 13. Because there

are two independent variables involved and because the data cannot be

ordered according to both of them at the same time, two separate set-

ups are required. Those calculations that require ordering according

to number of illustrations are shown on the upper half of Table 13,

and those that require ordering according to number of printed pages

are shown on the lower half of Table 13. Since the sequence requires

stepping back and forth between the upper and the lower half, the steps

are indicated by the numbers shown in circles at the head of each column.

The calculations of the average points for fitting the first

straight line (between cost and number of printed pages) are shown in

the lower half of the table in Column 1. The coordinates of the two
C (a) Cost vs illustrations C (b) Cost vs printed pages
$2.20 . $2.20 $2.

2.00 - * 2.00 - 2.
00

1.80 - 1.80 - 0

1.60 - 1.60 - 1
1.40 - 1.40 C 1.659 + 0.0157 P I

1.20 - 1.20 1

1.00 1.UO
" _____________ I I I 1_! I ! L II, I i p

0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20

C3 (e) C 3 costs vs illustrations C6 (C 4 costs vs printed pages C


$2.20 $2.20 $2. C

2.00 2.00 -2

1.80 1 80 - a.

1.60 F 1.60 -1
3
1.40 C = 1.212 + 0.0763 r 1.40
C4 1. 156 + 0.0342 P
1 1.20 * 0 1

.0
1.0 1.

_ _I _ i I I I I I I I I P
0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20

c7 (i) C7 costs vs illustrations (j) C8 costs vs printed pam


$2.20 $2.20 $2.',

2.00 - 2.00 - 2.
2.
1.80 1.80 1.
1.
1.60 1.60 - 1.
1.
1.40 -1.40 .1

C7 1.o64 + o.0o9261 1.r c* 1. 00 0.0379 P 1. 1


1.20 - 11.
100 1.00 1

L I I L- -_ t ... I I I i-A...L......L..'
I I P
0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20

Fig. 38--Using successive approximations to ac


-97-

C1 (c) Cost I vs illustrations C2 (d) Cost 2 vs printed pages


.20 $2.20

.00- 2.00k

.80- 1.80 -

.60 - 1.60.

.40 * C1 = 1.433 + 0.05451 1.40


. 2
1.301 + 0.0286P
0 "C 0 .
1.20
.20

.00 1.00

I I I I I I I I I P
0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20

5 (9) C5 costs vs illustrations Cr (h) C6 costs vs printud pages


.20 - $2.20

.00 2.00 -

.80 - 1.80

.60 - 1.60

.40 - 1.40
120 -1.20 Z0 - 1.0C:6 1.074 + 0.0369 P

.00 1.00 -
LI ! I I I I I r" , I I i I I P
0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20

0k) C12 costs vs printed pages C13 (M)C13 costs vs illustrations


$2.20
.20

2.00 2.00 -

1.60 1.80 -

1.60 - 1.60

1.40 1.40 ,. L¢13= 1.0114 + 0.01


2 12
= 1.0113 + 0.0397 P 1.20 +0

1.00 1.00
-A1 I A 1 I I
L P I 1.
0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20

ch ieve cont inmous kv imp oved relat i mships


Tabl

USING SUCCESSIVE APPROXIMATIONS AND THE METHOD OF AV[

Expression: C =

Data

Report
()(b) (c)
Table 13

METHOD OF AVERAGES TO FIT A THREE-VARIABLE LINEAR EQUATION: WORKSHEET

Equation: C = 1.00 + 0.10I + 0.04P

(I = number of illustrations; P = number of printed pages; C = cost per report, in dollars; C^k = cost after the kth adjustment)

Upper half: reports ordered by number of illustrations

No.    I    P     C   C^1   C^2   C^3   C^4   C^5   C^6   C^7   C^8   C^9  C^10  C^11  C^12  C^13
  1    1   18  1.82  1.54  1.77  1.31  1.74  1.20  1.73  1.16  1.73  1.14  1.72  1.12  1.72  1.11
  2    2    4  1.36  1.30  1.25  1.25  1.21  1.22  1.16  1.21  1.17  1.20  1.17  1.20  1.16  1.20
  3    2   10  1.60  1.44  1.49  1.31  1.45  1.26  1.42  1.23  1.41  1.22  1.41  1.21  1.40  1.20
  4    2   20  2.00  1.69  1.89  1.43  1.85  1.32  1.82  1.26  1.81  1.24  1.81  1.22  1.80  1.21
  5    3   15  1.90  1.66  1.74  1.47  1.67  1.39  1.63  1.35  1.62  1.33  1.61  1.32  1.61  1.30
  6    4   13  1.92  1.72  1.70  1.55  1.61  1.48  1.56  1.44  1.55  1.43  1.54  1.42  1.53  1.40
Av. 2.33             1.56        1.39        1.31        1.28        1.26        1.25        1.24

  7    5    7  1.78  1.67  1.51  1.58  1.40  1.54  1.33  1.52  1.32  1.51  1.30  1.51  1.29  1.50
  8    5   16  2.14  1.89  1.87  1.68  1.76  1.59  1.69  1.55  1.68  1.53  1.66  1.52  1.65  1.50
  9    6    2  1.68  1.65  1.35  1.62  1.22  1.61  1.14  1.61  1.12  1.60  1.11  1.60  1.09  1.60
 10    6    6  1.84  1.75  1.51  1.67  1.38  1.63  1.30  1.62  1.28  1.61  1.27  1.61  1.25  1.60
 11    7    1  1.74  1.72  1.36  1.71  1.21  1.71  1.11  1.70  1.09  1.70  1.07  1.70  1.05  1.70
 12    7    7  1.98  1.87  1.60  1.78  1.45  1.74  1.35  1.72  1.33  1.71  1.31  1.71  1.29  1.70
Av. 6.00             1.76        1.67        1.64        1.62        1.61        1.61        1.60
Δ   3.67             0.20        0.28        0.33        0.34        0.35        0.36        0.36

Fitted lines (cost against I): C^1_c = 1.433 + 0.0545I (Fig. 38c); C^3_c = 1.212 + 0.0763I (Fig. 38e); C^5_c = 1.100 + 0.0899I (Fig. 38g); C^13_c = 1.0114 + 0.0981I (Fig. 38m).
Adjustments: C^2 = C - 0.0545I; C^4 = C - 0.0763I; C^6 = C - 0.0899I; C^8 = C - 0.0926I; C^10 = C - 0.0954I; C^12 = C - 0.0981I.

Lower half: reports ordered by number of printed pages

No.    I    P     C   C^1   C^2   C^3   C^4   C^5   C^6   C^7   C^8   C^9  C^10  C^11  C^12  C^13
 11    7    1  1.74  1.72  1.36  1.71  1.21  1.71  1.11  1.70  1.09  1.70  1.07  1.70  1.05  1.70
  9    6    2  1.68  1.65  1.35  1.62  1.22  1.61  1.14  1.61  1.12  1.60  1.11  1.60  1.09  1.60
  2    2    4  1.36  1.30  1.25  1.25  1.21  1.22  1.16  1.21  1.17  1.20  1.17  1.20  1.16  1.20
 10    6    6  1.84  1.75  1.51  1.67  1.38  1.63  1.30  1.62  1.28  1.61  1.27  1.61  1.25  1.60
  7    5    7  1.78  1.67  1.51  1.58  1.40  1.54  1.33  1.52  1.32  1.51  1.30  1.51  1.29  1.50
 12    7    7  1.98  1.87  1.60  1.78  1.45  1.74  1.35  1.72  1.33  1.71  1.31  1.71  1.29  1.70
Av.      4.50  1.73        1.43        1.31        1.24        1.22        1.21        1.19

  3    2   10  1.60  1.44  1.49  1.31  1.45  1.26  1.42  1.23  1.41  1.22  1.41  1.21  1.40  1.20
  6    4   13  1.92  1.72  1.70  1.55  1.61  1.48  1.56  1.44  1.55  1.43  1.54  1.42  1.53  1.40
  5    3   15  1.90  1.66  1.74  1.47  1.67  1.39  1.63  1.35  1.62  1.33  1.61  1.32  1.61  1.30
  8    5   16  2.14  1.89  1.87  1.68  1.76  1.59  1.69  1.55  1.68  1.53  1.66  1.52  1.65  1.50
  1    1   18  1.82  1.54  1.77  1.31  1.74  1.20  1.73  1.16  1.73  1.14  1.72  1.12  1.72  1.11
  4    2   20  2.00  1.69  1.89  1.43  1.85  1.32  1.82  1.26  1.81  1.24  1.81  1.22  1.80  1.21
Av.     15.33  1.90        1.74        1.68        1.64        1.63        1.63        1.62
Δ       10.83  0.17        0.31        0.37        0.40        0.41        0.42        0.43

Fitted lines (cost against P): C_c = 1.659 + 0.0157P (Fig. 38b); C^2_c = 1.301 + 0.0286P (Fig. 38d); C^4_c = 1.156 + 0.0342P (Fig. 38f); C^12_c = 1.0113 + 0.0397P (Fig. 38k).
Adjustments: C^1 = C - 0.0157P; C^3 = C - 0.0286P; C^5 = C - 0.0342P; C^7 = C - 0.0369P; C^9 = C - 0.0379P; C^11 = C - 0.0388P; C^13 = C - 0.0397P.

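points ($P_1$, $C_1$) and ($P_2$, $C_2$) are (4.50, 1.73) and (15.33, 1.90) respectively. The modified point-slope formula is

$$C - C_1 = \frac{C_2 - C_1}{P_2 - P_1}\,(P - P_1).$$

To simplify calculation with a desk calculator, the modified point-slope formula above was recast as follows:

$$C = \frac{C_1 P_2 - C_2 P_1}{P_2 - P_1} + \frac{C_2 - C_1}{P_2 - P_1}\,P,$$

or

$$C = a + bP,$$

where

$$a = \frac{C_1 P_2 - C_2 P_1}{P_2 - P_1}, \qquad b = \frac{C_2 - C_1}{P_2 - P_1}.$$

In the first case, the values substituted are

$$a = \frac{(1.73)(15.33) - (1.90)(4.50)}{15.33 - 4.50} = 1.659, \qquad b = \frac{1.90 - 1.73}{15.33 - 4.50} = 0.0157.$$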

~tcu~eI nd irvo a,z 1v oa ickilaitcd a- th,, rum-Owr.

are entered n th' table, they ;h,'.ld lt" done at that t ijrn and in-

dicated by entered in the appropriat coun.



The equation of the straight line describing the relationship between cost and number of printed pages is thus

$$C_c = 1.659 + 0.0157P.$$

The subscript c is used to indicate that values of C calculated from this equation are estimates rather than actuals. When this equation is plotted as in Fig. 38b, it gives a rough approximation of the true relationship. However, a rough approximation is better than none, as we shall subsequently see.
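The arithmetic behind this first line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `method_of_averages` is ours, the data are the twelve reports of Table 13, and the group averages are rounded to two decimals because that appears to be how the hand worksheet was kept.

```python
# Worksheet data from Table 13: (report no., illustrations I, printed pages P, cost C).
REPORTS = [
    (1, 1, 18, 1.82), (2, 2, 4, 1.36), (3, 2, 10, 1.60), (4, 2, 20, 2.00),
    (5, 3, 15, 1.90), (6, 4, 13, 1.92), (7, 5, 7, 1.78), (8, 5, 16, 2.14),
    (9, 6, 2, 1.68), (10, 6, 6, 1.84), (11, 7, 1, 1.74), (12, 7, 7, 1.98),
]


def method_of_averages(xs, ys, places=2):
    """Fit y = a + b*x through the average points of the lower and upper halves."""
    pairs = sorted(zip(xs, ys))          # order the cases by the independent variable
    half = len(pairs) // 2
    groups = (pairs[:half], pairs[half:])
    # Average point of each half, rounded as on a hand worksheet (an assumption).
    (x1, y1), (x2, y2) = [
        (round(sum(x for x, _ in g) / len(g), places),
         round(sum(y for _, y in g) / len(g), places))
        for g in groups
    ]
    b = (y2 - y1) / (x2 - x1)            # slope between the two average points
    a = (y1 * x2 - y2 * x1) / (x2 - x1)  # intercept from the recast two-point formula
    return a, b


pages = [p for _, _, p, _ in REPORTS]
costs = [c for _, _, _, c in REPORTS]
a, b = method_of_averages(pages, costs)
print(f"C_c = {a:.3f} + {b:.4f}P")
```

Without the two-decimal rounding of the averages the slope comes out nearer 0.0154 than 0.0157, which is a reminder of how rough this first approximation is.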

At the moment, the value of the constant a is of no interest. The value of b, or 0.0157, means that for each printed page we must add 1.57 cents to the cost. We can reduce the cost of each case by this figure in proportion to the number of printed pages, and then examine these results with respect to the number of illustrations. The adjustment is made by setting

$$C^1 = C - 0.0157P.$$

For report No. 11 the result would be

$$C^1 = 1.74 - 0.0157(1) = 1.72,$$

as shown in Column 2 in the lower half of Table 13. We next make the same reduction in cost for each report in proportion to the number of printed pages. When this has been completed, we transfer the results to Column 3, in the upper portion of the table, and simultaneously reorder them according to the number of illustrations in each case. We indicate these values by the symbol $C^1$, where 1 (known as a superscript, not an exponent) signifies the first adjustment to the original costs.

When we have plotted these adjusted costs against the number of illustrations as in Fig. 38c, we have a more definite relationship than that indicated in Fig. 38a. What has happened is this: Although the equation relating cost to number of printed pages was extremely rough, it was sufficient to eliminate enough of the effect of printed pages from C to clear up the relationship between C and I.

The next step is to follow our logic and determine the relationship between $C^1$ and I, using the results to further clean up the relationship between cost and number of printed pages. Once again, we employ the method of averages, placing the results in Column 3 in the upper portion of Table 13. This fitted line can be seen plotted in Fig. 38c. The equation of the fitted line is

$$C^1_c = 1.433 + 0.0545I.$$

This equation gives us an approximation of the impact of the number of illustrations on cost--in this case 5.45 cents per illustration. The costs are again adjusted as in Column 4 in the upper portion of Table 13, this time to eliminate the effect of the number of illustrations according to the approximation given above. This adjustment is made according to the formula

$$C^2 = C - 0.0545I,$$

where $C^2$ indicates that the cost has been adjusted for the second time.

The adjusted figures are next transferred to Column 5, lower half of Table 13, and the results plotted against the number of printed pages as in Fig. 38d. A comparison of Fig. 38d with Fig. 38b shows the extent to which our first approximation of the cost of illustrations has improved the relationship between total cost and number of printed pages. This process of refining the approximations is continued first with respect to one of the independent variables and then the other. Each time an approximate relationship is obtained it is used to further adjust the cost; the adjusted cost is then related to the other independent variable and the process repeated again. The calculations in Table 13 follow the adjustment process through thirteen times. The calculations of Columns 3 through 8 and Columns 12 and 13 in Table 13 are illustrated by Fig. 38d through Fig. 38m.
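The alternating adjustment just described can be sketched as a short loop. The sketch below is an illustration, not the worksheet itself: the function names are ours, the data are those of Table 13, and no intermediate rounding is applied, so the slopes it prints differ a little from the two-decimal worksheet values.

```python
# Table 13 data: illustrations I, printed pages P, cost C for the twelve reports.
I = [1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7]
P = [18, 4, 10, 20, 15, 13, 7, 16, 2, 6, 1, 7]
C = [1.82, 1.36, 1.60, 2.00, 1.90, 1.92, 1.78, 2.14, 1.68, 1.84, 1.74, 1.98]


def averages_fit(xs, ys):
    """Method of averages: line through the mean points of the two halves."""
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[half:]
    x1 = sum(x for x, _ in low) / len(low)
    y1 = sum(y for _, y in low) / len(low)
    x2 = sum(x for x, _ in high) / len(high)
    y2 = sum(y for _, y in high) / len(high)
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope      # (intercept, slope)


def successive_approximations(adjustments=13):
    """Alternate between P and I, each time removing the other's estimated effect."""
    a_p, slope_p = averages_fit(P, C)  # first rough line: raw cost against pages
    a_i, slope_i = 0.0, 0.0
    for k in range(1, adjustments + 1):
        if k % 2 == 1:                 # odd k: remove page effect, fit against I
            adjusted = [c - slope_p * p for c, p in zip(C, P)]
            a_i, slope_i = averages_fit(I, adjusted)
        else:                          # even k: remove illustration effect, fit against P
            adjusted = [c - slope_i * i for c, i in zip(C, I)]
            a_p, slope_p = averages_fit(P, adjusted)
    return (a_p, slope_p), (a_i, slope_i)


(a_p, slope_p), (a_i, slope_i) = successive_approximations()
print(f"cost per page         ~ {slope_p:.4f}")
print(f"cost per illustration ~ {slope_i:.4f}")
```

Each pass shrinks the error in the two slopes by a roughly constant factor, which is why thirteen adjustments are enough to come close to 4 cents per page and 10 cents per illustration.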

The relationship between cost and number of printed pages shown in Fig. 38k, which was arrived at on the 12th adjustment, can be described by the linear equation

$$C^{12}_c = 1.0113 + 0.0397P.$$

This equation is quite close to that portion of the original equation dealing with printed pages,

$$C = 1.00 + 0.04P.$$

The relationship between cost and number of illustrations shown in Fig. 38m is also quite close to the relevant part of the original equation:

$$C^{13}_c = 1.0114 + 0.0981I,$$

as compared to

$$C = 1.00 + 0.10I.$$

When the two two-variable equations are combined as

$$C_c = 1.01 + 0.098I + 0.0397P,$$

we have a very close representation of the original equation

$$C = 1.00 + 0.10I + 0.04P.$$

Had we continued with our process of successive approximation and adjustment, we could conceivably have reproduced the original equation exactly. But this would have meant carrying the calculations to more significant digits, which was unnecessary for the purposes of this example. This method of curve fitting, quite appropriately called the Method of Successive Approximations, can be used quite generally--even in cases where the separate relationships can only be described by freehand curves that have no mathematical description.

Fortunately the method of least squares accomplishes similar results for the three-variable linear relationship by means of a direct and absolute rather than an approximate solution. To show that both methods result in the same solution, the method of least squares is next applied to the same problem. Data are calculated in Table 14, and the accompanying graphs plotted in Fig. 39. Normal equations for this solution are as follows:

$$\sum C = Na + b_1 \sum I + b_2 \sum P,$$

$$\sum IC = a \sum I + b_1 \sum I^2 + b_2 \sum IP,$$

$$\sum PC = a \sum P + b_1 \sum IP + b_2 \sum P^2.$$

This works out to

$$21.76 = 12a + 50b_1 + 119b_2,$$

$$91.88 = 50a + 258b_1 + 402b_2,$$

$$224.36 = 119a + 402b_1 + 1629b_2.$$

Therefore the solution is

$$C = 1.000 + 0.10I + 0.04P.$$
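For readers who would rather let a machine do the elimination, the three normal equations can be solved as below. This is a minimal sketch: the helper `solve` is our own small Gaussian-elimination routine, not part of any library, and the coefficients are simply the sums quoted above.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


# Coefficients of a, b1, b2 in the three normal equations quoted above.
normal_matrix = [
    [12.0, 50.0, 119.0],
    [50.0, 258.0, 402.0],
    [119.0, 402.0, 1629.0],
]
right_hand_side = [21.76, 91.88, 224.36]

a, b1, b2 = solve(normal_matrix, right_hand_side)
print(f"C = {a:.3f} + {b1:.2f}I + {b2:.2f}P")
```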

Fig. 39--Least-squares solution to the three-variable problem showing one way to graph a three-variable equation (cost C, from $1.00 to $2.20, plotted against I from 0 to 10, with a separate line of C = 1.00 + 0.10I + 0.04P for each value of P)

-.- 0 0 LtD 00 - N t 00 Z co z G

.. jI

-. ~~ ~0, ~
Cc- N ~ L
-o- ci' 0.0
'4

CN --T N. 'n M .~~ C

00

z~0 K~ 0 CN 'C) - . O~c


N

o C- . - N'

E-4I -

H ,.-fn

N0 " '~ 0 ' 0 ' C

CO '' C CO 'C ON0 rD 0'0


p-.- -~
CL z ~- L~N
i C i - 0
-
-106-

THE NONLINEAR CASE

It is not unusual to encounter sets of three or more variables that cannot be adequately described using linear relationships, and that require nonlinear curve fitting. In this section of the Memorandum we will use the method of least squares to fit the straight line, the exponential, the power function, and the parabola to a set of one dependent and two independent variables.

Fitting a three-variable linear equation using the method of least squares has already been described. We remember that the linear equation

$$X_1 = a + b_2X_2 + b_3X_3$$

resulted from the two two-variable equations

$$X_1 = a + b_2X_2$$

and

$$X_1 = a + b_3X_3,$$

with each describing the relationship between the dependent variable $X_1$ and either $X_2$ or $X_3$. In each case, the influence of the other was not accounted for. In the combined relationship, $b_2$ and $b_3$ were written $b_{12.3}$ and $b_{13.2}$ to show that in the first case the effect of $X_3$ was eliminated, and that in the second case the effect of $X_2$ was eliminated. The method of successive approximations was used to demonstrate how this could be done. Further, it was shown that the method of least squares produces the same answer with considerably less effort.

We will now build on these fundamentals to illustrate three-variable nonlinear curve fitting.

As there is nothing about the detailed calculations required here that is different from those previously illustrated, we will not describe them again. Instead, we will concentrate on showing how various nonlinear functional forms can be used. In particular, we will point up their peculiarities and consequently their limitations.

Twenty sets of the three variables--X1, X2, and X3--are shown in Table 15. X1 is the dependent variable; X2 and X3 are the independent variables. We will proceed to fit a linear, an exponential, a power function, and a parabolic relationship to these variables. Good practice dictates that we start by examining the data more closely. As with the two-variable case, preparing a scatter diagram is always a good beginning.

Figure 40 shows the results of plotting X1 against X2 while ignoring X3. Little more than a general scattering of points is observed. But when each point is identified with its X3 value and contour lines connecting all points with equal values of X3 are drawn, as in Fig. 41, a relationship can be seen. For each value of X3, X1 increases with increases in X2.

Figure 42 shows similar results. Here X1 is plotted against X3 and contours connecting points having equal values of X2 have been drawn. For fixed values of X2, X1 increases with increases in X3. At this point, we also note a distinct curvature in one or two of the contours which suggests a nonlinear relationship between X1 and X3.

A point from which to compare the results of fitting nonlinear relationships has been provided by fitting a linear relationship to the data. The least-squares normal equations and the resulting linear relationship follow:

$$\sum X_1 = Na + b_2 \sum X_2 + b_3 \sum X_3,$$

$$\sum X_1X_2 = a \sum X_2 + b_2 \sum X_2^2 + b_3 \sum X_2X_3,$$

$$\sum X_1X_3 = a \sum X_3 + b_2 \sum X_2X_3 + b_3 \sum X_3^2,$$

$$X_1 = -20.01 + 0.4998X_2 + 1.295X_3.$$

Table 15

RESULTS OF FITTING A THREE-VARIABLE LINEAR RELATIONSHIP

(X1 = -20.01 + 0.4998X2 + 1.295X3)

Calc. = X1 (calc.); Dev. = X1 - X1 (calc.); % Dev. = percent deviation

Obs.      X1   X2   X3    Calc.     Dev.  % Dev.
   1    7.31    5    5   -11.08    18.39   251.5
   2   37.67    5   49    45.92    -8.25   -22.0
   3   67.37    5   71    74.42    -7.05   -10.5
   4  121.31    5  100   111.99     9.32     7.7
   5   20.93   16   27    22.92    -1.99    -9.5
   6   24.77   27   27    28.42    -3.65   -14.7
   7   33.57   27   38    42.67    -9.10   -27.1
   8   22.78   38   16    19.66     3.12    13.7
   9   29.16   38   27    33.91    -4.75   -16.3
  10  118.26   38   93   119.41    -1.15    -1.0
  11   39.62   60   27    44.91    -5.29   -13.4
  12   45.68   71   27    50.41    -4.73   -10.4
  13  149.34   71  100   144.97     4.37     2.9
  14   41.97   82    5    27.40    14.57    34.7
  15   59.48   93   27    61.40    -1.92    -3.2
  16  148.58   93   93   146.90     1.68     1.1
  17  163.14   93  100   155.96     7.18     4.4
  18   73.14  100   38    79.15    -6.01    -8.2
  19  114.06  100   71   121.90    -7.84    -6.9
  20  153.44  100   93   150.40     3.04     2.0
 Av.                                        23.0
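The normal equations above can be built and solved directly from the twenty observations. The following is a sketch under stated assumptions: `solve` and `least_squares` are illustrative helpers of our own, and the coefficients printed may differ in the last digit or two from those quoted, since the published figures are rounded.

```python
# Table 15 data: (X1, X2, X3) for the twenty observations.
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(rows, targets):
    """Form the normal equations (sums of cross products) and solve them."""
    k = len(rows[0])
    matrix = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(matrix, rhs)


rows = [(1.0, x2, x3) for _, x2, x3 in DATA]   # the leading 1.0 carries the constant a
a, b2, b3 = least_squares(rows, [x1 for x1, _, _ in DATA])
print(f"X1 = {a:.2f} + {b2:.4f}X2 + {b3:.3f}X3")
```

The same `least_squares` helper would serve for the logarithmic and exponential forms discussed next, once the appropriate variables are replaced by their logarithms.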


Fig. 40--Scatter diagram: X1 vs X2

Fig. 41--Scatter diagram: X1 vs X2 with contours showing equal values of X3

Fig. 42--Scatter diagram: X1 vs X3 with contours showing equal values of X2

In Fig. 43, X1 is plotted against X2, ignoring X3. The straight lines result from solving the linear equation above, allowing X2 to vary over its relevant range while holding X3 constant at the values indicated in Fig. 43. The deviations of the points from the appropriate lines are indicated by the vertical connecting lines. As was to be expected, the linear relationship does not describe the data very well. A tabular presentation of the results was shown in Table 15.

Given the indications of nonlinearity in Figs. 41 and 42 and the poorness of fit achieved with the linear form, a nonlinear form seems in order. When confronted with a similar situation, analysts often turn immediately to the power function on the grounds that it will straighten anything out. We will try this and see what happens.

The basic power function in two variables is

$$X_1 = aX_2^{b_2}$$

or

$$X_1 = aX_3^{b_3}.$$

In logarithmic form these equations become

$$\log X_1 = \log a + b_2 \log X_2$$

and

$$\log X_1 = \log a + b_3 \log X_3.$$

The transition from the two-variable equations to the three-variable equation is analogous to the equations presented in the beginning of Section III.

5
eC pp. 'Jj4 .
Fig. 43--Results of fitting a three-variable linear relationship

Either

$$\log X_1 = \log a + b_2 \log X_2 + b_3 \log X_3$$

or

$$X_1 = aX_2^{b_2}X_3^{b_3}$$

is the required equation and, as can be seen, the equation is linear in terms of the logarithms of the variables. The least-squares normal equations used before are appropriate here, given that the logarithms of the variables are substituted for the variables. For example,

$$\sum \log X_1 = N \log a + b_2 \sum \log X_2 + b_3 \sum \log X_3,$$

$$\sum \log X_1 \log X_2 = \log a \sum \log X_2 + b_2 \sum (\log X_2)^2 + b_3 \sum \log X_2 \log X_3,$$

$$\sum \log X_1 \log X_3 = \log a \sum \log X_3 + b_2 \sum \log X_2 \log X_3 + b_3 \sum (\log X_3)^2.$$

When the required values are calculated and this set of equations solved, the following power function results:

$$\log X_1 = 0.16555 + 0.26963 \log X_2 + 0.73198 \log X_3,$$

or

$$X_1 = 1.464\,X_2^{0.26963}X_3^{0.73198}.$$

How well this equation does the job is shown in Fig. 44 and Table 16. It is obviously no better than the linear relationship and possibly even a little worse. The most striking shortcoming is that the direction of curvature is wrong. Figures 41 and 42 indicate that the required curve should be concave upwards, and these curves are concave downwards. Did we make a mistake in arithmetic? No, there was no mistake, except in the selection of the power function in the first place. Figure 18 (the general shape of the power function for values of x greater than or equal to 0) could have told us that we would get what we did. This is another illustration of the value of the scatter diagram and a knowledge of the basic properties of the functional forms with which we are dealing. Consider a situation similar to this one except that the fit is better. In such a case we might well have used this relationship for extrapolating beyond the upper range of the sample.

Table 16

RESULTS OF FITTING A THREE-VARIABLE POWER FUNCTION RELATIONSHIP

(log X1 = 0.16555 + 0.26963 log X2 + 0.73198 log X3)

Calc. = X1 (calc.); Dev. = X1 - X1 (calc.); % Dev. = percent deviation

Obs.      X1   X2   X3    Calc.     Dev.  % Dev.
   1    7.31    5    5     7.34    -0.03    -0.4
   2   37.67    5   49    39.01    -1.34    -3.6
   3   67.37    5   71    51.18    16.19    24.0
   4  121.31    5  100    65.76    55.55    45.8
   5   20.93   16   27    34.51   -13.58   -64.9
   6   24.77   27   27    39.74   -14.97   -60.4
   7   33.57   27   38    51.03   -17.46   -52.0
   8   22.78   38   16    29.71    -6.93   -30.4
   9   29.16   38   27    43.58   -14.42   -49.5
  10  118.26   38   93   107.75    10.51     8.9
  11   39.62   60   27    49.29    -9.67   -24.4
  12   45.68   71   27    51.57    -5.89   -12.9
  13  149.34   71  100   134.48    14.86    10.0
  14   41.97   82    5    15.60    26.37    62.8
  15   59.48   93   27    55.47     4.01     6.7
  16  148.58   93   93   137.15    11.43     7.7
  17  163.14   93  100   144.64    18.51    11.3
  18   73.14  100   38    72.64     0.50     0.7
  19  114.06  100   71   114.79    -0.73    -0.6
  20  153.44  100   93   139.86    13.58     8.9
 Av.                                        24.3
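The curvature mismatch described above can be checked numerically. The sketch below compares slopes between successive points along the X2 = 5 contour (observations 1 through 4) for the data and for the fitted power function; the function names are ours and the comparison is only an illustration of the argument, not a formal test.

```python
import math


def power_fit(x2, x3):
    """Fitted power function from the text, evaluated through its log form."""
    log_x1 = 0.16555 + 0.26963 * math.log10(x2) + 0.73198 * math.log10(x3)
    return 10 ** log_x1


# Observations 1-4 of Table 15 share X2 = 5: (X3, X1) pairs.
observed = [(5, 7.31), (49, 37.67), (71, 67.37), (100, 121.31)]


def chord_slopes(points):
    """Slopes between successive points; rising slopes mean concave upward."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


data_slopes = chord_slopes(observed)
fit_slopes = chord_slopes([(x3, power_fit(5, x3)) for x3, _ in observed])

print("data slopes:", [round(s, 2) for s in data_slopes])   # rising
print("fit slopes: ", [round(s, 2) for s in fit_slopes])    # falling
```

Because both fitted exponents lie between 0 and 1, the fitted surface flattens as either variable grows, whereas the data steepen.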
Fig. 44--Results of fitting a three-variable power function relationship (log X1 = 0.16555 + 0.26963 log X2 + 0.73198 log X3)

It is true, however, that the exponential has the general property we desire; refer back to Fig. 14. The two-variable exponentials would be

$$X_1 = ab_2^{X_2}$$

and

$$X_1 = ab_3^{X_3}$$

or, in more useful form,

$$X_1 = e^{a + b_2X_2}$$

and

$$X_1 = e^{a + b_3X_3}.$$

For further clarification on this point refer to the earlier section on the properties of the exponential.

When the natural logarithms of each side of each equation are taken, we have

$$\ln X_1 = a + b_2X_2$$

and

$$\ln X_1 = a + b_3X_3,$$

which combine into the following three-variable equation as before:

$$\ln X_1 = a + b_2X_2 + b_3X_3,$$

which is linear when the logarithm of $X_1$ is used in place of $X_1$.

The least-squares normal equations are as for the linear curve with $\ln X_1$ substituted for $X_1$:

$$\sum \ln X_1 = Na + b_2 \sum X_2 + b_3 \sum X_3,$$

$$\sum X_2 \ln X_1 = a \sum X_2 + b_2 \sum X_2^2 + b_3 \sum X_2X_3,$$

$$\sum X_3 \ln X_1 = a \sum X_3 + b_2 \sum X_2X_3 + b_3 \sum X_3^2.$$

The resulting equation is

$$\ln X_1 = 2.509 + 0.0092732X_2 + 0.019415X_3$$

or

$$X_1 = e^{2.509 + 0.0092732X_2 + 0.019415X_3}.$$

Just how well this equation fits the data is shown in Table 17 and Fig. 45. We note from observing the scatter diagrams and the average percent deviations that the exponential relationship comes closer to fitting the data than does either the linear or the power function. The direction of curvature is as we predicted. However, while things are progressing, the exponential leaves much variation to be explained.
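The percent deviations used to compare the fits are easy to reproduce. The sketch below evaluates the fitted exponential at each observation and averages the absolute percent deviations; the function names are ours, and small differences from the tabulated figures are to be expected because the published coefficients are rounded.

```python
import math

# Table 15 data: (X1, X2, X3) for the twenty observations.
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]


def exponential_fit(x2, x3):
    """Fitted exponential from the text: ln X1 = 2.509 + 0.0092732 X2 + 0.019415 X3."""
    return math.exp(2.509 + 0.0092732 * x2 + 0.019415 * x3)


def percent_deviation(actual, calculated):
    """Deviation of the actual value from the calculated one, as a percent of actual."""
    return 100.0 * (actual - calculated) / actual


deviations = [percent_deviation(x1, exponential_fit(x2, x3)) for x1, x2, x3 in DATA]
average_abs = sum(abs(d) for d in deviations) / len(deviations)

print(f"observation 1: calc = {exponential_fit(5, 5):.2f}, deviation = {deviations[0]:.1f}%")
print(f"average absolute percent deviation = {average_abs:.1f}")
```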

Table 17

RESULTS OF FITTING A THREE-VARIABLE EXPONENTIAL RELATIONSHIP

(ln X1 = 2.509 + 0.0092732X2 + 0.019415X3)

Calc. = X1 (calc.); Dev. = X1 - X1 (calc.); % Dev. = percent deviation

Obs.      X1   X2   X3    Calc.     Dev.  % Dev.
   1    7.31    5    5    14.19    -6.88   -94.1
   2   37.67    5   49    33.34     4.33    11.5
   3   67.37    5   71    51.10    16.27    24.2
   4  121.31    5  100    89.74    31.57    26.0
   5   20.93   16   27    24.08    -3.15   -15.1
   6   24.77   27   27    26.67    -1.90    -7.7
   7   33.57   27   38    33.02     0.55     1.6
   8   22.78   38   16    23.86    -1.08    -4.7
   9   29.16   38   27    29.54    -0.38    -1.3
  10  118.26   38   93   106.38    11.88    10.1
  11   39.62   60   27    36.22     3.40     8.6
  12   45.68   71   27    40.11     5.57    12.2
  13  149.34   71  100   165.49   -16.15   -10.8
  14   41.97   82    5    28.98    12.99    31.0
  15   59.48   93   27    49.19    10.29    17.3
  16  148.58   93   93   177.15   -28.57   -19.2
  17  163.14   93  100   202.94   -39.80   -24.4
  18   73.14  100   38    64.98     8.16    11.2
  19  114.06  100   71   123.32    -9.26    -8.1
  20  153.44  100   93   189.03   -35.59   -23.2
 Av.                                        18.1

Another curve which, in general, has the desired properties (at least in part) is the parabola of the form

$$y = a + bx + cx^2.$$

The earlier section on the parabola provided a complete discussion of this equation. This equation is in two variables, however, and for our purposes we need one in three. Fortunately, we may proceed as before. The two-variable equations are:

$$X_1 = a + b_2X_2 + c_2X_2^2$$

and

$$X_1 = a + b_3X_3 + c_3X_3^2,$$

which, combined, form

$$X_1 = a + b_2X_2 + c_2X_2^2 + b_3X_3 + c_3X_3^2.$$
Fig. 45--Results of fitting a three-variable exponential relationship (ln X1 = 2.509 + 0.0092732X2 + 0.019415X3)

Notice that instead of two independent variables, $X_2$ and $X_3$, we now have four variables: $X_2$, $X_2^2$, $X_3$, and $X_3^2$. Fortunately $X_2^2$ and $X_3^2$ may be calculated given $X_2$ and $X_3$, so that we have a special case of fitting what is essentially a five-variable linear relationship. The least-squares normal equations follow:

$$\sum X_1 = Na + b_2 \sum X_2 + c_2 \sum X_2^2 + b_3 \sum X_3 + c_3 \sum X_3^2,$$

$$\sum X_1X_2 = a \sum X_2 + b_2 \sum X_2^2 + c_2 \sum X_2^3 + b_3 \sum X_2X_3 + c_3 \sum X_2X_3^2,$$

$$\sum X_1X_2^2 = a \sum X_2^2 + b_2 \sum X_2^3 + c_2 \sum X_2^4 + b_3 \sum X_2^2X_3 + c_3 \sum X_2^2X_3^2,$$

$$\sum X_1X_3 = a \sum X_3 + b_2 \sum X_2X_3 + c_2 \sum X_2^2X_3 + b_3 \sum X_3^2 + c_3 \sum X_3^3,$$

$$\sum X_1X_3^2 = a \sum X_3^2 + b_2 \sum X_2X_3^2 + c_2 \sum X_2^2X_3^2 + b_3 \sum X_3^3 + c_3 \sum X_3^4.$$

Manual solution of this set of equations is lengthy at best. Consequently, one of the many computer programs available should probably be used. With a computer, the task becomes a simple one, and the chance of making errors in arithmetic is minimized. In the case of our example, the derived equation is

$$X_1 = 5.006 + 0.2498X_2 + 0.002301X_2^2 + 0.1499X_3 + 0.01000X_3^2.$$
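To make the "five-variable linear" idea concrete, the sketch below builds the constant term and the four derived variables for each observation and evaluates the derived equation as a simple sum of products. The names `design_row` and `parabola_fit` are illustrative assumptions of ours, and the coefficients are the rounded ones quoted above, so agreement with the data is close rather than exact.

```python
# Table 15 data: (X1, X2, X3) for the twenty observations.
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]

# a, b2, c2, b3, c3 as quoted in the derived equation.
COEFFS = (5.006, 0.2498, 0.002301, 0.1499, 0.01000)


def design_row(x2, x3):
    """Constant term plus the four variables X2, X2^2, X3, X3^2."""
    return (1.0, x2, x2 ** 2, x3, x3 ** 2)


def parabola_fit(x2, x3):
    """Evaluate the fitted equation as a linear combination of the design row."""
    return sum(c * v for c, v in zip(COEFFS, design_row(x2, x3)))


errors = [x1 - parabola_fit(x2, x3) for x1, x2, x3 in DATA]
largest = max(abs(e) for e in errors)
print(f"largest absolute deviation over 20 observations: {largest:.3f}")
```

Once the squared terms are treated as ordinary columns, any linear least-squares routine can estimate the five coefficients.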

Table 18 and Fig. 46 indicate that we have indeed found the correct empirical equation. However, even with fits as good as this one, unless there is a logical base for the particular equation, extrapolations beyond the range of the data should be made with extreme caution. Such is particularly true when the relationship is a parabola. (Review the section of this Memorandum on the general properties of parabolas.)

Table 18

RESULTS OF FITTING A THREE-VARIABLE PARABOLIC RELATIONSHIP

(X1 = 5.006 + 0.2498X2 + 0.002301X2^2 + 0.1499X3 + 0.0100X3^2)

Calc. = X1 (calc.); Dev. = X1 - X1 (calc.); % Dev. = percent deviation

Obs.      X1   X2   X3    Calc.     Dev.  % Dev.
   1    7.31    5    5     7.31       --      --
   2   37.67    5   49    37.67       --      --
   3   67.37    5   71    67.37       --      --
   4  121.31    5  100   121.31       --      --
   5   20.93   16   27    20.93       --      --
   6   24.77   27   27    24.77       --      --
   7   33.57   27   38    33.57       --      --
   8   22.78   38   16    22.78       --      --
   9   29.16   38   27    29.16       --      --
  10  118.26   38   93   118.26       --      --
  11   39.62   60   27    39.62       --      --
  12   45.68   71   27    45.68       --      --
  13  149.34   71  100   149.34       --      --
  14   41.97   82    5    41.97       --      --
  15   59.48   93   27    59.48       --      --
  16  148.58   93   93   148.58       --      --
  17  163.14   93  100   163.14       --      --
  18   73.14  100   38    73.14       --      --
  19  114.06  100   71   114.06       --      --
  20  153.44  100   93   153.44       --      --

Fig. 46--Results of fitting a three-variable parabolic relationship (X1 = 5.006 + 0.2498X2 + 0.002301X2^2 + 0.1499X3 + 0.0100X3^2)

We have shown how to combine similar two-variable relationships to form a single three-variable relationship. In fact, certain dissimilar two-variable relationships may also be combined. For example,

$$X_1 = a + b_2X_2$$

may be combined with

$$X_1 = a + b_3X_3 + c_3X_3^2$$

to form

$$X_1 = a + b_2X_2 + b_3X_3 + c_3X_3^2.$$

If it were observed that $X_1$ varied linearly with $X_2$ and nonlinearly with $X_3$, an equation like the one above might then be an appropriate choice.

Appendix A

DERIVATION OF THE NORMAL EQUATIONS FOR A LEAST-SQUARES FIT OF A STRAIGHT LINE, A PARABOLA, AND A THREE-VARIABLE LINEAR EQUATION

A Straight Line

In curve fitting the general equation of a straight line is

$$\hat{y} = \hat{\alpha} + \hat{\beta}x,$$

where $\hat{\alpha}$ and $\hat{\beta}$ are the parameters to be determined such that the sum of the squares of the deviations from the resulting line is a minimum. The carets are placed over those values that are to be estimates. If we let each value of the dependent variable be represented by $y_i$, with the subscript assigned according to the data point we are using, we can write the general formula for the deviations as

$$d_i = y_i - \hat{y}_i.$$

On substituting the expression for $\hat{y}$ we have

$$d_i = y_i - (\hat{\alpha} + \hat{\beta}x_i).$$

The squared deviations are

$$d_i^2 = [y_i - (\hat{\alpha} + \hat{\beta}x_i)]^2,$$

which on expansion becomes

$$d_i^2 = y_i^2 - 2\hat{\alpha}y_i - 2\hat{\beta}x_iy_i + \hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}x_i + \hat{\beta}^2x_i^2.$$

We need such an expression because our interest is in minimizing the sum of the squared deviations. The expression that follows represents symbolically the summation of the above expression across all values of i from 1 to n, where 1 would represent the first data point, 2 the second, and n the last:

$$Q = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} y_i^2 - 2\hat{\alpha}\sum_{i=1}^{n} y_i - 2\hat{\beta}\sum_{i=1}^{n} x_iy_i + n\hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}\sum_{i=1}^{n} x_i + \hat{\beta}^2\sum_{i=1}^{n} x_i^2.$$

From calculus we know that, for this expression to be a minimum, the partial derivatives of Q taken with respect to $\hat{\alpha}$ and $\hat{\beta}$ must be equal to 0. It can also be shown that this is a sufficient condition* for the above expression to be a minimum. Thus the procedure is to obtain these two partial derivatives and to equate them to 0. The partial derivatives are

$$\frac{\partial Q}{\partial \hat{\alpha}} = -2\sum_{i=1}^{n} y_i + 2n\hat{\alpha} + 2\hat{\beta}\sum_{i=1}^{n} x_i,$$

$$\frac{\partial Q}{\partial \hat{\beta}} = -2\sum_{i=1}^{n} x_iy_i + 2\hat{\alpha}\sum_{i=1}^{n} x_i + 2\hat{\beta}\sum_{i=1}^{n} x_i^2.$$

After equating each of these to 0 and simplifying, we have

$$\sum_{i=1}^{n} y_i = n\hat{\alpha} + \hat{\beta}\sum_{i=1}^{n} x_i,$$

$$\sum_{i=1}^{n} x_iy_i = \hat{\alpha}\sum_{i=1}^{n} x_i + \hat{\beta}\sum_{i=1}^{n} x_i^2,$$

which are the required normal equations. All of the information indicated both by the summation signs and by n can be determined directly from the data; this will result in two equations in two unknowns ($\hat{\alpha}$ and $\hat{\beta}$) which can be solved for simultaneously.

*The condition of sufficiency applies to any function that is linear with respect to all of its parameters, such as the parabola.
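Solving the two normal equations simultaneously gives closed-form expressions for the two parameters. The sketch below applies them to a small set of made-up points (the data and the function name `fit_line` are ours, chosen only for illustration) and confirms that the fitted values satisfy both normal equations.

```python
def fit_line(xs, ys):
    """Solve the two normal equations of the straight line for alpha and beta."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Eliminate alpha between
    #   sy  = n*alpha  + beta*sx
    #   sxy = alpha*sx + beta*sxx
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    alpha = (sy - beta * sx) / n
    return alpha, beta


# Hypothetical data, not from the Memorandum.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
alpha, beta = fit_line(xs, ys)
print(f"y = {alpha:.2f} + {beta:.2f}x")

# Both normal equations should balance at the solution.
residual_1 = sum(ys) - (len(xs) * alpha + beta * sum(xs))
residual_2 = sum(x * y for x, y in zip(xs, ys)) - (
    alpha * sum(xs) + beta * sum(x * x for x in xs))
```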

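A Parabola

The general equation of a parabola is

$$\hat{y} = \hat{\alpha} + \hat{\beta}x + \hat{\gamma}x^2,$$

where $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ are the parameters to be determined.

An individual squared deviation may be represented as follows:

$$d_i^2 = [y_i - (\hat{\alpha} + \hat{\beta}x_i + \hat{\gamma}x_i^2)]^2,$$

which when expanded is

$$d_i^2 = y_i^2 - 2\hat{\alpha}y_i - 2\hat{\beta}x_iy_i - 2\hat{\gamma}x_i^2y_i + \hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}x_i + 2\hat{\alpha}\hat{\gamma}x_i^2 + \hat{\beta}^2x_i^2 + 2\hat{\beta}\hat{\gamma}x_i^3 + \hat{\gamma}^2x_i^4.$$

The sum of the squared deviations taken from 1 to n is

$$Q = \sum y_i^2 - 2\hat{\alpha}\sum y_i - 2\hat{\beta}\sum x_iy_i - 2\hat{\gamma}\sum x_i^2y_i + n\hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}\sum x_i + 2\hat{\alpha}\hat{\gamma}\sum x_i^2 + \hat{\beta}^2\sum x_i^2 + 2\hat{\beta}\hat{\gamma}\sum x_i^3 + \hat{\gamma}^2\sum x_i^4.$$

To minimize, we take the partial derivatives with respect to $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ and equate them to 0 as follows:

$$\frac{\partial Q}{\partial \hat{\alpha}} = -2\sum y_i + 2n\hat{\alpha} + 2\hat{\beta}\sum x_i + 2\hat{\gamma}\sum x_i^2 = 0,$$

$$\frac{\partial Q}{\partial \hat{\beta}} = -2\sum x_iy_i + 2\hat{\alpha}\sum x_i + 2\hat{\beta}\sum x_i^2 + 2\hat{\gamma}\sum x_i^3 = 0,$$

$$\frac{\partial Q}{\partial \hat{\gamma}} = -2\sum x_i^2y_i + 2\hat{\alpha}\sum x_i^2 + 2\hat{\beta}\sum x_i^3 + 2\hat{\gamma}\sum x_i^4 = 0.$$

These are the normal equations for fitting a parabola using the least-squares criterion. The sums and sums of products are calculated directly from the data and substituted into the normal equations, leaving three equations and three unknowns. These three equations are solved simultaneously for $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$. As for the straight line, the solutions are unique and exact for all x and y.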
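The three normal equations can be solved with determinants when only a desk-sized problem is involved. The following sketch uses Cramer's rule on hypothetical points that lie exactly on a known parabola, so the recovered parameters can be checked by eye; the helper names and the data are ours, not the Memorandum's.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Solve the parabola's normal equations for alpha, beta, gamma (Cramer's rule)."""
    s = [sum(x ** k for x in xs) for k in range(5)]        # s[k] = sum of x^k; s[0] = n
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    matrix = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = det3(matrix)
    params = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = t[r]      # swap in the right-hand side
        params.append(det3(replaced) / d)
    return tuple(params)


# Hypothetical points lying exactly on y = 2 - 3x + 0.5x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, -0.5, -2.0, -2.5, -2.0]
alpha, beta, gamma = fit_parabola(xs, ys)
print(f"y = {alpha:.2f} + ({beta:.2f})x + {gamma:.2f}x^2")
```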

A Three-variable Linear Equation

The general form of the linear three-variable equation is

$$\hat{y} = \hat{\alpha} + \hat{\beta}_1x_1 + \hat{\beta}_2x_2.$$

The derivation procedure is identical to that used in deriving the normal equations for the straight line and for the parabola. The squared deviations are

$$d^2 = [y - (\hat{\alpha} + \hat{\beta}_1x_1 + \hat{\beta}_2x_2)]^2.$$

The above expression is expanded and summed across all the data points, and the partial derivatives of the summation equation with respect to $\hat{\alpha}$, $\hat{\beta}_1$, and $\hat{\beta}_2$ are taken and equated to 0. The resulting normal equations are

$$\sum y = n\hat{\alpha} + \hat{\beta}_1\sum x_1 + \hat{\beta}_2\sum x_2,$$

$$\sum x_1y = \hat{\alpha}\sum x_1 + \hat{\beta}_1\sum x_1^2 + \hat{\beta}_2\sum x_1x_2,$$

$$\sum x_2y = \hat{\alpha}\sum x_2 + \hat{\beta}_1\sum x_1x_2 + \hat{\beta}_2\sum x_2^2.$$

In the above equations the subscripts are used to distinguish between the two independent variables and their coefficients rather than to indicate the range of summation as before. Although it is not specifically indicated here, it should be understood that the sums are to be taken across all data points.

Appendix B

DERIVATION OF THE FORMULA FOR CALCULATING THE CONSTANT a

The value of a, as is shown in Fig. B-1, must be such that when the values of the y coordinates of points on $L_1$ are reduced by that amount, the resulting points fall on the line $L_2$. Further, $L_2$ must be linear in terms of logarithms.

For $L_2$ to be linear in terms of logarithms, the triangles ABC and BDE must be similar. In other words, $L_2$ must have a constant slope. It is this fact that provides the basis for calculating a. The slope of the triangle ABC is equal to

$$\frac{CA}{BC},$$

and the slope of the triangle BDE is equal to

$$\frac{EB}{DE}.$$

Also the two slopes must be equal to each other, i.e.,

$$\frac{CA}{BC} = \frac{EB}{DE}. \qquad (1)$$

When the coordinates of the appropriate points are used to calculate the lengths of the above line segments and the results are substituted in Eq. 1, we have

$$\frac{\log(y_1 - a) - \log(y_3 - a)}{\log x_1 - \log x_3} = \frac{\log(y_3 - a) - \log(y_2 - a)}{\log x_3 - \log x_2}. \qquad (2)$$

Since we are free to select the three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ in any way we wish, we do so in such a way that the denominators of the two fractions in Eq. 2 are equal, such as

$$\log x_1 - \log x_3 = \log x_3 - \log x_2. \qquad (3)$$

Fig. B-1--Determining the value of the constant a

General practice is to choose $x_1$ and $x_2$ at the extremities of $L_1$ and to let Eq. 3 determine the value of $x_3$, such as

$$2 \log x_3 = \log x_1 + \log x_2,$$

or

$$\log x_3 = \frac{\log x_1 + \log x_2}{2}. \qquad (4)$$

As can be seen, $\log x_3$ is the average of, or half way between, $\log x_1$ and $\log x_2$.

Equation 4 in arithmetic form is

$$x_3 = \sqrt{x_1x_2}.$$

$x_3$ is seen to be the geometric mean of $x_1$ and $x_2$.

If $x_3$ is chosen in this way, Eq. 2 then reduces to

$$\log(y_1 - a) - \log(y_3 - a) = \log(y_3 - a) - \log(y_2 - a).$$

In arithmetic form we have

$$\frac{y_1 - a}{y_3 - a} = \frac{y_3 - a}{y_2 - a},$$

$$(y_1 - a)(y_2 - a) = (y_3 - a)^2,$$

$$y_1y_2 - ay_1 - ay_2 + a^2 = y_3^2 - 2ay_3 + a^2,$$

$$a = \frac{y_1y_2 - y_3^2}{y_1 + y_2 - 2y_3}. \qquad (5)$$

Equation 5 is the desired result.

If we had been concerned with the semi-log case, Eq. 2 would have been

$$\frac{\log(y_1 - a) - \log(y_3 - a)}{x_1 - x_3} = \frac{\log(y_3 - a) - \log(y_2 - a)}{x_3 - x_2},$$

and Eq. 4 would be

$$x_3 = \frac{x_1 + x_2}{2}.$$

We would therefore make $x_3$ the average of $x_1$ and $x_2$ instead of the geometric mean. Equation 5 applies as before.
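Equation 5 can be tried on curves whose constant is known in advance. In the sketch below both curves are hypothetical examples of ours: a shifted power function for the log-log case, read at $x_1$, $x_2$, and their geometric mean, and a shifted exponential for the semi-log case, read at $x_1$, $x_2$, and their arithmetic mean.

```python
import math


def constant_a(y1, y2, y3):
    """Eq. 5: a = (y1*y2 - y3**2) / (y1 + y2 - 2*y3)."""
    return (y1 * y2 - y3 ** 2) / (y1 + y2 - 2 * y3)


def power_curve(x):
    """Hypothetical log-log case: y = 10 + 2 * x**1.5, so the constant is 10."""
    return 10.0 + 2.0 * x ** 1.5


def exponential_curve(x):
    """Hypothetical semi-log case: y = 5 + 3 * 2**x, so the constant is 5."""
    return 5.0 + 3.0 * 2.0 ** x


# Log-log case: x3 is the geometric mean of the two end points.
x1, x2 = 1.0, 16.0
x3 = math.sqrt(x1 * x2)
a_loglog = constant_a(power_curve(x1), power_curve(x2), power_curve(x3))

# Semi-log case: x3 is the arithmetic mean of the two end points.
u1, u2 = 0.0, 4.0
u3 = (u1 + u2) / 2
a_semilog = constant_a(exponential_curve(u1), exponential_curve(u2),
                       exponential_curve(u3))

print(f"log-log constant  ~ {a_loglog:.3f}")
print(f"semi-log constant ~ {a_semilog:.3f}")
```

With values read from a freehand curve rather than computed from a formula, the result will of course be only as good as the three readings.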


DOCUMENT CONTROL DATA

1. ORIGINATING ACTIVITY: THE RAND CORPORATION
2a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
2b. GROUP:
3. REPORT TITLE: SOME CURVE-FITTING FUNDAMENTALS
4. AUTHOR(S) (Last name, first name, initial): Petruschell, R. L.
5. REPORT DATE: December 1968
6a. TOTAL No. OF PAGES: 147
6b. No. OF REFS.:
7. CONTRACT OR GRANT No.: DAHC15 67 C 0150
8. ORIGINATOR'S REPORT No.: RM-5766-SA
9a. AVAILABILITY/LIMITATION NOTICES: DDC-1
9b. SPONSORING AGENCY: Office of the Assistant Secretary of Defense (Systems Analysis)

10. ABSTRACT

A description of the curve-fitting process for the cost analyst. The study is characterized by intuitive discussions, with illustrations of computational procedures, and treats the more complex relationships of cost analysis by an approach that integrates analytic geometry with curve-fitting methods. In order to develop an equation to describe a particular relationship, the approach combines the properties of specific functional forms--the straight line, the exponential, the power function, and the parabola--with the values of equation constants. Examples of curves fit to two-variable and multi-variable relationships are shown. Both linear and nonlinear cases are included.

11. KEY WORDS

Cost analysis
Cost estimating relationships
Curve fitting
Statistical methods and processes
