Statistical Analysis of Experimental Data - LN
Some form of analysis must be performed on all experimental data. The analysis may
be a simple verbal appraisal of the test results, or it may take the form of a complex
theoretical analysis of the errors involved in the experiment and matching of the data
with fundamental physical principles.
The experimentalist should always know the validity of data. The automobile test
engineer must know the accuracy of the speedometer and gas gage in order to express
the fuel-economy performance with confidence. A nuclear engineer must know the
accuracy and precision of many instruments just to make some simple radioactivity
measurements with confidence.
Errors will creep into all experiments regardless of the care exerted. Some of these
errors are of a random nature, and some will be due to gross blunders on the part of the
experimenter. It is not sound strategy, however, to discard a result simply because it
is unexpected.
In this section we present a discussion of some of the types of errors that may be
present in experimental data and begin to indicate the way these data may be handled.
Types of Errors : We now consider the types of errors that cause uncertainty in
experimental results. Generally, experimental errors can be classified into three groups.
1. The first group includes errors that arise from mistakes made during the design or
construction of the experimental setup. These errors can be due to carelessness or
inexperience. Errors resulting from the incorrect selection of measurement
instruments or the incorrect design of measurement systems also fall into this
group. Careful experimental design and execution can help reduce these types of
errors. For example, if a thermometer shows a decrease in temperature while the
object it measures is known to be heating up, the reading is clearly wrong and the
measurement should be discarded.
2. The second group consists of fixed errors, also called systematic or bias errors,
which cause repeated readings to be in error by roughly the same amount, often for
reasons that are not immediately apparent.
3. The third type of error is random error, which may result from changes in the
experimental team, declining attention over time, oscillations in electronic
instruments as they warm up, or fluctuations in the electrical supply. Vibrations and
other environmental disturbances can also make individual readings inaccurate in a
random way.
Several methods have been developed for estimating the errors associated with an
experiment. Two commonly used ones are the "commonsense" (worst-case) approach and
"uncertainty analysis." Of the two, uncertainty analysis is preferred.
2.3. Error Analysis on a Commonsense Basis
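Consider the calculation of electric power from
$$P = EI$$
where E and I are measured as
$$E = 100\ \mathrm{V} \pm 2\ \mathrm{V}$$
$$I = 10\ \mathrm{A} \pm 0.2\ \mathrm{A}$$
The nominal value of the power is 100 × 10 = 1000 W. By taking the worst possible
variations in voltage and current, we could calculate
$$P_{max} = (100 + 2)(10 + 0.2) = 1040.4\ \mathrm{W}, \qquad P_{min} = (100 - 2)(10 - 0.2) = 960.4\ \mathrm{W}$$
so that the worst-case deviations from the nominal value are
$$\frac{1040.4 - 1000}{1000} \times 100 = +4.04\%$$
$$\frac{960.4 - 1000}{1000} \times 100 = -3.96\%$$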
Here, we have found the error percentage by taking into account the worst-case
scenarios. However, there is no guarantee that the voltmeter will read its highest value
at the same moment the ammeter reads its highest value, and the probability that both
instruments read their extreme values (highest or lowest) simultaneously is low. A more
refined analysis is therefore needed to estimate the error realistically.
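The worst-case arithmetic above can be reproduced with a short Python sketch. It simply evaluates P = EI with both meters at the top of their ranges and then both at the bottom; the variable names are illustrative.

```python
# Worst-case (commonsense) estimate for P = E * I
E, w_E = 100.0, 2.0   # voltage and its stated uncertainty, V
I, w_I = 10.0, 0.2    # current and its stated uncertainty, A

P_nominal = E * I
# Both meters at the top of their ranges, then both at the bottom
P_high = (E + w_E) * (I + w_I)
P_low = (E - w_E) * (I - w_I)

pct_high = (P_high - P_nominal) / P_nominal * 100
pct_low = (P_low - P_nominal) / P_nominal * 100
print(f"P = {P_nominal:.0f} W, range {P_low:.1f} W to {P_high:.1f} W")
print(f"deviations: {pct_high:+.2f}% and {pct_low:+.2f}%")
```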
2.4. Uncertainty Analysis and Propagation of Uncertainty
When the plus or minus notation is used to designate the uncertainty, the person
making this designation is stating the degree of accuracy with which he or she believes
the measurement has been made.
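For example, when a pressure reading is quoted with an uncertainty of ±1 kPa at 20 to 1
odds, the experimenter is willing to bet with 20 to 1 odds that the pressure measurement
is within ±1 kPa of the true value.
Suppose the result R of an experiment is a given function of the independent variables
$x_1, x_2, \ldots, x_n$:
$$R = R(x_1, x_2, \ldots, x_n) \qquad (2.1)$$
Let $w_R$ be the uncertainty in the result and $w_1, w_2, \ldots, w_n$ be the uncertainties in the
independent variables. If the uncertainties in the independent variables are all given
with the same odds, the uncertainty in the result, having these same odds, is
$$w_R = \left[\left(\frac{\partial R}{\partial x_1} w_1\right)^2 + \left(\frac{\partial R}{\partial x_2} w_2\right)^2 + \cdots + \left(\frac{\partial R}{\partial x_n} w_n\right)^2\right]^{1/2} \qquad (2.2)$$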
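Apply Eq. (2.2) to the power calculation
$$P = EI$$
where E and I are measured as
$$E = 100\ \mathrm{V} \pm 2\ \mathrm{V}$$
$$I = 10\ \mathrm{A} \pm 0.2\ \mathrm{A}$$
Determine the uncertainty in P.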
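A minimal Python sketch of Eq. (2.2) applied to this problem, assuming the partial derivatives are supplied by hand (∂P/∂E = I and ∂P/∂I = E). The helper name `propagate` is illustrative, not a library function.

```python
import math


def propagate(partials, uncertainties):
    """Root-sum-square combination: w_R = sqrt(sum((dR/dx_i * w_i)**2))."""
    return math.sqrt(sum((d * w) ** 2 for d, w in zip(partials, uncertainties)))


E, w_E = 100.0, 2.0   # V
I, w_I = 10.0, 0.2    # A
P = E * I

# For P = E*I the partial derivatives are dP/dE = I and dP/dI = E
w_P = propagate([I, E], [w_E, w_I])
print(f"P = {P:.0f} W ± {w_P:.2f} W ({100 * w_P / P:.2f}%)")
```

Compared with the worst-case estimate of roughly ±4 percent, the root-sum-square combination gives about ±28.3 W, or ±2.83 percent.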
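Uncertainties for Product Functions
In many cases the result function takes the form of a product of the respective primary
variables raised to exponents:
$$R = x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$$
Differentiating with respect to $x_i$,
$$\frac{\partial R}{\partial x_i} = x_1^{a_1} x_2^{a_2} \cdots \left(a_i x_i^{a_i - 1}\right) \cdots x_n^{a_n}$$
Dividing by R,
$$\frac{1}{R}\frac{\partial R}{\partial x_i} = \frac{a_i}{x_i}$$
Inserting this result into Eq. (2.2) gives
$$\frac{w_R}{R} = \left[\sum_{i=1}^{n}\left(\frac{a_i\, w_i}{x_i}\right)^2\right]^{1/2} \qquad (2.3)$$
When the result function has an additive form,
$$R = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = \sum_{i=1}^{n} a_i x_i$$
$$\frac{\partial R}{\partial x_i} = a_i$$
and inserting this result into Eq. (2.2) gives
$$w_R = \left[\sum_{i=1}^{n}\left(a_i\, w_i\right)^2\right]^{1/2} \qquad (2.4)$$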
(2.5)
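The product and additive forms reduce Eq. (2.2) to sums over exponents or coefficients, which is easy to code. The sketch below (function names are hypothetical) checks the product form, Eq. (2.3), against P = E¹I¹ with the meter uncertainties used earlier.

```python
import math


def relative_uncertainty_product(exponents, values, uncertainties):
    """Product form: for R = x1**a1 * ... * xn**an,
    w_R / R = sqrt(sum((a_i * w_i / x_i)**2))."""
    return math.sqrt(sum((a * w / x) ** 2
                         for a, x, w in zip(exponents, values, uncertainties)))


def uncertainty_sum(coefficients, uncertainties):
    """Additive form: for R = sum(a_i * x_i), w_R = sqrt(sum((a_i * w_i)**2))."""
    return math.sqrt(sum((a * w) ** 2 for a, w in zip(coefficients, uncertainties)))


# P = E**1 * I**1 with E = 100 V ± 2 V and I = 10 A ± 0.2 A
rel = relative_uncertainty_product([1, 1], [100.0, 10.0], [2.0, 0.2])
print(f"w_P/P = {100 * rel:.2f}%")
```

Note that a negative exponent or coefficient contributes the same amount as a positive one, because each term is squared.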
Example 2.1 : Uncertainty Of Resistance Of A Copper Wire
The resistance of a certain size of copper wire is given as
$$R = R_0\left[1 + \alpha (T - 20)\right]$$
where $R_0 = 6\ \Omega \pm 0.3$ percent is the resistance at 20 °C, $\alpha = 0.004\ ^\circ\mathrm{C}^{-1} \pm 1$ percent
is the temperature coefficient of resistance, and the temperature of the wire is
$T = 30 \pm 1\ ^\circ\mathrm{C}$. Calculate the resistance of the wire and its uncertainty.
Solution :
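The solution can be sketched numerically with Eq. (2.2). The sketch converts the percentage uncertainties to absolute values and differentiates R = R0[1 + α(T − 20)] by hand; it is an illustration rather than the notes' own worked solution.

```python
import math

R0, w_R0 = 6.0, 6.0 * 0.003            # ohm, ±0.3 percent
alpha, w_alpha = 0.004, 0.004 * 0.01   # 1/°C, ±1 percent
T, w_T = 30.0, 1.0                     # °C

R = R0 * (1 + alpha * (T - 20))

# Partial derivatives of R = R0 * [1 + alpha * (T - 20)]
dR_dR0 = 1 + alpha * (T - 20)
dR_dalpha = R0 * (T - 20)
dR_dT = R0 * alpha

w_R = math.sqrt((dR_dR0 * w_R0) ** 2 + (dR_dalpha * w_alpha) ** 2 + (dR_dT * w_T) ** 2)
print(f"R = {R:.3f} ohm ± {w_R:.4f} ohm ({100 * w_R / R:.2f}%)")
```

This gives R = 6.24 Ω with an uncertainty of about 0.0305 Ω (0.49 percent); the temperature term and the R0 term dominate.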
Example 2.2 : Uncertainty In Power Measurement
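The two resistors R and $R_s$ are connected in series. The
voltage drops across each resistor are measured as
$$E = 10\ \mathrm{V} \pm 0.1\ \mathrm{V}\ (1\%)$$
$$E_s = 1.2\ \mathrm{V} \pm 0.005\ \mathrm{V}\ (0.417\%)$$
along with
$$R_s = 0.0066\ \Omega \pm \tfrac{1}{4}\%$$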
From these measurements determine the power dissipated in resistor R and its
uncertainty.
Solution :
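A possible numerical solution, assuming that with the resistors in series the same current I = E_s/R_s flows through R, so that P = E·E_s/R_s. Because this is a product of powers (exponents 1, 1 and −1), the product form of the uncertainty equation applies directly.

```python
import math

E, w_E = 10.0, 0.1                   # V, drop across R
Es, w_Es = 1.2, 0.005                # V, drop across Rs
Rs, w_Rs = 0.0066, 0.0066 * 0.0025   # ohm, ±1/4 percent

# Assumed series circuit: current I = Es / Rs, so P = E * Es / Rs
P = E * Es / Rs

# Product form: exponents (1, 1, -1) enter squared, so the sign does not matter
rel = math.sqrt((w_E / E) ** 2 + (w_Es / Es) ** 2 + (w_Rs / Rs) ** 2)
print(f"P = {P:.0f} W ± {rel * P:.1f} W ({100 * rel:.2f}%)")
```

This yields roughly P ≈ 1818 W with an uncertainty of about 1.11 percent.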
Example 2.3 : Selection Of Measurement Method
A resistor has a nominal stated value of 10 Ω ± 1 percent. A
voltage is impressed on the resistor, and the power
dissipation is to be calculated in two different ways:
(1) from $P = E^2/R$ and (2) from $P = EI$. In (1) only a voltage measurement will be
made, while both current and voltage will be measured in (2). Calculate the
uncertainty in the power determination in each case when the measured values of E
and I are
𝐸 = 100 𝑉 ± 1% for both cases
𝐼 = 10 𝐴 ± 1%
Solution :
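A short sketch comparing the two methods with the product form of the uncertainty equation (exponents 2 and −1 for E²/R; 1 and 1 for EI). Only relative uncertainties are needed, so the nominal values do not enter.

```python
import math

rel_E, rel_I, rel_R = 0.01, 0.01, 0.01   # each quantity known to ±1 percent

# (1) P = E**2 / R: exponents 2 and -1
rel_1 = math.sqrt((2 * rel_E) ** 2 + (-1 * rel_R) ** 2)
# (2) P = E * I: exponents 1 and 1
rel_2 = math.sqrt(rel_E ** 2 + rel_I ** 2)
print(f"(1) E^2/R: {100 * rel_1:.2f}%   (2) E*I: {100 * rel_2:.2f}%")
```

Method (2) gives the smaller uncertainty (about 1.41 percent versus 2.24 percent) because the voltage uncertainty enters method (1) with an exponent of 2.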
Example 2.4 : Instrument Selection
The power measurement in the previous example is to be
conducted by measuring the voltage and current across the
resistor with the circuit shown in the figure.
The voltmeter has an internal resistance 𝑅𝑚 , and the value of R is known only
approximately. Calculate the nominal value of the power dissipated in R and the
uncertainty for the following conditions:
𝑅 = 100Ω (not known exactly)
𝑅𝑚 = 1000Ω ± 5%
𝐼 = 5𝐴 ± 1%
𝐸 = 500𝑉 ± 1%
Solution :
In terms of known quantities the power has the functional form 𝑃 = 𝑓(𝐸, 𝐼, 𝑅𝑚 ), and
so we form the derivatives
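A sketch of the remaining steps. It rests on an assumption about the figure, which is not reproduced here: the voltmeter is taken to be connected directly across R and the ammeter to read the total current through R and the voltmeter together, so the current through R is I − E/R_m and P = EI − E²/R_m. If the actual circuit differs, P and its derivatives must change accordingly.

```python
import math

E, w_E = 500.0, 5.0      # V, ±1 percent
I, w_I = 5.0, 0.05       # A, ±1 percent
Rm, w_Rm = 1000.0, 50.0  # ohm, ±5 percent

# Assumed circuit: voltmeter across R, ammeter reads total current,
# so the current through R is I - E/Rm and P = E*I - E**2/Rm
P = E * I - E ** 2 / Rm

dP_dE = I - 2 * E / Rm
dP_dI = E
dP_dRm = E ** 2 / Rm ** 2

w_P = math.sqrt((dP_dE * w_E) ** 2 + (dP_dI * w_I) ** 2 + (dP_dRm * w_Rm) ** 2)
print(f"P = {P:.0f} W ± {w_P:.1f} W ({100 * w_P / P:.2f}%)")
```

Under this assumption the nominal power is 2250 W and the uncertainty is about 34.4 W (1.53 percent).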
Example 2.5 : Ways To Reduce Uncertainties
A certain obstruction-type flowmeter, shown in the
figure, is used to measure the flow of air at low
velocities. The relation describing the flow rate is
$$\dot{m} = CA\left[\frac{2 g_c\, p_1}{R T_1}\,(p_1 - p_2)\right]^{1/2}$$
C = empirical-discharge coefficient
A = flow area
𝑝1 and 𝑝2 = upstream and downstream pressures, respectively
𝑇1 = upstream temperature
R = gas constant for air
Calculate the percent uncertainty in the mass flow rate for the following conditions
𝐶 = 0.92 ± 0.005 (from the calibration data)
𝑝1 = 25 𝑝𝑠𝑖𝑎 ± 0.5 𝑝𝑠𝑖𝑎
𝑇1 = 70℉ ± 2℉ (𝑇1 = 530°𝑅)
∆𝑝 = 𝑝1 − 𝑝2 = 1.4 𝑝𝑠𝑖𝑎 ± 0.005 𝑝𝑠𝑖𝑎 (measured directly)
𝐴 = 1.0 𝑖𝑛2 ± 0.001 𝑖𝑛2
Solution :
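A sketch of the calculation with the product form of the uncertainty equation. It assumes g_c and R are exact constants, treats Δp as measured independently of p1 (as the problem states), and uses the ±2 °F temperature uncertainty, which is also ±2 °R. The individual contributions are printed so the dominant term can be identified.

```python
import math

C, w_C = 0.92, 0.005
A, w_A = 1.0, 0.001      # in^2
p1, w_p1 = 25.0, 0.5     # psia
dp, w_dp = 1.4, 0.005    # psia, measured directly
T1, w_T1 = 530.0, 2.0    # °R (a 2 °F interval equals 2 °R)

# m_dot is proportional to C * A * p1**0.5 * dp**0.5 * T1**-0.5,
# so each relative uncertainty is multiplied by its exponent
terms = {
    "C": 1.0 * w_C / C,
    "A": 1.0 * w_A / A,
    "p1": 0.5 * w_p1 / p1,
    "dp": 0.5 * w_dp / dp,
    "T1": -0.5 * w_T1 / T1,
}
rel = math.sqrt(sum(t ** 2 for t in terms.values()))
for name, t in terms.items():
    print(f"{name:>3}: {100 * abs(t):.3f}%")
print(f"w_m/m = {100 * rel:.2f}%")
```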
The main contribution to uncertainty is the 𝑝1 measurement with its basic uncertainty
of 2 percent. Thus, to improve the overall situation the accuracy of this measurement
should be attacked first.
2.5. Evaluation of Uncertainties for Complicated Data Reduction
We have seen in the preceding discussion and examples how uncertainty analysis can
be a useful tool to examine experimental data. In many cases data reduction is a rather
complicated affair and is often performed with a computer routine written specifically
for the task.
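When R is computed by such a routine, the partial derivatives needed in Eq. (2.2) can be
approximated numerically by perturbing one variable at a time while holding the others
at their nominal values:
$$R(x_1) = R(x_1, x_2, \ldots, x_n)$$
$$R(x_1 + \Delta x_1) = R(x_1 + \Delta x_1, x_2, \ldots, x_n)$$
$$R(x_2) = R(x_1, x_2, \ldots, x_n)$$
$$R(x_2 + \Delta x_2) = R(x_1, x_2 + \Delta x_2, \ldots, x_n)$$
$$\frac{\partial R}{\partial x_1} \simeq \frac{R(x_1 + \Delta x_1) - R(x_1)}{\Delta x_1}$$
$$\frac{\partial R}{\partial x_2} \simeq \frac{R(x_2 + \Delta x_2) - R(x_2)}{\Delta x_2}$$
and so on for the remaining variables. These values can then be inserted into the
uncertainty equation, Eq. (2.2), to calculate the uncertainty in the result.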
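A minimal sketch of the result-perturbation technique in Python. The function and parameter names are hypothetical, and the choice Δx_i = w_i (perturbing each variable by its own uncertainty) is an assumption; a smaller Δx_i may be preferable when R is strongly nonlinear.

```python
import math


def perturbation_uncertainty(R, x, w):
    """Estimate w_R by perturbing one variable at a time.

    R: function mapping a list of variable values to the result.
    x: nominal values; w: their uncertainties (also used as the perturbations).
    """
    R_nominal = R(x)
    total = 0.0
    for i, dx in enumerate(w):
        shifted = list(x)
        shifted[i] += dx
        dR_dxi = (R(shifted) - R_nominal) / dx   # forward-difference derivative
        total += (dR_dxi * w[i]) ** 2            # one term of Eq. (2.2)
    return math.sqrt(total)


# Usage: P = E * I with E = 100 V ± 2 V and I = 10 A ± 0.2 A
w_demo = perturbation_uncertainty(lambda v: v[0] * v[1], [100.0, 10.0], [2.0, 0.2])
print(f"w_P = {w_demo:.2f} W")
```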
Example 2.6 : Recall Example 2.1. The resistance of a certain size of copper
wire is given as
$$R = R_0\left[1 + \alpha (T - 20)\right]$$
where $R_0 = 6\ \Omega \pm 0.3$ percent is the resistance at 20 °C, $\alpha = 0.004\ ^\circ\mathrm{C}^{-1} \pm 1$ percent
is the temperature coefficient of resistance, and the temperature of the wire is
$T = 30 \pm 1\ ^\circ\mathrm{C}$. Calculate the uncertainty of the wire resistance in Example 2.1 using
the result-perturbation technique.
Solution :
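Applied to the copper wire, a perturbation sketch might look as follows (again perturbing each variable by its own uncertainty, which is an assumption; the function name is illustrative). Because R is linear in each variable taken separately, the result should match the analytical value of Example 2.1.

```python
import math


def wire_resistance(R0, alpha, T):
    return R0 * (1 + alpha * (T - 20))


nominal = {"R0": 6.0, "alpha": 0.004, "T": 30.0}
uncert = {"R0": 6.0 * 0.003, "alpha": 0.004 * 0.01, "T": 1.0}

R_nom = wire_resistance(**nominal)
total = 0.0
for name, w in uncert.items():
    perturbed = dict(nominal)
    perturbed[name] += w                 # perturb one variable by its uncertainty
    delta = wire_resistance(**perturbed) - R_nom
    print(f"{name:>5}: Delta R = {delta:.5f} ohm")
    total += delta ** 2                  # equals (dR/dx * w)**2 when dx = w
w_R = math.sqrt(total)
print(f"R = {R_nom:.3f} ohm ± {w_R:.4f} ohm")
```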
Chapter 2
Analysis of Experimental Data
2.6. Statistical Analysis of Experimental Data
When a set of readings of an instrument is taken, the individual readings will vary
somewhat from each other, and the experimenter may be concerned with the mean of
all the readings. If each reading is denoted by 𝑥𝑖 and there are n readings, the arithmetic
mean is given by
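$$x_m = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (2.6)$$
The deviation $d_i$ of each reading from the mean is defined by
$$d_i = x_i - x_m \qquad (2.7)$$
We may note that the average of the deviations of all the readings is zero since
$$\bar{d}_i = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_m) = x_m - \frac{1}{n}(n x_m) = 0 \qquad (2.8)$$
The average of the absolute values of the deviations is given by
$$|\bar{d}_i| = \frac{1}{n}\sum_{i=1}^{n} |d_i| = \frac{1}{n}\sum_{i=1}^{n} |x_i - x_m| \qquad (2.9)$$
The standard deviation, or root-mean-square deviation, is defined by
$$\sigma = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - x_m)^2\right]^{1/2} \qquad (2.10)$$
and the square of the standard deviation, $\sigma^2$, is called the variance. This is sometimes
called the population or biased standard deviation because it strictly applies only when
a large number of samples is taken to describe the population.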
In many circumstances the engineer will not be able to collect as many data points as
necessary to describe the underlying population. Generally, it is desired to have at least
20 measurements in order to obtain reliable estimates of standard deviation and general
validity of the data. For small sets of data an unbiased or sample standard deviation is
defined by
$$\sigma = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - x_m)^2\right]^{1/2} \qquad (2.11)$$
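The definitions above translate directly into code. The sketch below computes the mean, the deviations, the average absolute deviation, and the biased (divide by n) and unbiased (divide by n − 1) standard deviations for a hypothetical set of readings, and cross-checks the last two against Python's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics


def summarize(readings):
    """Mean, sum of deviations, mean absolute deviation, and both standard deviations."""
    n = len(readings)
    x_m = sum(readings) / n                                       # arithmetic mean
    d = [x - x_m for x in readings]                               # deviations d_i
    mean_abs_dev = sum(abs(di) for di in d) / n                   # average of |d_i|
    sigma_pop = math.sqrt(sum(di ** 2 for di in d) / n)           # population (biased)
    sigma_sample = math.sqrt(sum(di ** 2 for di in d) / (n - 1))  # sample (unbiased)
    return x_m, sum(d), mean_abs_dev, sigma_pop, sigma_sample


readings = [10.0, 12.0, 13.0, 14.0, 15.0]   # hypothetical readings
x_m, sum_d, d_bar, s_pop, s_samp = summarize(readings)

# Cross-check against the standard library's population and sample versions
assert math.isclose(s_pop, statistics.pstdev(readings))
assert math.isclose(s_samp, statistics.stdev(readings))
print(f"x_m = {x_m}, sum of d_i = {sum_d:.1e}, mean |d_i| = {d_bar:.2f}")
print(f"sigma (n) = {s_pop:.4f}, sigma (n - 1) = {s_samp:.4f}")
```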
There are other kinds of mean values of interest from time to time in statistical
analysis. The median is the value that divides the data points in half. For example, if
measurements made on five production resistors give 10, 12, 13, 14, and 15 kΩ, the
median value would be 13 kΩ. The arithmetic mean, however, would be
$$R_m = \frac{10 + 12 + 13 + 14 + 15}{5} = 12.8\ \mathrm{k\Omega}$$
Sometimes it is appropriate to use a geometric mean when studying phenomena which
grow in proportion to their size. This would apply to certain biological processes and to
growth rates in financial resources. The geometric mean is defined by
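$$x_g = \left(x_1 \cdot x_2 \cdot x_3 \cdots x_n\right)^{1/n}$$
As an example of the use of this concept, consider the 5-year record of a mutual fund
investment, for which the ratios of each year's value to the previous year's value are
0.89, 1.1124, 1.1111, and 1.1364. The average growth is the geometric mean of these
four ratios:
$$\text{Average Growth} = \left[(0.89)(1.1124)(1.1111)(1.1364)\right]^{1/4} = 1.0574$$
so that an initial value of 1000 grows to
$$1000\,(1.0574)^4 = 1250$$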
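A quick check of the geometric-mean calculation, using the four growth ratios quoted above. For comparison it also prints the arithmetic mean of the ratios, which would overstate the growth of the fund.

```python
import math
import statistics

ratios = [0.89, 1.1124, 1.1111, 1.1364]   # year-to-year value ratios from the text

g = math.prod(ratios) ** (1 / len(ratios))   # geometric mean
print(f"average growth (geometric) = {g:.4f}")
print(f"1000 * g**4 = {1000 * g ** 4:.0f}")
print(f"arithmetic mean of ratios = {statistics.mean(ratios):.4f}")
```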
Example 2.7 : Calculation Of Population Variables.
The following readings are taken of a certain physical length. Compute the mean
reading, standard deviation, variance, and average of the absolute value of the
deviation, using the “biased” basis:
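Reading   x, cm
1         5.30
2         5.73
3         6.77
4         5.26
5         4.33
6         5.45
7         6.09
8         5.64
9         5.81
10        5.75
The mean value is
$$x_m = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{56.13}{10} = 5.613\ \mathrm{cm}$$
The other quantities are computed with the aid of the following table:
Reading   d_i = x_i − x_m, cm   (x_i − x_m)² × 10², cm²
1         −0.313                  9.797
2          0.117                  1.369
3          1.157                133.865
4         −0.353                 12.461
5         −1.283                164.609
6         −0.163                  2.657
7          0.477                 22.753
8          0.027                  0.073
9          0.197                  3.881
10         0.137                  1.877
Sum                             353.34
Standard deviation:
$$\sigma = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - x_m)^2\right]^{1/2} = \left[\frac{3.5334}{10}\right]^{1/2} = 0.5944\ \mathrm{cm}$$
Variance:
$$\sigma^2 = 0.3533\ \mathrm{cm}^2$$
Average of the absolute value of the deviation:
$$|\bar{d}_i| = \frac{1}{n}\sum_{i=1}^{n} |d_i| = \frac{4.224}{10} = 0.4224\ \mathrm{cm}$$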
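The table and results above can be reproduced with the sketch below, which uses the ten length readings tabulated in Example 2.8 (the same readings tested in Example 2.12) and the biased basis (division by n).

```python
import math

# The ten length readings tabulated in Example 2.8, cm
x = [5.30, 5.73, 6.77, 5.26, 4.33, 5.45, 6.09, 5.64, 5.81, 5.75]
n = len(x)
x_m = sum(x) / n
d = [xi - x_m for xi in x]

sigma = math.sqrt(sum(di ** 2 for di in d) / n)   # biased basis: divide by n
variance = sigma ** 2
mean_abs_dev = sum(abs(di) for di in d) / n

for i, di in enumerate(d, start=1):
    print(f"{i:2d}  {di:+.3f}  {di ** 2 * 100:8.3f}")
print(f"x_m = {x_m:.3f} cm, sigma = {sigma:.4f} cm")
print(f"variance = {variance:.4f} cm^2, mean |d_i| = {mean_abs_dev:.4f} cm")
```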
Example 2.8 : Sample Standard Deviation
The following readings are taken of a certain physical length. Calculate the best
estimate of standard deviation based on the “sample” or unbiased basis.
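Reading   x, cm
1         5.30
2         5.73
3         6.77
4         5.26
5         4.33
6         5.45
7         6.09
8         5.64
9         5.81
10        5.75
The mean value is given by
$$x_m = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{56.13}{10} = 5.613\ \mathrm{cm}$$
The other quantities are computed with the aid of the following table:
Reading   d_i = x_i − x_m, cm   (x_i − x_m)² × 10², cm²
1         −0.313                  9.797
2          0.117                  1.369
3          1.157                133.865
4         −0.353                 12.461
5         −1.283                164.609
6         −0.163                  2.657
7          0.477                 22.753
8          0.027                  0.073
9          0.197                  3.881
10         0.137                  1.877
Sum                             353.34
Standard deviation (unbiased basis):
$$\sigma = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - x_m)^2\right]^{1/2} = \left[\frac{3.5334}{9}\right]^{1/2} = 0.627\ \mathrm{cm}$$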
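A minimal cross-check with Python's standard `statistics` module, where `stdev` divides by n − 1 (the unbiased basis used here) and `pstdev` divides by n (the biased basis of Example 2.7).

```python
import statistics

x = [5.30, 5.73, 6.77, 5.26, 4.33, 5.45, 6.09, 5.64, 5.81, 5.75]   # cm

sigma_unbiased = statistics.stdev(x)   # sample standard deviation, n - 1
sigma_biased = statistics.pstdev(x)    # population standard deviation, n
print(f"sample (unbiased) sigma = {sigma_unbiased:.3f} cm")
print(f"population (biased) sigma = {sigma_biased:.3f} cm")
```

With only ten readings the two bases differ by about 5 percent; the difference shrinks as n grows.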
2.7. Probability Distributions
The probability that one will get a head when flipping an unweighted coin is ½,
regardless of the number of times the coin is tossed. The probability that a tail will
occur is also ½. The probability that either a head or a tail will occur is ½+½ or unity.
Suppose we toss a horseshoe some distance x. Even though we make an effort to toss
the horseshoe the same distance each time, we would not always meet with success. On
the first toss the horseshoe might travel a distance 𝑥1 , on the second toss a distance of
𝑥2 , and so forth. If one is a good player of the game, there would be more tosses which
have an x distance equal to that of the objective. Since each x distance will vary
somewhat from other x distances, we might find it advantageous to calculate the
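probability of a toss landing in a certain increment of x between x and x + Δx.
The Poisson distribution gives the probability of obtaining exactly n events in an
interval when the average number of events in that interval is a:
$$p_a(n) = \frac{a^n e^{-a}}{n!}$$
It may be shown that the standard deviation of the Poisson distribution is
$$\sigma = \sqrt{a}$$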
Important Note : In a situation where independent events occur randomly at irregular
intervals, the probability of the occurrence of events is given by the Poisson
distribution.
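A small sketch of the Poisson distribution that checks numerically that the probabilities sum to one and that the standard deviation equals √a. The mean a = 3 is a hypothetical value chosen for illustration, and the sum is truncated at n = 59, beyond which the remaining terms are negligible.

```python
import math


def poisson_pmf(n, a):
    """Probability of exactly n events when the average number of events is a."""
    return a ** n * math.exp(-a) / math.factorial(n)


a = 3.0                                           # hypothetical average count
probs = [poisson_pmf(n, a) for n in range(60)]    # truncated at n = 59
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(f"sum p = {sum(probs):.6f}, mean = {mean:.4f}")
print(f"sigma = {math.sqrt(var):.4f}, sqrt(a) = {math.sqrt(a):.4f}")
```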
HISTOGRAMS
When a limited number of observations is made and the raw data are plotted, we call
the plot a histogram.
Distance from Target, cm   Number of Throws
0-10                       5
10-20                      15
20-30                      13
30-40                      11
40-50                      9
50-60                      8
60-70                      10
70-80                      6
80-90                      7
90-100                     5
100-110                    5
110-120                    3
120-130                    2
Total                      99
Figure 2.2. Histogram with Δx = 10 cm.
Figure 2.3. Histogram with Δx = 20 cm.
Figure 2.4. Cumulative frequency diagram.
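One way to obtain the Δx = 20 cm histogram and the cumulative frequency diagram from the 10-cm table is sketched below. It assumes Figures 2.3 and 2.4 are built from the same 99 throws; the last 20-cm class (120 to 140 cm) contains only the throws from the 120-130 cm class.

```python
from itertools import accumulate

# Number of throws in each 10-cm class: 0-10, 10-20, ..., 120-130 cm
counts_10 = [5, 15, 13, 11, 9, 8, 10, 6, 7, 5, 5, 3, 2]

# Merge adjacent pairs of classes to get a class width of 20 cm
counts_20 = [sum(counts_10[i:i + 2]) for i in range(0, len(counts_10), 2)]

# Cumulative number of throws up to the upper limit of each 10-cm class
cumulative = list(accumulate(counts_10))

print("20-cm classes:", counts_20)
print("cumulative:", cumulative)
```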
2.8. The Gaussian or Normal Error Distribution
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - x_m)^2/2\sigma^2}$$
In this expression $x_m$ is the mean reading and σ is the standard deviation. Some may
prefer to call P(x) the probability density. The maximum value of P(x) occurs at
$x = x_m$, where
$$P(x_m) = \frac{1}{\sigma\sqrt{2\pi}}$$
It is seen from this equation that smaller values of the standard deviation produce larger
values of the maximum probability, as would be expected in an intuitive sense. $P(x_m)$
is sometimes called a measure of precision of the data because it has a larger value for
smaller values of the standard deviation.
The probability that a measurement will fall within a certain range $x_1$ of the mean
reading is
$$P = \int_{x_m - x_1}^{x_m + x_1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - x_m)^2/2\sigma^2}\, dx$$
Making the variable substitution
$$\eta = \frac{x - x_m}{\sigma}$$
the equation becomes
$$P = \frac{1}{\sqrt{2\pi}}\int_{-\eta_1}^{+\eta_1} e^{-\eta^2/2}\, d\eta$$
where
$$\eta_1 = \frac{x_1}{\sigma}$$
Values of the gaussian normal error function
$$\frac{1}{\sqrt{2\pi}}\, e^{-\eta^2/2}$$
and of its integrals are given in Tables 2.1 and 2.2.
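Instead of Tables 2.1 and 2.2, the density and its integral can be evaluated with the standard-library error function, using the identity (1/√(2π)) ∫ from −η1 to +η1 of e^(−η²/2) dη = erf(η1/√2). The function names are illustrative.

```python
import math


def gaussian_density(x, x_m, sigma):
    """Normal error distribution P(x) with mean x_m and standard deviation sigma."""
    return math.exp(-((x - x_m) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def prob_within(eta1):
    """Probability that a reading lies within ±eta1 standard deviations of the mean."""
    return math.erf(eta1 / math.sqrt(2))


print(gaussian_density(0.0, 0.0, 1.0))   # peak value 1/sqrt(2*pi) for sigma = 1
print(prob_within(0.6745))               # about one half
```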
Table 2.1 Values of the gaussian normal error distribution.
Table 2.2 Integrals of the gaussian normal error function.
Table 2.3 Chances for deviations from mean value of normal distribution curve.
Deviation     Chances of Results Falling within Specified Deviation
±0.6745σ      1 to 1
±σ            2.15 to 1
±2σ           21 to 1
±3σ           369 to 1
Example 2.10 : Calculate the probabilities that a measurement will fall within one,
two, and three standard deviations of the mean value and compare them with the values
in Table 2.3.
We perform the calculation using the equation given below with $\eta_1$ = 1, 2, and 3. The
values of the integral may be obtained from Table 2.2.
$$\int_{-\eta_1}^{+\eta_1} e^{-\eta^2/2}\, d\eta = 2\int_{0}^{\eta_1} e^{-\eta^2/2}\, d\eta$$
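A sketch of the calculation that evaluates the integral with `math.erf` rather than Table 2.2 and converts each probability into odds for comparison with Table 2.3.

```python
import math

# (1/sqrt(2*pi)) * 2 * integral from 0 to eta1 of exp(-eta**2/2) equals erf(eta1/sqrt(2))
probs = {eta1: math.erf(eta1 / math.sqrt(2)) for eta1 in (1, 2, 3)}

for eta1, p in probs.items():
    odds = p / (1 - p)   # chances expressed as "odds to 1"
    print(f"within ±{eta1} sigma: P = {p:.4f}, about {odds:.2f} to 1")
```

The probabilities come out near 0.6827, 0.9545, and 0.9973, corresponding to odds of about 2.15, 21, and 369 to 1.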
Example 2.11 : A certain power supply is stated to provide a constant voltage output of
10.0 V within ±0.1 V. The output is assumed to have a normal distribution. Calculate
the probability that a single measurement of voltage will lie between 10.1 and 10.2 V.
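A possible solution sketch. It assumes, as one reasonable reading of the statement, that ±0.1 V represents one standard deviation, so the interval from 10.1 to 10.2 V spans η = 1 to η = 2. The helper name `normal_cdf` is illustrative.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))


mean, sigma = 10.0, 0.1   # assumption: the stated ±0.1 V is one standard deviation
p = normal_cdf(10.2, mean, sigma) - normal_cdf(10.1, mean, sigma)
print(f"P(10.1 V < E < 10.2 V) = {p:.4f}")
```

Under that assumption the probability is about 0.136.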
CHAUVENET’S CRITERION
It is a rare circumstance indeed when an experimenter does not find that some of the
data points look bad and out of place in comparison with the bulk of the data. The
engineer cannot just throw out those points that do not fit with expectations—there
must be some consistent basis for elimination. In this situation, Chauvenet’s criterion is
used.
Suppose n measurements of a quantity are taken and n is large enough that we may
expect the results to follow the gaussian error distribution. Chauvenet's criterion
specifies that a reading may be rejected if the probability of obtaining its particular
deviation from the mean is less than 1/(2n).
In applying Chauvenet’s criterion to eliminate dubious data points, one first calculates
the mean value and standard deviation using all data points. The deviations of the
individual points are then compared with the standard deviation in accordance with the
information in Table 2.4 (or by a direct application of the criterion), and the dubious
points are eliminated. For the final data presentation a new mean value and standard
deviation are computed with the dubious points eliminated from the calculation.
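A minimal Python sketch of this procedure, applied once to the ten readings tested in Example 2.12. It uses the sample standard deviation (as Example 2.12 does) and computes the two-sided gaussian probability of a deviation at least as large as each observed one, instead of looking up Table 2.4. The function name is hypothetical.

```python
import math
import statistics


def chauvenet(data):
    """Split data into (kept, rejected) with one pass of Chauvenet's criterion."""
    n = len(data)
    x_m = statistics.mean(data)
    sigma = statistics.stdev(data)   # sample standard deviation, n - 1 basis
    kept, rejected = [], []
    for x in data:
        ratio = abs(x - x_m) / sigma
        # Two-sided gaussian probability of a deviation at least this large
        p_exceed = math.erfc(ratio / math.sqrt(2))
        if p_exceed < 1 / (2 * n):
            rejected.append(x)
        else:
            kept.append(x)
    return kept, rejected


readings = [5.30, 5.73, 6.77, 5.26, 4.33, 5.45, 6.09, 5.64, 5.81, 5.75]   # cm
kept, rejected = chauvenet(readings)
print("rejected:", rejected)
print(f"new mean = {statistics.mean(kept):.3f} cm, "
      f"new sigma = {statistics.stdev(kept):.3f} cm")
```

For these data only the 4.33 cm reading is rejected, and the recomputed mean and standard deviation are about 5.756 cm and 0.462 cm.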
Example 2.12 : Using Chauvenet’s criterion, test the data points of Example 2.7 for
possible inconsistency. Eliminate the questionable points and calculate a new standard
deviation for the adjusted data.
The best estimate of the standard deviation is given in Example 2.8 as 0.627 cm. We
first calculate the ratio 𝑑𝑖 /𝜎 and eliminate data points in accordance with Table 2.4
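Reading   x, cm   d_i = x_i − x_m, cm   d_i/σ
1         5.30    −0.313                −0.499
2         5.73     0.117                 0.187
3         6.77     1.157                 1.845
4         5.26    −0.353                −0.563
5         4.33    −1.283                −2.046
6         5.45    −0.163                −0.260
7         6.09     0.477                 0.761
8         5.64     0.027                 0.043
9         5.81     0.197                 0.314
10        5.75     0.137                 0.219
For n = 10, Chauvenet's criterion (a probability of 1/(2n) = 0.05) allows a maximum
deviation ratio of |d_i|/σ = 1.96. Only reading 5 exceeds this value, so it is eliminated.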
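With reading 5 eliminated, the new mean value is
$$x_m = \frac{51.80}{9} = 5.756\ \mathrm{cm}$$
and the new standard deviation is calculated using the table below.
Reading   x, cm   d_i = x_i − x_m, cm
1         5.30    −0.456
2         5.73    −0.026
3         6.77     1.014
4         5.26    −0.496
6         5.45    −0.306
7         6.09     0.334
8         5.64    −0.116
9         5.81     0.054
10        5.75    −0.006
$$\sigma = \left[\frac{\sum_{i=1}^{n}(x_i - x_m)^2}{n-1}\right]^{1/2} = \left[\frac{1.7044}{8}\right]^{1/2} = 0.462\ \mathrm{cm}$$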
THE CHI-SQUARE TEST
$$\chi^2 = \sum_{i=1}^{n} \frac{\left[(\text{observed value})_i - (\text{expected value})_i\right]^2}{(\text{expected value})_i}$$
The chi-square test may be applied to check the validity of various distributions.
Calculations have been made of the probability that the actual measurements match the
expected distribution, and these probabilities are given in Table 2.5. In this table F
represents the number of degrees of freedom in the measurements and is given by
𝐹 =𝑛−𝑘
where n is the number of cells and k is the number of imposed conditions on the
expected distribution.
We might use the test to analyze random errors or to check the adherence of certain
data to an expected distribution. We interpret the test by calculating the number of
degrees of freedom and χ² from the experimental data. Then, consulting Table 2.5, we
obtain the probability P that this value of χ² or a higher value could occur by chance.
• If χ² = 0, then the assumed or expected distribution and the measured distribution
match exactly.
• The larger the value of χ², the larger is the disagreement between the assumed
distribution and the observed values, or the smaller the probability that the
observed distribution matches the expected distribution.
Table 2.5 Chi-squared. P is the probability that the value in the table will be exceeded for a given number of degrees of freedom
A good rule of thumb is that if P lies between 0.1 and 0.9, the observed distribution
may be considered to follow the assumed distribution. If P is either less than 0.02 or
greater than 0.98, the assumed distribution may be considered unlikely.
For the chi-square test the generally accepted minimum number of expected values for
each ith cell is 5. If some frequencies fall below 5, it is recommended that the cells or
groups be redefined to alleviate the problem.
Example 2.13 : A plastics company produces two types of styrofoam cups (call them A
and B) which can experience eight kinds of defects. One hundred defective samples of
each cup are collected and the number of each type of defect is determined. The
following table results:
Example 2.14: Two dice are rolled 300 times and the following results are noted, as
given in the table. Calculate the probability that the dice are unloaded.
Number   Number of Occurrences
2        6
3        9
4        27
5        36
6        39
7        57
8        45
9        39
10       24
11       12
12       6
Eleven cells have been observed with only one restriction: the number of rolls of the
dice is fixed. Thus, F = 11 − 1 = 10.
If the dice are unloaded, a short listing of the combinations of the dice will give the
probability of occurrence for each number. The expected value of each number is then
the probability multiplied by 300, the total number of throws. The values of interest are
tabulated in the second table:
Number   Observed   Probability   Expected
2        6          1/36          8.33
3        9          2/36          16.67
4        27         3/36          25.00
5        36         4/36          33.33
6        39         5/36          41.67
7        57         6/36          50.00
8        45         5/36          41.67
9        39         4/36          33.33
10       24         3/36          25.00
11       12         2/36          16.67
12       6          1/36          8.33
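The remaining arithmetic can be sketched in Python. Because F = 10 is even, the sketch uses the closed-form expression for the probability that χ² is exceeded, P = e^(−χ²/2) Σ from j = 0 to F/2 − 1 of (χ²/2)^j / j!, in place of Table 2.5; this shortcut is valid only for an even number of degrees of freedom, and the helper name is illustrative.

```python
import math
from fractions import Fraction

observed = {2: 6, 3: 9, 4: 27, 5: 36, 6: 39, 7: 57, 8: 45, 9: 39, 10: 24, 11: 12, 12: 6}
n_rolls = sum(observed.values())   # 300

# Probability of each total for two fair dice: (6 - |s - 7|) / 36
prob = {s: Fraction(6 - abs(s - 7), 36) for s in observed}
expected = {s: float(prob[s] * n_rolls) for s in observed}

chi2 = sum((observed[s] - expected[s]) ** 2 / expected[s] for s in observed)
F = len(observed) - 1   # one restriction: the total number of rolls is fixed


def chi2_exceedance(x, dof):
    """P(chi-square >= x), closed form valid only for an even number of degrees of freedom."""
    if dof % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))


P = chi2_exceedance(chi2, F)
print(f"chi^2 = {chi2:.3f}, F = {F}, P = {P:.2f}")
```

It gives χ² ≈ 8.93 and P ≈ 0.54. Since P lies between 0.1 and 0.9, the rule of thumb given earlier indicates that the results are consistent with unloaded dice.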
2.11. Method of Least Squares
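Suppose we have a set of observations $x_1, x_2, \ldots, x_n$. The sum of the squares of their
deviations from some mean value $x_m$ is
$$S = \sum_{i=1}^{n}(x_i - x_m)^2 \qquad (2.12)$$
Now, suppose we wish to minimize S with respect to the mean value $x_m$. We set
$$\frac{\partial S}{\partial x_m} = 0 = \sum_{i=1}^{n} -2(x_i - x_m) = -2\left(\sum_{i=1}^{n} x_i - n x_m\right) \qquad (2.13)$$
where n is the number of observations. We find that
$$x_m = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (2.14)$$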
or the mean value which minimizes the sum of the squares of the deviations is the
arithmetic mean. This example might be called the simplest application of the method
of least squares.
Suppose that the two variables x and y are measured over a range of values. Suppose
further that we wish to obtain a simple analytical expression for y as a function of x.
The simplest type of function is a linear one; hence, we might try to establish y as a
linear function of x. The problem is one of finding the best linear function, for the data
may scatter a considerable amount. We could solve the problem rather quickly by
plotting the data points on graph paper and drawing a straight line through them by eye.
Indeed this is common practice, but the method of least squares gives a more reliable
way to obtain a better functional relationship than the guesswork of plotting. We seek
an equation of the form
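$$y = ax + b$$
We therefore wish to minimize the quantity
$$S = \sum_{i=1}^{n}\left[y_i - (a x_i + b)\right]^2$$
This is accomplished by setting the derivatives with respect to a and b equal to zero,
which gives
$$n b + a\sum x_i = \sum y_i$$
$$b\sum x_i + a\sum x_i^2 = \sum x_i y_i$$
Solving these two equations simultaneously gives
$$a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
$$b = \frac{\sum y_i \sum x_i^2 - \sum x_i y_i \sum x_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$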
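and the standard error of estimate of y for the data is
$$\text{Standard Error} = \left[\frac{\sum (y_i - \hat{y}_i)^2}{n-2}\right]^{1/2} = \left[\frac{\sum (y_i - a x_i - b)^2}{n-2}\right]^{1/2}$$
where $\hat{y}_i = a x_i + b$ is the value of y predicted by the fitted line at $x_i$.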
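The formulas for a, b, and the standard error can be sketched as follows. As test data the sketch uses the five (x, y) pairs tabulated in Example 2.16, whose sums Σx_i = 15.2 and Σy_i = 12.6 agree with those quoted for the worked least-squares example. The function name is illustrative.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = a*x + b and its standard error of estimate."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx ** 2
    a = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sxy * sx) / denom
    resid = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    std_err = math.sqrt(sum(r * r for r in resid) / (n - 2))   # n - 2 degrees of freedom
    return a, b, std_err


x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]
a, b, se = linear_fit(x, y)
print(f"y = {a:.4f} x + {b:.4f}, standard error = {se:.4f}")
```

For these data the fit is approximately y = 0.540x + 0.878.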
The method of least squares may also be used for determining higher-order
polynomials for fitting data. One only needs to perform additional differentiations to
determine additional constants. For example, if it were desired to obtain a least-squares
fit according to the quadratic function
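$$y = ax^2 + bx + c$$
the quantity
$$S = \sum_{i=1}^{n}\left(y_i - a x_i^2 - b x_i - c\right)^2 \qquad (2.15)$$
is minimized by setting
$$\frac{\partial S}{\partial a} = 0 = \sum 2\left(y_i - a x_i^2 - b x_i - c\right)(-x_i^2) \qquad (2.16)$$
$$\frac{\partial S}{\partial b} = 0 = \sum 2\left(y_i - a x_i^2 - b x_i - c\right)(-x_i) \qquad (2.17)$$
$$\frac{\partial S}{\partial c} = 0 = \sum 2\left(y_i - a x_i^2 - b x_i - c\right)(-1) \qquad (2.18)$$
Expanding and collecting terms, we have
$$a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 = \sum x_i^2 y_i \qquad (2.19)$$
$$a\sum x_i^3 + b\sum x_i^2 + c\sum x_i = \sum x_i y_i \qquad (2.20)$$
$$a\sum x_i^2 + b\sum x_i + c\,n = \sum y_i \qquad (2.21)$$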
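A sketch that assembles and solves the three normal equations for a quadratic fit. To stay within the standard library it uses plain Gaussian elimination rather than a linear-algebra package; the data lie on a hypothetical exact parabola so the result can be checked, and the function name is illustrative.

```python
def quadratic_fit(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c by solving the three normal equations."""
    n = len(x)

    def s(k):  # sum of x_i**k
        return sum(xi ** k for xi in x)

    def t(k):  # sum of x_i**k * y_i
        return sum(xi ** k * yi for xi, yi in zip(x, y))

    # Augmented matrix [coefficients | right-hand side] for the unknowns (a, b, c)
    m = [[s(4), s(3), s(2), t(2)],
         [s(3), s(2), s(1), t(1)],
         [s(2), s(1), n, t(0)]]

    # Forward elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            m[row] = [vr - factor * vc for vr, vc in zip(m[row], m[col])]

    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        known = sum(m[row][k] * coef[k] for k in range(row + 1, 3))
        coef[row] = (m[row][3] - known) / m[row][row]
    return tuple(coef)  # (a, b, c)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * xi ** 2 - 3 * xi + 1 for xi in xs]   # hypothetical data on an exact parabola
a, b, c = quadratic_fit(xs, ys)
print(f"a = {a:.6f}, b = {b:.6f}, c = {c:.6f}")
```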
$$\sum y_i = 12.6 \qquad \sum x_i = 15.2$$
2.12. The Correlation Coefficient
Assume that a suitable correlation between y and x has been obtained, by either least-
squares analysis or graphical curve fitting. We want to know how good this fit is and
the parameter which conveys this information is the correlation coefficient r defined by
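$$r = \left[1 - \frac{\sigma_{y,x}^2}{\sigma_y^2}\right]^{1/2}$$
where
$$\sigma_y = \left[\frac{\sum_{i=1}^{n}(y_i - y_m)^2}{n-1}\right]^{1/2}$$
is the standard deviation of the y values about their mean $y_m$, and
$$\sigma_{y,x} = \left[\frac{\sum_{i=1}^{n}(y_i - y_{ic})^2}{n-2}\right]^{1/2}$$
is the standard error of estimate of the fit. The $y_i$ are the actual values of y, and the $y_{ic}$
are the values computed from the correlation equation for the same value of x.
Equivalently,
$$r^2 = \frac{\sigma_y^2 - \sigma_{y,x}^2}{\sigma_y^2}$$
where, now, $r^2$ is called the coefficient of determination. We note that for a perfect fit
$\sigma_{y,x} = 0$ because there are no deviations between the data and the correlation. In this
case r = 1.0. If $\sigma_y = \sigma_{y,x}$, we obtain r = 0, indicating a poor fit or substantial scatter
around the fitted line. The reader must be cautioned about ascribing too much virtue to
values of r close to 1.0: such values may occur even when the data do not actually
follow the line.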
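A relationship for the correlation coefficient which may be preferable to the defining
equation for r in computer calculations is
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]^{1/2}\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]^{1/2}}$$
For a second-order (quadratic) fit, the standard error of estimate is
$$\sigma_{y,x} = \left[\frac{\sum (y_i - y_{ic})^2}{n-3}\right]^{1/2}$$
and, in general, for a polynomial of order m,
$$\sigma_{y,x} = \left[\frac{\sum (y_i - y_{ic})^2}{n-(m+1)}\right]^{1/2}$$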
In some cases a higher order polynomial may actually provide a poorer correlation than
the simple quadratic. Again, it is a good idea to plot the data first to get a visual idea of
the behavior before performing analyses.
Example 2.16 : Calculate the correlation coefficient for the least-squares correlation of
Example 2.15.
y_i    x_i
1.2    1.0
2.0    1.6
2.4    3.4
3.5    4.0
3.5    5.2
i    y_i    y_ic    (y_i − y_ic)²
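A sketch that computes r for the data above in both ways. With σ_y based on n − 1 and σ_{y,x} on n − 2, the defining expression and the computational formula are not identical for small n: here they give about 0.919 and 0.940. They coincide when the same denominator is used in both standard deviations.

```python
import math

y = [1.2, 2.0, 2.4, 3.5, 3.5]
x = [1.0, 1.6, 3.4, 4.0, 5.2]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(u * v for u, v in zip(x, y))

# Least-squares line for these data
a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b = (sy - a * sx) / n
y_c = [a * xi + b for xi in x]

# Defining expression: sigma_y with n - 1, sigma_{y,x} with n - 2
sigma_y2 = sum((yi - sy / n) ** 2 for yi in y) / (n - 1)
sigma_yx2 = sum((yi - yc) ** 2 for yi, yc in zip(y, y_c)) / (n - 2)
r_def = math.sqrt(1 - sigma_yx2 / sigma_y2)

# Computational formula
r_comp = (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))
print(f"r (definition) = {r_def:.4f}, r (computational) = {r_comp:.4f}")
```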
2.13. Graphical Analysis and Curve Fitting
Engineers are well known for their ability to plot many curves of experimental data and
to extract all sorts of significant facts from these curves. The better one understands the
physical phenomena involved in a certain experiment, the better one is able to extract a
wide variety of information from graphical displays of experimental data. Because
these physical phenomena may encompass all engineering science, we cannot discuss
them here except to emphasize that the person who is usually most successful in
analyzing experimental data is the one who understands the physical processes behind
the data.
2.14. Choice of Graph Formats
Table 3.4. (continued)
The engineer has many graph formats available for presenting experimental data or
calculation results. While bar charts, column charts, pie charts, and similar types of
displays have some applications, by far the most frequently used display is the x-y
graph with choices of coordinates to match the situation. This basic graph has several
variations in format that we shall illustrate by plotting the simple table of x-y data
shown below.
𝒙 𝒚
1 2
2 3.1
3 12
4 18
5 20
6 37
7 51
8 70
9 82
10 90
a) This display presents just the raw data points with a data marker for each point. It
might be selected as an initial type of display before deciding on a more suitable
alternative. It may be employed for either raw experimental data points or for
points calculated from an analytical relationship.
b) This display presents the points with the same data markers connected by a
smooth curve drawn either by hand or by a computer graphics system; in this case,
by computer. This display should be used with caution. If employed for
presentation of experimental data, it implies that the smooth curve describes the
physical phenomena represented by the data points.
c) This display is the same as (b) but with the data markers removed. It would almost
never be employed for presentation of experimental data because the actual data
points are not displayed. It also has the same disadvantage as (b) in the implication
that the physical phenomena are represented by the smooth connecting curve. In
contrast, this type of display is obviously quite suitable for presenting the results
of calculations.
d) This display presents the data points connected with straight-line segments instead
of a smooth curve, and avoids the implication that the physical situation behaves
in a certain “smooth” fashion. The plot is typically employed for calibration curves
where linear interpolation will be used between points, or when a numerical
integration is to be performed based on the connecting straight-line segments.
e) The format in (e) is the same as (d) without the data markers. It might be used for
calculation results where the engineer wants to avoid computer smoothing
between the calculated points.
f) Finally, the format presented in (f) is one that is frequently selected to present
experimental results where uncertainties in the measurements are expected to
result in scatter of the data points. A smooth curve is drawn through the data points
as the experimentalist’s best estimate of the behavior of the phenomena under
study.
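Format (d) is often used for calibration curves read by linear interpolation, or for numerical integration along the straight-line segments. A minimal sketch of both operations on the x-y table above (the helper name `interp` is hypothetical); the integral is the trapezoidal rule applied to the connecting segments.

```python
from bisect import bisect_right

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 3.1, 12, 18, 20, 37, 51, 70, 82, 90]


def interp(xq):
    """Linear interpolation between the tabulated points."""
    if not x[0] <= xq <= x[-1]:
        raise ValueError("xq is outside the tabulated range")
    i = min(bisect_right(x, xq), len(x) - 1)   # index of the upper bracketing point
    x0, x1, y0, y1 = x[i - 1], x[i], y[i - 1], y[i]
    return y0 + (y1 - y0) * (xq - x0) / (x1 - x0)


# Trapezoidal integration along the straight-line segments
area = sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))
print(interp(2.5), area)
```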