Statistical Analysis of Experimental Data - LN

Chapter 2 discusses the analysis of experimental data, emphasizing the importance of understanding errors and uncertainties in measurements. It categorizes errors into design errors, systematic errors, and random errors, and introduces methods for error analysis including commonsense approaches and uncertainty analysis. The chapter also covers statistical analysis of data, including mean, deviations, and standard deviation, highlighting the need for careful measurement and analysis to ensure accurate experimental results.

Chapter 2

Analysis of Experimental Data


2.1. Introduction

Some form of analysis must be performed on all experimental data. The analysis may
be a simple verbal appraisal of the test results, or it may take the form of a complex
theoretical analysis of the errors involved in the experiment and matching of the data
with fundamental physical principles.

The experimentalist should always know the validity of data. The automobile test
engineer must know the accuracy of the speedometer and gas gage in order to express
the fuel-economy performance with confidence. A nuclear engineer must know the
accuracy and precision of many instruments just to make some simple radioactivity
measurements with confidence.

Errors will creep into all experiments regardless of the care exerted. Some of these errors are of a random nature, and some will be due to gross blunders on the part of the experimenter. Discarding an unexpected result out of hand is not a sound strategy: it may signal a real effect rather than a blunder.

2.2. Causes and Types of Experimental Errors

In this section we present a discussion of some of the types of errors that may be
present in experimental data and begin to indicate the way these data may be handled.

Types of Errors : At this point we consider the types of errors that cause uncertainty in experimental results. Generally, experimental errors can be classified into three groups.

1. The first group includes errors that arise from mistakes made during the design or
construction of the experimental setup. These errors can be due to carelessness or
inexperience. Errors resulting from the incorrect selection of measurement
instruments or the incorrect design of measurement systems also fall into this
group. Careful experimental design and execution can help reduce these types of
errors. For example, if a measurement device shows a decrease in temperature
when it is supposed to be measuring the temperature of an object, then it is
definitely wrong, and the measurement should be discarded.

2. The second type of error is a constant (systematic) error that is repeated in measurements, but whose cause may be unknown. This constant error is also referred to as a systematic error or bias error. Sequential measurements consistently contain the same amount or direction of error. It may be due to calibration or to the design of the experimental system.

3. The third type of error is random error that results from changes in the
experimental team, a decrease in attention over time, electronic measurement
device oscillations resulting from device warming, or fluctuations in electrical
current. Some of the readings may be inaccurate due to random errors that occur
due to vibrations or other factors in the environment.

Several methods have been developed for determining the errors associated with an experiment. One of the most commonly used is the "commonsense approach," or logical error analysis; the other is "uncertainty analysis," which is the preferred method.

2.3. Error Analysis on a Commonsense Basis

After calculating experimental results, it is necessary to determine the uncertainty in the fundamental measurements. One way to do this is through logical analysis. In this type of error analysis, it is assumed that all instruments in the measurement system make the maximum error simultaneously. Consider the calculation of an electrical power.

P = EI

where E and I are measured as

E = 100 V ± 2 V
I = 10 A ± 0.2 A

The nominal value of the power is 100 × 10 = 1000 W. By taking the worst possible variations in voltage and current, we could calculate

Pmax = (100 + 2)(10 + 0.2) = 1040.4 W

Pmin = (100 − 2)(10 − 0.2) = 960.4 W

The uncertainty in the power can then be calculated as

(1040.4 − 1000)/1000 × 100 = +4.04%

(960.4 − 1000)/1000 × 100 = −3.96%

Here we have found the error percentages by taking the worst-case scenarios into account. However, there is no guarantee that the voltmeter and the ammeter will read at their extremes simultaneously; the probability of both instruments giving their highest (or lowest) values at the same time is low. Thus, more analysis is needed to determine the error more realistically.
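This worst-case arithmetic is easy to script; the following sketch (the function name is ours, not from the notes) reproduces the numbers above:

```python
def worst_case_bounds(E, wE, I, wI):
    """Worst-case limits on P = E*I when E and I are off by ±wE and ±wI."""
    nominal = E * I
    p_max = (E + wE) * (I + wI)
    p_min = (E - wE) * (I - wI)
    pct = lambda p: (p - nominal) / nominal * 100  # percent deviation from nominal
    return nominal, pct(p_max), pct(p_min)

nominal, hi, lo = worst_case_bounds(100, 2, 10, 0.2)
print(nominal, hi, lo)  # 1000 W, +4.04 %, -3.96 %
```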
2.4. Uncertainty Analysis and Propagation of Uncertainty

The method is based on a careful specification of the uncertainties in the various primary experimental measurements. For example, a certain pressure reading might be expressed as
𝑝 = 100 𝑘𝑃𝑎 ± 1 𝑘𝑃𝑎

When the plus or minus notation is used to designate the uncertainty, the person
making this designation is stating the degree of accuracy with which he or she believes
the measurement has been made.

If a very careful calibration of an instrument has been performed recently with standards of very high precision, then the experimentalist will be justified in assigning a much lower uncertainty to the measurement, for example

𝑝 = 100 𝑘𝑃𝑎 ± 1 𝑘𝑃𝑎 (20 to 1)

The experimenter is willing to bet with 20 to 1 odds that the pressure measurement is
within ±1 kPa.

The result R is a given function of the independent variables x1, x2, x3, ..., xn. Thus,

R = R(x1, x2, x3, ..., xn)        (2.1)

Let wR be the uncertainty in the result and w1, w2, ..., wn be the uncertainties in the independent variables. The uncertainty in the result is then given by

wR = [(∂R/∂x1)² w1² + (∂R/∂x2)² w2² + ⋯ + (∂R/∂xn)² wn²]^(1/2)        (2.2)

Example 2.0 : Consider the calculation of an electrical power.

𝑃=𝐸𝐼
where E and I are measured as
𝐸 = 100 𝑉 ± 2 𝑉
𝐼 = 10 𝐴 ± 0.2 𝐴
Determine the error in P.
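The solution follows from Eq. (2.2) with ∂P/∂E = I and ∂P/∂I = E; a numerical sketch (helper name ours):

```python
import math

def power_uncertainty(E, wE, I, wI):
    """Uncertainty in P = E*I from Eq. (2.2):
    wP = sqrt((dP/dE * wE)^2 + (dP/dI * wI)^2), with dP/dE = I, dP/dI = E."""
    return math.sqrt((I * wE) ** 2 + (E * wI) ** 2)

wP = power_uncertainty(100, 2, 10, 0.2)
print(wP, wP / 1000 * 100)  # ~28.28 W, i.e. ~2.83 % of the 1000 W nominal value
```

Note that the propagated uncertainty (about 2.83 percent) is well below the worst-case 4.04 percent found in Section 2.3, which is the point of the method.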

Uncertainties for Product Functions
In many cases the result function takes the form of a product of the respective primary variables raised to exponents:

R = x1^a1 · x2^a2 · ⋯ · xn^an

Differentiating with respect to xi,

∂R/∂xi = x1^a1 x2^a2 ⋯ (ai xi^(ai−1)) ⋯ xn^an

Dividing by R,

(1/R)(∂R/∂xi) = ai/xi

Inserting this into the uncertainty equation (2.2) gives

wR/R = [Σ (ai wi / xi)²]^(1/2)        (2.3)

Uncertainties for Additive Functions

When the result function has an additive form, R will be expressed as

R = a1 x1 + a2 x2 + ⋯ + an xn = Σ ai xi

∂R/∂xi = ai

Inserting this into the uncertainty equation gives

wR = [Σ (∂R/∂xi)² wi²]^(1/2)        (2.4)

wR = [Σ (ai wi)²]^(1/2)        (2.5)
Example 2.1 : Uncertainty of Resistance of a Copper Wire
The resistance of a certain size of copper wire is given as

R = R0[1 + α(T − 20)]

where R0 = 6 Ω ± 0.3 percent is the resistance at 20 °C, α = 0.004 °C⁻¹ ± 1 percent is the temperature coefficient of resistance, and the temperature of the wire is T = 30 ± 1 °C. Calculate the resistance of the wire and its uncertainty.
Solution :
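A sketch of this solution in code, using the analytic partial derivatives of R = R0[1 + α(T − 20)]:

```python
import math

# Given data: R0 = 6 ohm ± 0.3 %, alpha = 0.004 1/degC ± 1 %, T = 30 ± 1 degC
R0, wR0 = 6.0, 6.0 * 0.003
alpha, walpha = 0.004, 0.004 * 0.01
T, wT = 30.0, 1.0

R = R0 * (1 + alpha * (T - 20))  # nominal resistance

# Partial derivatives of R with respect to each variable
dR_dR0 = 1 + alpha * (T - 20)
dR_dalpha = R0 * (T - 20)
dR_dT = R0 * alpha

wR = math.sqrt((dR_dR0 * wR0) ** 2
               + (dR_dalpha * walpha) ** 2
               + (dR_dT * wT) ** 2)
print(R, wR)  # R = 6.24 ohm, wR ~ 0.0305 ohm (about 0.49 %)
```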

Example 2.2 : Uncertainty in Power Measurement
The two resistors R and Rs are connected in series. The voltage drops across each resistor are measured as

E = 10 V ± 0.1 V (1%)
Es = 1.2 V ± 0.005 V (0.417%)
Rs = 0.0066 Ω ± 1/4%

From these measurements determine the power dissipated in resistor R and its uncertainty.
Solution :

Example 2.3 : Selection Of Measurement Method
A resistor has a nominal stated value of 10 Ω ± 1 percent. A
voltage is impressed on the resistor, and the power
dissipation is to be calculated in two different ways:
(1) from 𝑃 = 𝐸 2 /𝑅 and (2) from P = EI. In (1) only a voltage measurement will be
made, while both current and voltage will be measured in (2). Calculate the
uncertainty in the power determination in each case when the measured values of E
and I are
𝐸 = 100 𝑉 ± 1% for both cases
𝐼 = 10 𝐴 ± 1%
Solution :

Example 2.4 : Instrument Selection
The power measurement in the previous example is to be conducted by measuring the voltage and current across the resistor with the circuit shown in the figure.
The voltmeter has an internal resistance 𝑅𝑚 , and the value of R is known only
approximately. Calculate the nominal value of the power dissipated in R and the
uncertainty for the following conditions:
𝑅 = 100Ω (not known exactly)
𝑅𝑚 = 1000Ω ± 5%
𝐼 = 5𝐴 ± 1%
𝐸 = 500𝑉 ± 1%
Solution :

In terms of known quantities the power has the functional form 𝑃 = 𝑓(𝐸, 𝐼, 𝑅𝑚 ), and
so we form the derivatives

The uncertainty for the power is now written as

Example 2.5 : Ways To Reduce Uncertainties
A certain obstruction-type flowmeter, shown in the
figure, is used to measure the flow of air at low
velocities. The relation describing the flow rate is

ṁ = CA [2 gc p1 (p1 − p2) / (R T1)]^(1/2)

C = empirical-discharge coefficient
A = flow area
𝑝1 and 𝑝2 = upstream and downstream pressures, respectively
𝑇1 = upstream temperature
R = gas constant for air
Calculate the percent uncertainty in the mass flow rate for the following conditions
𝐶 = 0.92 ± 0.005 (from the calibration data)
𝑝1 = 25 𝑝𝑠𝑖𝑎 ± 0.5 𝑝𝑠𝑖𝑎
𝑇1 = 70℉ ± 2℉ 𝑇1 = 530°𝑅
∆𝑝 = 𝑝1 − 𝑝2 = 1.4 𝑝𝑠𝑖𝑎 ± 0.005 𝑝𝑠𝑖𝑎 (measured directly)
𝐴 = 1.0 𝑖𝑛2 ± 0.001 𝑖𝑛2
Solution :

The main contribution to uncertainty is the 𝑝1 measurement with its basic uncertainty
of 2 percent. Thus, to improve the overall situation the accuracy of this measurement
should be attacked first.

2.5. Evaluation of Uncertainties for Complicated Data Reduction

We have seen in the preceding discussion and examples how uncertainty analysis can
be a useful tool to examine experimental data. In many cases data reduction is a rather
complicated affair and is often performed with a computer routine written specifically
for the task.
R(x1) = R(x1, x2, ..., xn)
R(x1 + Δx1) = R(x1 + Δx1, x2, ..., xn)
R(x2) = R(x1, x2, ..., xn)
R(x2 + Δx2) = R(x1, x2 + Δx2, ..., xn)

For small enough values of Δx the partial derivatives can be approximated by

∂R/∂x1 ≈ [R(x1 + Δx1) − R(x1)] / Δx1

∂R/∂x2 ≈ [R(x2 + Δx2) − R(x2)] / Δx2

and these values can be inserted in the uncertainty equation (2.2) to calculate the overall uncertainty.
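This finite-difference scheme can be wrapped in a small generic routine; the sketch below (our naming) checks itself against the analytic result for P = EI from Example 2.0:

```python
import math

def uncertainty_by_perturbation(R, x, w, rel_step=1e-6):
    """Approximate wR = [sum((dR/dxi * wi)^2)]^(1/2) with forward differences,
    perturbing one variable at a time as in the scheme above."""
    base = R(x)
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        dx = rel_step * max(abs(x[i]), 1.0)  # small perturbation of variable i
        xp[i] += dx
        dR_dxi = (R(xp) - base) / dx
        total += (dR_dxi * w[i]) ** 2
    return math.sqrt(total)

wP = uncertainty_by_perturbation(lambda v: v[0] * v[1], [100.0, 10.0], [2.0, 0.2])
print(wP)  # ~28.28 W, matching the analytic propagation for P = E*I
```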

Example 2.6 : Recall Example 2.1. The resistance of a certain size of copper wire is given as

R = R0[1 + α(T − 20)]

where R0 = 6 Ω ± 0.3 percent is the resistance at 20 °C, α = 0.004 °C⁻¹ ± 1 percent is the temperature coefficient of resistance, and the temperature of the wire is T = 30 ± 1 °C. Calculate the uncertainty of the wire resistance in Example 2.1 using the result-perturbation technique.
Solution :

2.6. Statistical Analysis of Experimental Data

First, it is important to define some pertinent terms before starting the analysis of experimental data.

When a set of readings of an instrument is taken, the individual readings will vary
somewhat from each other, and the experimenter may be concerned with the mean of
all the readings. If each reading is denoted by 𝑥𝑖 and there are n readings, the arithmetic
mean is given by

xm = (1/n) Σ_{i=1}^{n} xi        (2.6)

The deviation 𝑑𝑖 for each reading is defined by

di = xi − xm        (2.7)

We may note that the average of the deviations of all the readings is zero since

(1/n) Σ di = (1/n) Σ (xi − xm) = xm − xm = 0        (2.8)

The average of the absolute values of the deviations is given by

d̄ = (1/n) Σ_{i=1}^{n} |di|        (2.9)

Note that this quantity is not necessarily zero.


The standard deviation or root-mean-square deviation is defined by

σ = [(1/n) Σ_{i=1}^{n} (xi − xm)²]^(1/2)        (2.10)

and the square of the standard deviation 𝜎 2 is called the variance. This is sometimes
called the population or biased standard deviation because it strictly applies only when
a large number of samples is taken to describe the population.

In many circumstances the engineer will not be able to collect as many data points as
necessary to describe the underlying population. Generally, it is desired to have at least
20 measurements in order to obtain reliable estimates of standard deviation and general
validity of the data. For small sets of data an unbiased or sample standard deviation is
defined by

σ = [Σ_{i=1}^{n} (xi − xm)² / (n − 1)]^(1/2)        (2.11)

There are other kinds of mean values of interest from time to time in statistical analysis. The median is the value that divides the data points in half. For example, if measurements made on five production resistors give 10, 12, 13, 14, and 15 kΩ, the median value is 13 kΩ. The arithmetic mean, however, would be

Rm = (10 + 12 + 13 + 14 + 15)/5 = 12.8 kΩ
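These definitions map directly onto Python's standard statistics module; a quick check with the resistor data (our sketch):

```python
import statistics

resistors = [10, 12, 13, 14, 15]  # kilo-ohms

print(statistics.mean(resistors))    # arithmetic mean: 12.8
print(statistics.median(resistors))  # median: 13
print(statistics.pstdev(resistors))  # population (biased) standard deviation, Eq. (2.10)
print(statistics.stdev(resistors))   # sample (unbiased) standard deviation, Eq. (2.11)
```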

Sometimes it is appropriate to use a geometric mean when studying phenomena which
grow in proportion to their size. This would apply to certain biological processes and to
growth rates in financial resources. The geometric mean is defined by
xg = (x1 · x2 · x3 ⋯ xn)^(1/n)

As an example of the use of this concept, consider the 5-year record of a mutual fund
investment:

Year Asset Value (TL) Rate of Increase over Previous Year


1 1000
2 890 0.89
3 990 1.1124
4 1100 1.1111
5 1250 1.1364

The average growth rate is therefore

Average Growth = [(0.89)(1.1124)(1.1111)(1.1364)]^(1/4) = 1.0574

That this is indeed a valid average growth rate can be seen by observing that

1000 × (1.0574)⁴ ≈ 1250
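The same computation in code (a sketch using the standard library):

```python
import math

rates = [0.89, 1.1124, 1.1111, 1.1364]  # year-over-year growth factors

avg_growth = math.prod(rates) ** (1 / len(rates))  # geometric mean
print(avg_growth)                                  # ~1.0574
print(1000 * avg_growth ** len(rates))             # recovers ~1250 TL
```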

Example 2.7 : Calculation Of Population Variables.
The following readings are taken of a certain physical length. Compute the mean
reading, standard deviation, variance, and average of the absolute value of the
deviation, using the “biased” basis:

Reading    x, cm
1          5.30
2          5.73
3          6.77
4          5.26
5          4.33
6          5.45
7          6.09
8          5.64
9          5.81
10         5.75

The mean value is given by

xm = (1/n) Σ xi =

The other quantities are computed with the aid of the following table:

Reading    di = xi − xm    (xi − xm)² × 10²
1
2
3
4
5
6
7
8
9
10

Standard Deviation → σ = [(1/n) Σ_{i=1}^{n} (xi − xm)²]^(1/2) =

Variance → σ² =

Average of the absolute value of the deviation:

d̄ = (1/n) Σ_{i=1}^{n} |di| =

Example 2.8 : Sample Standard Deviation
The following readings are taken of a certain physical length. Calculate the best
estimate of standard deviation based on the “sample” or unbiased basis.

Reading    x, cm
1          5.30
2          5.73
3          6.77
4          5.26
5          4.33
6          5.45
7          6.09
8          5.64
9          5.81
10         5.75

The mean value is given by

xm = (1/n) Σ xi =

The other quantities are computed with the aid of the following table:

Reading    di = xi − xm    (xi − xm)² × 10²
1
2
3
4
5
6
7
8
9
10

Standard Deviation → σ = [Σ_{i=1}^{n} (xi − xm)² / (n − 1)]^(1/2) =

2.7. Probability Distributions

The probability that one will get a head when flipping an unweighted coin is ½, regardless of the number of times the coin is tossed. The probability that a tail will occur is also ½. The probability that either a head or a tail will occur is ½ + ½, or unity.

Probability is a mathematical quantity that is linked to the frequency with which a


certain phenomenon occurs after a large number of tries. In the case of the coin, it is the
number of times heads would be expected to result in a large number of tosses divided
by the total number of tosses.

Suppose we toss a horseshoe some distance x. Even though we make an effort to toss
the horseshoe the same distance each time, we would not always meet with success. On
the first toss the horseshoe might travel a distance 𝑥1 , on the second toss a distance of
x2, and so forth. If one is a good player of the game, more of the tosses will land at an x distance close to that of the objective. Since each x distance will vary somewhat from the other x distances, we might find it advantageous to calculate the probability that a toss lands in a certain increment of x between x and x + Δx.

Figure 2.1 shows how the probability of success in a certain event is distributed over the distance x. Each value of the ordinate p(x) gives the probability that the horseshoe will land between x and x + Δx, as Δx is allowed to approach zero. We might consider the deviation from xm as the error in the throw.

Figure 2.1. Distribution of throws for a good horseshoes player.
If an event with probability of success p on each trial is attempted N times, the probability of exactly n successes is given by the binomial distribution

p(n) = [N! / ((N − n)! n!)] pⁿ (1 − p)^(N−n)

The limit of the binomial distribution as N → ∞ and p → 0, such that

Np = a = constant

is called the Poisson distribution and is given by

pa(n) = aⁿ e⁻ᵃ / n!

It may be shown that the standard deviation of the Poisson distribution is

σ = √a
Important Note : In a situation where independent events occur randomly at irregular
intervals, the probability of the occurrence of events is given by the Poisson
distribution.

Example 2.9 : Tossing A Coin—Binomial Distribution


An unweighted coin is flipped three times. Calculate the probability of getting zero,
one, two, or three heads in these tosses.
The probability of getting a head on each throw is p = ½ and N = 3, while n takes on
the values 0, 1, 2, and 3.
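A sketch of the binomial calculation (the helper name is ours):

```python
from math import comb

def binomial(n, N, p):
    """Probability of exactly n successes in N independent trials,
    each with success probability p."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

# Three tosses of a fair coin: p = 1/2, N = 3, n = 0..3
probs = [binomial(n, 3, 0.5) for n in range(4)]
print(probs)  # [0.125, 0.375, 0.375, 0.125], i.e. 1/8, 3/8, 3/8, 1/8
```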

HISTOGRAMS
When a limited number of observations is made and the raw data are plotted, we call the plot a histogram. For example:

Distance from Target, cm    Number of Throws
0-10        5
10-20       15
20-30       13
30-40       11
40-50       9
50-60       8
60-70       10
70-80       6
80-90       7
90-100      5
100-110     5
110-120     3
120-130     2
Total       99

Figure 2.2. Histogram with Δx = 10 cm.

A cumulative frequency diagram could be


employed for these data.

Figure 2.3. Histogram with Δx = 20 cm. Figure 2.4. Cumulative frequency diagram.

2.8. The Gaussian or Normal Error Distribution

Suppose an experimental observation is made and some particular result is recorded.


We know that the observation has been subjected to many random errors. These
random errors may make the final reading either too large or too small, depending on
many circumstances which are unknown to us. Assuming that there are many small
errors that contribute to the final error and that each small error is of equal magnitude
and equally likely to be positive or negative, the gaussian or normal error distribution
may be derived. If the measurement is designated by x, the gaussian distribution gives
the probability that the measurement will lie between x and x + dx and is written

P(x) = (1/(σ√(2π))) exp[−(x − xm)²/2σ²]
In this expression 𝑥𝑚 is the mean reading and σ is the standard deviation. Some may
prefer to call P(x) the probability density.

Figure 2.5 shows that the most probable reading is xm. The standard deviation is a measure of the width of the distribution curve; the larger the value of σ, the flatter the curve and hence the larger the expected error of all the measurements. The P(x) formulation is normalized so that

∫_{−∞}^{+∞} P(x) dx = 1.0

Figure 2.5. The gaussian or normal error distribution for two values of the standard deviation.
The maximum probability occurs at x = xm; setting x = xm in P(x) gives

P(xm) = 1/(σ√(2π))

It is seen from this equation that smaller values of the standard deviation produce larger values of the maximum probability, as would be expected in an intuitive sense. P(xm) is sometimes called a measure of the precision of the data because it has a larger value for smaller values of the standard deviation.

The probability that a measurement will fall within a range ±x1 of the mean reading is

P = ∫_{xm−x1}^{xm+x1} (1/(σ√(2π))) exp[−(x − xm)²/2σ²] dx

Making the variable substitution

η = (x − xm)/σ
the equation becomes

P = (1/√(2π)) ∫_{−η1}^{+η1} exp(−η²/2) dη

where

η1 = x1/σ

Values of the gaussian normal error function (1/√(2π)) exp(−η²/2) and of its integral are given in Tables 2.1 and 2.2.

Table 2.1 Values of the gaussian normal error distribution.

Table 2.2 Integrals of the gaussian normal error function.

Table 2.3 Chances for deviations from mean value of normal distribution curve.

Deviation    Chances of Results Falling within Specified Deviation
±0.6745σ     1 to 1
±σ           2.15 to 1
±2σ          21 to 1
±3σ          369 to 1

Example 2.10 : Calculate the probabilities that a measurement will fall within one, two, and three standard deviations of the mean value and compare them with the values in Table 2.3.

We perform the calculation using the equation given below with 𝜂1 = 1, 2, and 3. The
values of the integral may be obtained from Table 2.2.

∫_{−η1}^{+η1} exp(−η²/2) dη = 2 ∫_{0}^{η1} exp(−η²/2) dη

Example 2.11 : A certain power supply is stated to provide a constant voltage output of
10.0 V within ±0.1 V. The output is assumed to have a normal distribution. Calculate
the probability that a single measurement of voltage will lie between 10.1 and 10.2 V.

CHAUVENET’S CRITERION
It is a rare circumstance indeed when an experimenter does not find that some of the
data points look bad and out of place in comparison with the bulk of the data. The
engineer cannot just throw out those points that do not fit with expectations—there
must be some consistent basis for elimination. In this situation, Chauvenet’s criterion is
used.

Suppose n measurements of a quantity are taken, and n is large enough that we may expect the results to follow the gaussian error distribution. Chauvenet's criterion specifies that a reading may be rejected if the probability of obtaining the particular deviation from the mean is less than 1/(2n).

Table 2.4 Chauvenet's criterion for rejecting a reading.

Number of Readings, n    Ratio of Maximum Acceptable Deviation to Standard Deviation, dmax/σ
3       1.38
4       1.54
5       1.65
6       1.73
7       1.80
10      1.96
15      2.13
25      2.33
50      2.57
100     2.81
300     3.14
500     3.29
1000    3.48

In applying Chauvenet’s criterion to eliminate dubious data points, one first calculates
the mean value and standard deviation using all data points. The deviations of the
individual points are then compared with the standard deviation in accordance with the
information in Table 2.4 (or by a direct application of the criterion), and the dubious
points are eliminated. For the final data presentation a new mean value and standard
deviation are computed with the dubious points eliminated from the calculation.
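The procedure can be sketched as a filter function; the 1.96 threshold is the Table 2.4 entry for n = 10, and the readings are those of Example 2.7:

```python
import statistics

def chauvenet_filter(data, dmax_over_sigma):
    """Keep only points whose deviation from the mean is within the
    tabulated ratio (Table 2.4) times the sample standard deviation."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sigma <= dmax_over_sigma]

readings = [5.30, 5.73, 6.77, 5.26, 4.33, 5.45, 6.09, 5.64, 5.81, 5.75]
kept = chauvenet_filter(readings, 1.96)  # n = 10 -> dmax/sigma = 1.96
print(kept)  # 4.33 is rejected (d/sigma ~ 2.05); the other nine points remain
```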

Example 2.12 : Using Chauvenet’s criterion, test the data points of Example 2.7 for
possible inconsistency. Eliminate the questionable points and calculate a new standard
deviation for the adjusted data.

The best estimate of the standard deviation is given in Example 2.8 as 0.627 cm. We
first calculate the ratio 𝑑𝑖 /𝜎 and eliminate data points in accordance with Table 2.4

Reading    x, cm    di = xi − xm    di/σ
1          5.30
2          5.73
3          6.77
4          5.26
5          4.33
6          5.45
7          6.09
8          5.64
9          5.81
10         5.75

The new standard deviation should then be calculated using the table below, with the rejected point eliminated:

Reading    x, cm    di = xi − xm
1          5.30     −0.313
2          5.73     0.117
3          6.77     1.157
4          5.26     −0.353
6          5.45     −0.163
7          6.09     0.477
8          5.64     0.027
9          5.81     0.197
10         5.75     0.137

σ = [Σ_{i=1}^{n} (xi − xm)² / (n − 1)]^(1/2) =

THE CHI-SQUARE TEST

In general, we may ask how we can determine if experimental observations match


some particular expected distribution for the data.

χ² = Σ_{i=1}^{n} [(observed value)i − (expected value)i]² / (expected value)i

The chi-square test may be applied to check the validity of various distributions.
Calculations have been made of the probability that the actual measurements match the
expected distribution, and these probabilities are given in Table 2.5. In this table F
represents the number of degrees of freedom in the measurements and is given by
𝐹 =𝑛−𝑘
where n is the number of cells and k is the number of imposed conditions on the
expected distribution.
We might use the test to analyze random errors or to check the adherence of certain
data to an expected distribution. We interpret the test by calculating the number of
degrees of freedom and χ2 from the experimental data. Then, consulting Table 2.5, we
obtain the probability P that this value of χ2 or higher value could occur by chance.
• If χ2 = 0, then the assumed or expected distribution and measured distribution
match exactly.
• The larger the value of χ2, the larger is the disagreement between the assumed
distribution and the observed values, or the smaller the probability that the
observed distribution matches the expected distribution.

Table 2.5 Chi-squared. P is the probability that the value in the table will be exceeded for a given number of degrees of freedom

A good rule of thumb is that if P lies between 0.1 and 0.9, the observed distribution
may be considered to follow the assumed distribution. If P is either less than 0.02 or
greater than 0.98, the assumed distribution may be considered unlikely.

For the chi-square test the generally accepted minimum number of expected values for
each ith cell is 5. If some frequencies fall below 5, it is recommended that the cells or
groups be redefined to alleviate the problem.

Example 2.13 : A plastics company produces two types of styrofoam cups (call them A
and B) which can experience eight kinds of defects. One hundred defective samples of
each cup are collected and the number of each type of defect is determined. The
following table results:

Type of Defect    Cup A    Cup B
1          1      5
2          2      3
3          3      3
4          25     23
5          10     12
6          15     16
7          38     30
8          6      8
Total      100    100

We would like to know if the two cups have the same pattern of defects. To do this, we could compute chi-squared for cup B, assuming cup A has the expected distribution. But we encounter a problem: defects 1, 2, and 3 do not meet our criterion of a minimum of five expected values in each cell. So, we must reconstruct the cells by combining 1, 2, and 3 to obtain:

Type of Defect    Cup A    Cup B
1, 2, 3    6      11
4          25     23
5          10     12
6          15     16
7          38     30
8          6      8
Total      100    100
For the former case we had eight cells or groups and one imposed condition (total observations = 100), so F = 8 − 1 = 7. After grouping defects 1, 2, and 3, we have F = 6 − 1 = 5. Using this new tabulation, the value of chi-squared is calculated as
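The χ² sum for this tabulation can be sketched as:

```python
def chi_squared(observed, expected):
    """Chi-squared statistic: sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

cup_a = [6, 25, 10, 15, 38, 6]   # taken as the expected distribution
cup_b = [11, 23, 12, 16, 30, 8]  # observed
print(chi_squared(cup_b, cup_a))  # ~7.14, with F = 6 - 1 = 5 degrees of freedom
```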

Example 2.14: Two dice are rolled 300 times and the following results are noted:

Number    Number of Occurrences
2         6
3         9
4         27
5         36
6         39
7         57
8         45
9         39
10        24
11        12
12        6

Calculate the probability that the dice are unloaded.

Eleven cells have been observed with only one restriction: the number of rolls of the dice is fixed. Thus, F = 11 − 1 = 10. If the dice are unloaded, a short listing of the combinations of the dice will give the probability of occurrence for each number. The expected value of each number is then the probability multiplied by 300, the total number of throws. The values of interest are tabulated in the second table:

Number    Observed    Probability    Expected
2         6
3         9
4         27
5         36
6         39
7         57
8         45
9         39
10        24
11        12
12        6

2.11. Method of Least Squares

Suppose we have a set of observations x1, x2, ..., xn. The sum of the squares of their deviations from some mean value is

S = Σ_{i=1}^{n} (xi − xm)²        (2.12)

Now, suppose we wish to minimize S with respect to the mean value xm. We set

∂S/∂xm = 0 = Σ 2(xi − xm)(−1) = −2(Σ xi − n xm)        (2.13)

where n is the number of observations. We find that

xm = (1/n) Σ_{i=1}^{n} xi        (2.14)

or the mean value which minimizes the sum of the squares of the deviations is the
arithmetic mean. This example might be called the simplest application of the method
of least squares.

Suppose that the two variables x and y are measured over a range of values. Suppose
further that we wish to obtain a simple analytical expression for y as a function of x.
The simplest type of function is a linear one; hence, we might try to establish y as a
linear function of x. The problem is one of finding the best linear function, for the data
may scatter a considerable amount. We could solve the problem rather quickly by
plotting the data points on graph paper and drawing a straight line through them by eye.
Indeed this is common practice, but the method of least squares gives a more reliable
way to obtain a better functional relationship than the guesswork of plotting. We seek
an equation of the form
𝑦 = 𝑎𝑥 + 𝑏
We therefore wish to minimize the quantity

S = Σ_{i=1}^{n} [yi − (a xi + b)]²
This is accomplished by setting the derivatives with respect to a and b equal to zero.

n b + a Σ xi = Σ yi

b Σ xi + a Σ xi² = Σ xi yi

Solving these equations simultaneously gives

a = [n Σ xi yi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²]

b = [(Σ yi)(Σ xi²) − (Σ xi yi)(Σ xi)] / [n Σ xi² − (Σ xi)²]

Designating the computed value of y as ŷ, we have

ŷ = a x + b

and the standard error of estimate of y for the data is

Standard Error = [Σ (yi − ŷi)² / (n − 2)]^(1/2)

Standard Error = [Σ (yi − a xi − b)² / (n − 2)]^(1/2)

The method of least squares may also be used for determining higher-order
polynomials for fitting data. One only needs to perform additional differentiations to
determine additional constants. For example, if it were desired to obtain a least-squares
fit according to the quadratic function
𝑦 = 𝑎𝑥 2 + 𝑏𝑥 + 𝑐
The quantity

S = Σ_{i=1}^{n} (yi − a xi² − b xi − c)²        (2.15)

would be minimized by setting the following derivatives equal to zero:

∂S/∂a = Σ 2(yi − a xi² − b xi − c)(−xi²) = 0        (2.16)

∂S/∂b = Σ 2(yi − a xi² − b xi − c)(−xi) = 0        (2.17)

∂S/∂c = Σ 2(yi − a xi² − b xi − c)(−1) = 0        (2.18)

Expanding and collecting terms, we have

a Σ xi² + b Σ xi + c n = Σ yi        (2.19)

a Σ xi³ + b Σ xi² + c Σ xi = Σ xi yi        (2.20)

a Σ xi⁴ + b Σ xi³ + c Σ xi² = Σ xi² yi        (2.21)

These equations may then be solved for the constants a, b, and c.


Example 2.15 : From the following data obtain y as a linear function of x using the method of least squares:

yi     xi
1.2    1.0
2.0    1.6
2.4    3.4
3.5    4.0
3.5    5.2

Σ yi = 12.6        Σ xi = 15.2

xi    yi    xi yi    xi²
2.12. The Correlation Coefficient
Assume that a suitable correlation between y and x has been obtained, by either least-
squares analysis or graphical curve fitting. We want to know how good this fit is and
the parameter which conveys this information is the correlation coefficient r defined by

r = [1 − σ²y,x / σ²y]^(1/2)

where σy is the standard deviation of y, given as

σy = [Σ_{i=1}^{n} (yi − ym)² / (n − 1)]^(1/2)

and σy,x is given as

σy,x = [Σ_{i=1}^{n} (yi − yic)² / (n − 2)]^(1/2)

The 𝑦𝑖 are the actual values of y, and the 𝑦𝑖𝑐 are the values computed from the
correlation equation for the same value of x.

r² = (σ²y − σ²y,x) / σ²y

where, now, r² is called the coefficient of determination. We note that for a perfect fit σy,x = 0 because there are no deviations between the data and the correlation; in this case r = 1.0. If σy = σy,x, we obtain r = 0, indicating a poor fit or substantial scatter around the fitted line. The reader must be cautioned about ascribing too much virtue to values of r close to 1.0; such values can occur even when the data do not fit the line well.

A relationship for the correlation coefficient which may be preferable for computer calculations is

r = [n Σ xi yi − (Σ xi)(Σ yi)] / {[n Σ xi² − (Σ xi)²]^(1/2) [n Σ yi² − (Σ yi)²]^(1/2)}

For a quadratic fit, the standard error of estimate becomes

σy,x = [Σ (yi − yic)² / (n − 3)]^(1/2)
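The computational form of r, applied to the data of Example 2.15 (our sketch):

```python
import math

def correlation_coefficient(x, y):
    """Correlation coefficient r in the computational form given above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    syy = sum(yi ** 2 for yi in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den

x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]
print(correlation_coefficient(x, y))  # ~0.94
```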

In general, for a fit with a polynomial of order m one would obtain

σy,x = [Σ (yi − yic)² / (n − (m + 1))]^(1/2)

In some cases a higher order polynomial may actually provide a poorer correlation than
the simple quadratic. Again, it is a good idea to plot the data first to get a visual idea of
the behavior before performing analyses.

Example 2.16 : Calculate the correlation coefficient for the least-squares correlation of Example 2.15.

yi     xi
1.2    1.0
2.0    1.6
2.4    3.4
3.5    4.0
3.5    5.2

i    yi    yic    (yi − yic)²
2.13. Graphical Analysis and Curve Fitting
Engineers are well known for their ability to plot many curves of experimental data and
to extract all sorts of significant facts from these curves. The better one understands the
physical phenomena involved in a certain experiment, the better one is able to extract a
wide variety of information from graphical displays of experimental data. Because
these physical phenomena may encompass all engineering science, we cannot discuss
them here except to emphasize that the person who is usually most successful in
analyzing experimental data is the one who understands the physical processes behind
the data.

2.14. Choice of Graph Formats

Table 2.6. Methods of plotting various functions to obtain straight lines.

Table 2.6. Continued.
The engineer has many graph formats available for presenting experimental data or
calculation results. While bar charts, column charts, pie charts, and similar types of
displays have some applications, by far the most frequently used display is the x-y
graph with choices of coordinates to match the situation. This basic graph has several
variations in format that we shall illustrate by plotting the simple table of x-y data
shown below.
𝒙 𝒚
1 2
2 3.1
3 12
4 18
5 20
6 37
7 51
8 70
9 82
10 90

a) This display presents just the raw data points with a data marker for each point. It
might be selected as an initial type of display before deciding on a more suitable
alternative. It may be employed for either raw experimental data points or for
points calculated from an analytical relationship.

b) This display presents the points with the same data markers connected by a
smooth curve drawn either by hand or by a computer graphics system; in this case,
by computer. This display should be used with caution. If employed for
presentation of experimental data, it implies that the smooth curve describes the
physical phenomena represented by the data points.

c) This display is the same as (b) but with the data markers removed. It would almost
never be employed for presentation of experimental data because the actual data
points are not displayed. It also has the same disadvantage as (b) in the implication
that the physical phenomena are represented by the smooth connecting curve. In
contrast, this type of display is obviously quite suitable for presenting the results
of calculations.
d) This display presents the data points connected with straight-line segments instead
of a smooth curve, and avoids the implication that the physical situation behaves
in a certain “smooth” fashion. The plot is typically employed for calibration curves
where linear interpolation will be used between points, or when a numerical
integration is to be performed based on the connecting straight-line segments.

e) The format in (e) is the same as (d) without the data markers. It might be used for
calculation results where the engineer wants to avoid computer smoothing
between the calculated points.
f) Finally, the format presented in (f ) is one that is frequently selected to present
experimental results where uncertainties in the measurements are expected to
result in scatter of the data points. A smooth curve is drawn through the data points
as the experimentalist’s best estimate of the behavior of the phenomena under
study.
