Unit 4b - Normal Distribution

Normal Distribution

Uploaded by

Dr Sanjeev Tyagi


Instrumentation, Measurement and Analysis
Chapter 22 Normal Distribution

INTRODUCTION

In general, measuring instruments are associated with a number of factors causing random errors. Therefore, the instrument readings exhibit a dispersion (scatter) in the data. However, the magnitude of the individual errors is usually small. If the measured value is observed a very large number of times, the data exhibit a continuous distribution. This can be represented by a normalised histogram, i.e. by plotting the relative frequency per unit class interval as ordinate and the measured value as abscissa. If the magnitude of the class interval is kept small and the ordinates of the various class mid-points are joined by a smooth curve, the resulting distribution is called the limiting frequency distribution. This distribution has a characteristic bell shape (Fig. 22.1) and is commonly termed the normal or Gaussian distribution.

The normal distribution is by far the most commonly occurring distribution. It serves as a model for a number of variates such as experimental random errors, dimensions of mass-produced articles, and measurable biological characteristics such as man's weight, height, etc. This distribution can be represented mathematically as

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] \tag{22.1}$$

where
$p(x)$ is the probability density function which, for a given interval, represents the relative frequency of occurrence of the measured value in that interval,
$\sigma$ is the standard deviation of the measured values, and
$\bar{X}$ is the mean of the measured values.

The normal distribution model is employed in decision-making processes such as determining the probability that the measured value lies within a given range. Alternatively, if the level of probability or the confidence level is a certain fixed value (which may be a requirement in a particular situation), then it is possible to determine the allowable scatter or dispersion from a given mean value.

Also, the properties of the normal distribution are used for comparing various normally distributed samples using statistical criteria known as significance tests. Further, it is possible to determine the 'goodness of fit' of the measured values with those of the expected normally distributed values in the different ranges of measurements using the criterion known as the $\chi^2$-test (pronounced chi-square test). The $\chi^2$-test is also applicable for determining whether any non-normal distribution conforms to any other known theoretical distribution or not.

Fig. 22.1 Typical Gaussian distribution [plot of $p(x)$ against measurements $x$; marked: points of inflection, areas 0.3413 and 0.1359, abscissae $\bar{X}-2\sigma$, $\bar{X}-\sigma$, $\bar{X}$, $\bar{X}+\sigma$, $\bar{X}+2\sigma$]

22.1 PROPERTIES OF GAUSSIAN DISTRIBUTION

Any typical Gaussian curve has the following features:

1. It has a maximum at $x = \bar{X}$, i.e. at the mean value.
2. The points of inflexion of the curve are at $x = \bar{X} \pm \sigma$.
3. It is symmetrical about the ordinate at $x = \bar{X}$, which divides the curve into two equal parts. Because of the symmetry of the curve, the median is equal to the mean. Further, since the mean value occurs at the peak probability density, it also represents the mode. Thus, for a normal distribution, mean = mode = median.
4. The x-axis is an asymptote of the curve.
5. The area under the normal distribution curve is unity, i.e.

$$\int_{-\infty}^{\infty} p(x)\,dx = 1 \tag{22.2}$$

6. For the same mean value, the distribution has a sharp peak for smaller values of $\sigma$ and is flatter for higher values of $\sigma$.
7. If $\sigma$ is small, the scatter in the data is small and consequently more values are concentrated near the mean value. Since the area under the normal distribution curve is unity, the ordinate becomes higher, giving the curve a peaked shape. Hence, the smaller the value of $\sigma$, the larger is the maximum of the curve. In fact, the maximum value of $p(x)$ is

$$[p(x)]_{\max} = \frac{1}{\sigma\sqrt{2\pi}} \tag{22.3}$$

Fig. 22.2 Effect of $\sigma$ on the shape of normal distribution curve [peak ordinates 0.798, 0.399 and 0.266 marked]

8. The probability that the measured value $x$ lies between $x_1$ and $x_2$ is the area under the normal distribution curve between $x_1$ and $x_2$, as shown in Fig. 22.3. It is known as the integral Gaussian probability in the range between $x_1$ and $x_2$ and is represented as $\left|P(x)\right|_{x_1}^{x_2}$.

Fig. 22.3 Integral Gaussian probability of occurrence in a specified range of limiting values

To determine the integral Gaussian probability in the specified range, we proceed as follows:

$$\left|P(x)\right|_{x_1}^{x_2} = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx$$

$$= \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{\bar{X}} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx + \frac{1}{\sigma\sqrt{2\pi}} \int_{\bar{X}}^{x_2} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx$$

$$= \left|P(x)\right|_{x_1}^{\bar{X}} + \left|P(x)\right|_{\bar{X}}^{x_2} = P(x_1) + P(x_2) \tag{22.4}$$

where $P(x_1)$ and $P(x_2)$ are the integral Gaussian probabilities between $x_1$ and $\bar{X}$, and between $\bar{X}$ and $x_2$, respectively. The procedure for determining the values of integral Gaussian probabilities is explained in Sec. 22.4. It may be noted that $\left|P(x)\right|_{x_1}^{x_2}$ would be the difference between $P(x_2)$ and $P(x_1)$ if both $x_1$ and $x_2$ lie on the same side of $\bar{X}$.

22.2 AREA UNDER THE NORMAL DISTRIBUTION CURVE

The area under a normal distribution curve between the limits $-\infty$ and $\infty$ is the integral Gaussian probability of occurrence of the measured value in the very large range between $-\infty$ and $\infty$. Obviously, this integral Gaussian probability is unity, as all the possible measured values have been taken into account, i.e. $\left|P(x)\right|_{-\infty}^{\infty} = 1$. To obtain this result, we integrate the Gaussian distribution equation [Eq. (22.1)] as follows.

Fig. 22.4 Typical Gaussian distribution showing elemental area [$dA = p(x)\,dx$]

The elemental area $dA$ of the normal distribution is $p(x)\,dx$:

$$dA = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx \tag{22.5}$$

Integrating Eq. (22.5) between the limits $-\infty$ and $\infty$ we get

$$[A]_{\text{normal}} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx \tag{22.6}$$

The expression on the RHS of Eq. (22.6) can be simplified by substituting $z = (x - \bar{X})/\sigma$ and correspondingly $dz = dx/\sigma$:

$$[A]_{\text{normal}} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2}\right) dz \tag{22.7}$$

The term $\int_{-\infty}^{\infty} \exp(-z^2/2)\,dz$ is known as the normal error function and its value can be shown, by numerical integration or otherwise, to be $\sqrt{2\pi}$. Substituting the value of the normal error function in Eq. (22.7), we get the result that the area under the normal distribution curve is unity.

22.3 DETERMINATION OF MEAN VALUE AND STANDARD DEVIATION OF THE CONTINUOUS DISTRIBUTION OF GAUSSIAN TYPE

The Gaussian distribution equation [Eq. (22.1)] involves two parameters, namely the standard deviation $\sigma$ and the mean value $\bar{X}$. Using the standard procedure for determining the mean value and the standard deviation of a continuous distribution, it is possible to show that the values obtained are these same parameters. This shows that the use of the parameters $\sigma$ and $\bar{X}$ in the Gaussian distribution equation is in conformity with their definitions.

22.3.1 Determination of Mean Value for a Gaussian Distribution

The mean value of a continuous variate is defined by

$$\bar{X} = \sum x \cdot f_r(x) \tag{22.8}$$

where $f_r(x)$ is the relative frequency of occurrence of the measured value $x$.

Now, for the Gaussian distribution, the probability density function $p(x)$ is the relative frequency per unit interval for the value $x$. The relative frequency $f_r(x)$ of occurrence of $x$ in the interval $dx$ is

$$f_r(x) = p(x)\,dx \tag{22.9}$$

Substituting this value in Eq. (22.8) we get

$$\bar{X} = \sum x \cdot p(x)\,dx \tag{22.10}$$

Replacing the summation sign by the integral sign with limits $-\infty$ to $\infty$ and substituting the expression for the Gaussian distribution in place of $p(x)$ in Eq. (22.10), we get

$$\bar{X} = \int_{-\infty}^{\infty} x\,p(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx \tag{22.11}$$

This integral can be easily evaluated if we make the following substitution in Eq. (22.11): $z = (x - \bar{X})/\sigma$ and $dz = dx/\sigma$. Then

$$\bar{X} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (z\sigma + \bar{X}) \exp\left(-\frac{z^2}{2}\right) dz \tag{22.12}$$

$$= \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z \exp\left(-\frac{z^2}{2}\right) dz + \frac{\bar{X}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2}\right) dz$$

$$= \frac{\sigma}{\sqrt{2\pi}} \left[-\exp\left(-\frac{z^2}{2}\right)\right]_{-\infty}^{\infty} + \frac{\bar{X}}{\sqrt{2\pi}} \sqrt{2\pi} \tag{22.13}$$

$$= 0 + \bar{X} = \bar{X}$$

22.3.2 Determination of Standard Deviation for Gaussian Distribution

The variance of a continuous variate is defined by

$$\sigma^2 = \sum (x - \bar{X})^2 f_r(x) \tag{22.14}$$

For the normal distribution this expression becomes

$$\text{Variance} = \int_{-\infty}^{\infty} (x - \bar{X})^2\,p(x)\,dx \tag{22.15}$$

$$= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - \bar{X})^2 \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx \tag{22.16}$$

Substituting $z = (x - \bar{X})/\sigma$ and $dz = dx/\sigma$ we get

$$\text{Variance} = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 \exp\left(-\frac{z^2}{2}\right) dz \tag{22.17}$$

Treating $z^2 \exp(-z^2/2)$ as $z \cdot z\exp(-z^2/2)$ and integrating by parts, we get

$$\int z^2 \exp\left(-\frac{z^2}{2}\right) dz = -z \exp\left(-\frac{z^2}{2}\right) + \int \exp\left(-\frac{z^2}{2}\right) dz \tag{22.18}$$

Substituting the value of the integral in Eq. (22.17) we get

$$\text{Variance} = \frac{\sigma^2}{\sqrt{2\pi}} \left\{ \left[-z \exp\left(-\frac{z^2}{2}\right)\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2}\right) dz \right\} = \frac{\sigma^2}{\sqrt{2\pi}} \left(0 + \sqrt{2\pi}\right) = \sigma^2$$

22.4 STANDARDISED NORMAL DISTRIBUTION

In order to reduce the different normal distributions to a general form, it is common to standardise them by moving the origin of the coordinates to the mean value and choosing a scale on the x-axis in terms of $\sigma$. The variable selected is $z = (x - \bar{X})/\sigma$, termed the standardised normal variate. The probability of occurrence $P(x)$ in the range $x_1$ to $x_2$ given by the normal distribution equation is

$$P(x)\,\{x_1 \le x \le x_2\} = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} \exp\left[-\frac{(x-\bar{X})^2}{2\sigma^2}\right] dx \tag{22.19}$$

This equation can be transformed into the form of $P(z)$ in the range $z_1$ to $z_2$ as follows. Since, by definition,

$$z = \frac{x - \bar{X}}{\sigma}, \qquad z_1 = \frac{x_1 - \bar{X}}{\sigma}, \qquad z_2 = \frac{x_2 - \bar{X}}{\sigma} \qquad \text{and} \qquad dz = \frac{dx}{\sigma} \tag{22.20}$$

substituting these values in Eq. (22.19) we get

$$P(z)\,\{z_1 \le z \le z_2\} = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} \exp\left(-\frac{z^2}{2}\right) dz \tag{22.21}$$

This equation is termed the standardised normal equation. Comparing Eqs. (22.19) and (22.21), it is obvious that every time the probability of occurrence between $x_1$ and $x_2$ is required, the integral in Eq. (22.19) must be evaluated, which is rather complex compared to that in Eq. (22.21). This is because in the latter case the use of tabulated values of the normal error function, i.e. $\int \exp(-z^2/2)\,dz$, simplifies the calculations considerably. Secondly, the shape of the curve in Eq. (22.19) depends on the values of $\bar{X}$ and $\sigma$ and will be different for different cases, whereas the equivalent Eq. (22.21) shows that all normal distributions in $x$, with whatever values of $\bar{X}$ and $\sigma$, reduce to the same shape of the standardised normal distribution in terms of the standard normal variate $z$, with mean value zero and standard deviation unity. In other words, substituting $\bar{X} = 0$ and $\sigma = 1$ in Eq. (22.19) reduces it to the form of Eq. (22.21). It is because of this generality of Eq. (22.21) that the probability density function $p(z)$ (i.e. the ordinate of the standard normal curve) as well as the integral Gaussian probability $P(z) = \int p(z)\,dz$ (i.e. the area under the standardised normal curve between the given limits) are tabulated versus $z$ (Tables 22.1 and 22.2).

It may be noted that while using the tables to evaluate the probabilities of normally distributed variates, it is advisable to sketch the area that corresponds to the probability required. Further, one should be careful in the use of integral Gaussian tables, as not all tables give the same area. Some tables give the area from 0 to $z$, while others may give it from $-z$ to $z$, from $z$ to $\infty$ or from $-\infty$ to $z$, as shown in Figs 22.5(a), (b), (c) and (d). The shaded areas represent the probability that the observation falls in the corresponding interval. However, the tabulated values of the integral Gaussian probability presented in the text (Table 22.2) are as in Fig. 22.5(a).

Fig. 22.5 Integral Gaussian probability in different ranges [panels (a) to (d)]

Table 22.1 Normal Probability Density Function p(z)

Each entry in the table indicates the normal probability density function $p(z)$ corresponding to $\pm z$, where

$$p(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$

To illustrate: the ordinate $p(z)$ of the standardised normal distribution curve corresponding to $z = \pm 1.0$ is 0.2420.

 ±z     0.00     0.02     0.04     0.06     0.08
 0.0   0.3989   0.3989   0.3986   0.3982   0.3977
 0.1   0.3970   0.3961   0.3951   0.3939   0.3925
 0.2   0.3910   0.3894   0.3876   0.3857   0.3836
 0.3   0.3814   0.3790   0.3765   0.3739   0.3712
 0.4   0.3683   0.3653   0.3621   0.3589   0.3555
 0.5   0.3521   0.3485   0.3448   0.3410   0.3372
 0.6   0.3332   0.3292   0.3251   0.3209   0.3166
 0.7   0.3123   0.3079   0.3034   0.2989   0.2943
 0.8   0.2897   0.2850   0.2803   0.2756   0.2709
 0.9   0.2661   0.2613   0.2565   0.2516   0.2468
 1.0   0.2420   0.2371   0.2323   0.2275   0.2227
 1.1   0.2179   0.2131   0.2083   0.2036   0.1989
 1.2   0.1942   0.1895   0.1849   0.1804   0.1758
 1.3   0.1714   0.1669   0.1626   0.1582   0.1539
 1.4   0.1497   0.1456   0.1415   0.1374   0.1334
 1.5   0.1295   0.1257   0.1219   0.1182   0.1145
 1.6   0.1109   0.1074   0.1040   0.1006   0.0973
 1.7   0.0940   0.0909   0.0878   0.0848   0.0818
 1.8   0.0790   0.0761   0.0734   0.0707   0.0681
 1.9   0.0656   0.0632   0.0608   0.0584   0.0562
 2.0   0.0540   0.0519   0.0498   0.0478   0.0459
 2.1   0.0440   0.0422   0.0404   0.0387   0.0371
 2.2   0.0355   0.0339   0.0325   0.0310   0.0297
 2.3   0.0283   0.0270   0.0258   0.0246   0.0235
 2.4   0.0224   0.0213   0.0203   0.0194   0.0184
 2.5   0.0175   0.0167   0.0158   0.0151   0.0143
 2.6   0.0136   0.0129   0.0122   0.0116   0.0110
 2.7   0.0104   0.0099   0.0093   0.0088   0.0084
 2.8   0.0079   0.0075   0.0071   0.0067   0.0063
 2.9   0.0060   0.0056   0.0053   0.0050   0.0047
 3.0   0.0044   0.0042   0.0039   0.0037   0.0035
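As a cross-check on the tabulated values, the ordinate p(z) and the area from 0 to z can be computed directly. The following is a minimal sketch, not part of the original text: it assumes Python's standard `math.erf` and uses the identity that the area from 0 to z equals 0.5·erf(z/√2). The function names `density`, `area_0_to_z` and `probability_between` are illustrative choices.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def density(z):
    """Ordinate p(z) of the standardised normal curve (the quantity in Table 22.1)."""
    return math.exp(-0.5 * z * z) / SQRT_2PI


def area_0_to_z(z):
    """Area under the standardised normal curve from 0 to z.

    The result is negative for z < 0, so differences of two such areas
    give the correct probability whichever side of the mean the limits lie on.
    """
    return 0.5 * math.erf(z / math.sqrt(2.0))


def probability_between(x1, x2, mean, sigma):
    """Integral Gaussian probability that x lies between x1 and x2."""
    z1 = (x1 - mean) / sigma  # standardised lower limit
    z2 = (x2 - mean) / sigma  # standardised upper limit
    return area_0_to_z(z2) - area_0_to_z(z1)


print(f"{density(1.0):.4f}")       # 0.2420
print(f"{area_0_to_z(1.96):.4f}")  # 0.4750
```

Because `area_0_to_z` is an odd function of z, the "sum" and "difference" cases described for Eq. (22.4) are both handled by the single subtraction in `probability_between`.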
Table 22.2 Integral Gaussian Probability P(z)

Each entry in the table indicates the area under the standard normal curve from 0 to z.

To illustrate: the area under the standard normal curve between the maximum ordinate and a point 1.96 standard deviations away is 0.4750.

  z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
 0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
 0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
 0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
 0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
 0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
 0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
 0.6   0.2258  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2518  0.2549
 0.7   0.2580  0.2612  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
 0.8   0.2881  0.2910  0.2939  0.2967  0.2996  0.3023  0.3051  0.3078  0.3106  0.3133
 0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
 1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
 1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
 1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
 1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
 1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
 1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
 1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
 1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
 1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
 1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
 2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
 2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
 2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
 2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
 2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
 2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
 2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
 2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
 2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
 2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
 3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
 3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
 3.2   0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
 3.3   0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
 3.4   0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998  0.4998
 3.5   0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998
 3.6   0.4998  0.4998  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
 3.7   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
 3.8   0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.5000  0.5000  0.5000
 3.9   0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000

22.5 CONFIDENCE LEVEL

As mentioned earlier, the area under the normal distribution curve between $-\infty$ and $\infty$ is unity. This must be so because the probability of all possible measured values lying between $-\infty$ and $\infty$ has to be unity. In actual practice, we generally specify a certain range of permissible values of scatter or dispersion from the mean value and determine the probability that the measured value lies in that range. This probability can be evaluated by finding the area under the normal distribution curve in the specified range. When this probability is expressed as a percentage, it is termed the confidence level. For example, suppose we wish to predict the probability of occurrence of the measured value $x$ in the range $(\bar{X} - \sigma)$ to $(\bar{X} + \sigma)$.
Obviously, it is the area under the normal distribution curve in the specified range of $(\bar{X} - \sigma)$ to $(\bar{X} + \sigma)$. Using Table 22.2 we find that the integral Gaussian probability in the range $\bar{X} \pm \sigma$ (alternatively, between $z = \pm 1$) is 0.6826. Therefore, we can say that the chances are better than 2 : 1 (actually 68.26 : 31.74) that the measured value lies within $\bar{X} \pm \sigma$. Alternatively, we may say that the confidence level in such a case is 68.26%. Using the integral Gaussian table, the confidence levels for other cases can also be determined. Some typical values of confidence levels are shown in Fig. 22.6.

Fig. 22.6 Confidence levels for different ranges of measured values [marked: 68.26%, 95.5%, 99.74%]

In any experiment, we generally try to set the confidence limits within which we expect the measured value to lie with a given probability [Figs 22.7(a), 22.7(b) and 22.7(c)]. To do this, we must decide the probability of error that we are willing to accept. The percentage probability of error is defined as 100 minus the confidence level. In general, we accept errors up to 5%, i.e. a 95% confidence level, for which $z = \pm 1.96$ (from Table 22.2) for two-sided confidence and $z = 1.645$ for one-sided confidence. However, where human life is involved, we insist on a low probability of error, of the order of 1%. This gives a confidence level of 99%, for which $z = \pm 2.58$ for two-sided confidence and $z = 2.326$ for one-sided confidence.

Fig. 22.7 Pictorial representation of Gaussian distribution with given confidence limits: (a) two-sided confidence, (b) one-sided confidence, (c) one-sided confidence with only upper confidence limit

In practice, we come across two types of problems. Some are direct, in which the limits of $x$ are given and the probability of occurrence in this range is to be determined. The others are of the inverse type, in which the probability of occurrence, i.e. the confidence level, is given (usually taken as 95%) and the limits of $x$ are to be determined.

Problem 22.1
Analysis of the data of machine components having completed their anticipated useful lives due to wear and tear usually follows the normal distribution pattern. If the components of a particular type, in one of the samples, have a mean wear-out life $\bar{X}$ of 1000 h with a standard deviation $\sigma$ of 25 h, determine the proportion of components that would have a wear-out life in hours (a) greater than 1050 and (b) between 950 and 1025.

Solution
In such problems, we basically determine the area under the normal distribution curve pertaining to the particular region indicated by the problem. Therefore, it is usually preferable to represent the data pictorially by sketching a normal distribution curve.

(a) Greater than $x_1 = 1050$ h
In the given sample, the mean wear-out life $\bar{X} = 1000$ h and the standard deviation $\sigma = 25$ h. The standard normal variate $z_1$ at the cut-off value of $x_1$ equal to 1050 h is

$$z_1 = \frac{x_1 - \bar{X}}{\sigma} = \frac{1050 - 1000}{25} = 2.0$$

It is now clear from Fig. 22.8(a) that the proportion of components having a life greater than 1050 h corresponds to the area under the normal distribution curve from $z_1 = 2.0$ to $z = \infty$. This area is equal to the area under the normal distribution curve between $z = 0$ and $z = \infty$ minus the area between $z = 0$ and $z_1 = 2.0$. In other words, the required integral Gaussian probability is

$$\left|P(z)\right|_{2.0}^{\infty} = \left|P(z)\right|_{0}^{\infty} - \left|P(z)\right|_{0}^{2.0} = 0.50 - 0.4772 = 0.0228 \quad \text{(from Table 22.2)}$$

Hence, we can say that 2.28% of the machine components have a life greater than 1050 h.

(b) Between 950 and 1025 h
Here, $x_2 = 950$ h and $x_3 = 1025$ h. The corresponding standard normal variates are

$$z_2 = \frac{950 - 1000}{25} = -2.0 \qquad \text{and} \qquad z_3 = \frac{1025 - 1000}{25} = 1.0$$

The shaded area in Fig. 22.8(b) represents the proportion of components having a life between 950 and 1025 h.
Fig. 22.8(a) Figure for Problem 22.1 [$\bar{X} = 1000$ h at $z = 0$; $x_1 = 1050$ h at $z_1 = 2.0$]
Fig. 22.8(b) Figure for Problem 22.1 [950 h at $z = -2.0$; $\bar{X} = 1000$ h at $z = 0$; 1025 h at $z = 1.0$]

Therefore, the required integral Gaussian probability is

$$\left|P(z)\right|_{-2.0}^{1.0} = \left|P(z)\right|_{-2.0}^{0} + \left|P(z)\right|_{0}^{1.0} = 0.4772 + 0.3413 = 0.8185 \quad \text{(from Table 22.2)}$$

Hence, the proportion of components having a life between 950 and 1025 h is 81.85%.

Problem 22.2
A study has indicated that the life of TV picture tubes manufactured by a certain firm is normally distributed with a mean life of 5 years (1 year = 365 days) and a standard deviation of 500 days. The manufacturer gives a guarantee of 1 year. Determine (a) what percentage of picture tubes he will have to replace in 1 year, and (b) what he should do if he wishes to replace the same proportion of picture tubes with a 2-year guarantee.

Solution
(a) The mean life of the TV picture tube, $\bar{X} = 5 \times 365 = 1825$ days
Standard deviation, $\sigma = 500$ days
Guarantee period, $x_1 = 365$ days
The standard normal variate for the cut-off value of 365 days is

$$z_1 = \frac{365 - 1825}{500} = -2.92$$

The given data can be represented pictorially on the normal distribution curve as shown in Fig. 22.9.

Fig. 22.9 Figure for Problem 22.2 [$x_1 = 365$ days at $z_1 = -2.92$; $\bar{X} = 1825$ days at $z = 0$]

The proportion of TV picture tubes the manufacturer has to replace is indicated by the shaded area on the normal distribution diagram. Therefore, the required integral Gaussian probability between $z = -2.92$ and $z = -\infty$ is determined from Table 22.2 as follows:

$$\left|P(z)\right|_{-\infty}^{-2.92} = \left|P(z)\right|_{-\infty}^{0} - \left|P(z)\right|_{-2.92}^{0} = 0.5 - 0.4982 = 1.8 \times 10^{-3}$$

Hence, the percentage of picture tubes to be replaced with a 1-year guarantee is 0.18%.

(b) If the manufacturer wants to replace the same percentage of picture tubes with a 2-year (730-day) guarantee, then the standard normal variate at 730 days must again be $-2.92$. He can do one of the following:

(i) Improve the quality, i.e. increase the mean life so that

$$\frac{730 - \bar{X}}{500} = -2.92, \qquad \text{i.e.} \quad \bar{X} = 2190 \text{ days} = 6 \text{ years}$$

(ii) Increase the precision with which he manufactures the picture tubes, i.e. reduce the standard deviation so that

$$\frac{730 - 1825}{\sigma} = -2.92, \qquad \text{i.e.} \quad \sigma = 375 \text{ days}$$

(iii) Any combination of the above two factors satisfying the relation

$$\frac{730 - \bar{X}}{\sigma} = -2.92$$

Problem 22.3
An assembly of shaft and hub is used in many engineering situations. They are selected at random from a large supply with manufacturer's specifications of mean shaft diameter of 31.0 mm and standard deviation $\sigma_1$ of 0.04 mm, and mean hub diameter of 31.1 mm and standard deviation $\sigma_2$ of 0.03 mm. Determine the number of times we shall have a satisfactory fit out of 250 cases, if a satisfactory fit is defined as one having a diametral clearance of at least 0.03 mm and at most 0.18 mm.

Solution
Mean diametral clearance $\bar{c}$ = (mean hub diameter) - (mean shaft diameter) = 31.1 - 31.0 = 0.1 mm

Overall standard deviation in diametral clearance:

$$\sigma_c = \sqrt{\sigma_{\text{shaft}}^2 + \sigma_{\text{hub}}^2} = \sqrt{0.04^2 + 0.03^2} = 0.05 \text{ mm}$$

The mean, maximum and minimum clearances are shown on the normal distribution curve in Fig. 22.10. Now, the proportion of satisfactory fits is given by the area under the normalised Gaussian curve between the minimum and maximum clearance:

$$z_{\max} = \frac{c_{\max} - \bar{c}}{\sigma_c} = \frac{0.18 - 0.1}{0.05} = 1.6 \qquad \text{and} \qquad z_{\min} = \frac{c_{\min} - \bar{c}}{\sigma_c} = \frac{0.03 - 0.1}{0.05} = -1.4$$

From Table 22.2 of the integral Gaussian probability we get

$$\left|P(z)\right|_{0}^{z_{\max}} = 0.4452 \qquad \text{and} \qquad \left|P(z)\right|_{z_{\min}}^{0} = 0.4192$$

Percentage of satisfactory clearances = 100 (0.4452 + 0.4192) = 86.44%

Fig. 22.10 Figure for Problem 22.3 [$c_{\min} = 0.03$ mm at $z_{\min} = -1.4$; $\bar{c} = 0.1$ mm at $z = 0$; $c_{\max} = 0.18$ mm at $z_{\max} = 1.6$]

Hence, the number of times we have a satisfactory fit out of 250 cases is

$$\frac{86.44}{100} \times 250 = 216.1 \approx 216$$

Problem 22.4
Structural engineers sometimes apply a factor of safety to the statistical minimum failing stress. They usually employ the criterion that 95% of the tests should exceed the minimum failing stress. For a certain specimen of timber, 200 tests for strength properties were carried out, which were found to be normally distributed with a mean failing stress of 30.25 N/mm² and a standard deviation of 3.95 N/mm².
Determine the value of the minimum failing stress.

Solution
The strength properties of the timber can be shown on the normal distribution curve as in Fig. 22.11.

Fig. 22.11 Figure for Problem 22.4 [minimum failing stress $x_{\min}$ at $z_{\min} = -1.645$; $\bar{X} = 30.25$ N/mm² at $z = 0$]

In this case the one-sided confidence level is 95%, for which $z = -1.645$ (Table 22.2):

$$z_{\min} = \frac{x_{\min} - \bar{X}}{\sigma}, \qquad -1.645 = \frac{x_{\min} - 30.25}{3.95}$$

which gives $x_{\min}$, the minimum failing stress, as 23.75 N/mm².

22.6 CENTRAL LIMIT THEOREM

The central limit theorem is an important statistical theorem. It states that 'the sample means of a population follow the Gaussian distribution, whatever the distribution of the individual measurements'. Therefore, the standard normal variate $z$ for the population of sample means can be defined as

$$z = \frac{\bar{X}_s - \bar{X}}{\sigma_m} \tag{22.22}$$

where
$\bar{X}_s$ is the mean value of any sample,
$\bar{X}$ is the population mean, and
$\sigma_m$ is the standard deviation of the means, which is very nearly equal to the internal estimate of uncertainty (or internal standard error) $U_m$ when the number of observations is large.

It may be noted that the normal distribution of the measurements of a sample has lesser precision than the corresponding normal distribution of the sample means of the population. In other words, the latter distribution has a sharper peak than the former (Fig. 22.12). This is because

$$\sigma_m = \frac{\sigma}{\sqrt{n}} \tag{22.23}$$

Fig. 22.12 A typical distribution showing the measurements of the sample means of a population

In most cases, we are interested in the analysis of the internal standard error of the data and employ the standard normal variate $z$ with respect to the population mean and the standard deviation of the means. However, in some problems, we are interested in the standard deviation of the sample data. In such cases, the value of $z$ is calculated using the sample mean and the sample standard deviation.

Problem 22.5
In a manufacturing process, the time required to complete a certain electronic component was to be studied. The time needed had a mean of 75 min and a standard deviation of 20 min for the case of 25 randomly selected components in a sample. Determine (a) the population mean, (b) the standard deviation of the means, and (c) the size of the sample if the internal standard error (or internal estimate of uncertainty) is not to exceed 1 min.

Solution
(a) The population mean is nearly equal to the sample mean: $\bar{X} = 75$ min.

(b) Standard deviation of the means:

$$\sigma_m = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{25}} = 4 \text{ min}$$

(c) Internal standard error:

$$U_m = \frac{\sigma}{\sqrt{n-1}}$$

If $U_m = 1$ min, then

$$1 = \frac{20}{\sqrt{n-1}}$$

which gives $n$, the size of the sample, as 401.

Problem 22.6
A firm manufactures ball bearings for a certain application. Several samples of size $n$ were taken at random from a population of a large number of balls with mean $\bar{X}$ and standard deviation $\sigma$. What is the range within which the sample mean $\bar{X}_s$ can lie with 95% confidence level?

Solution
Since the samples have been taken from a large population, their means follow the normal distribution (as per the central limit theorem) with mean $\bar{X}$ and standard deviation $\sigma_m = \sigma/\sqrt{n}$. For a two-sided confidence level of 95%, the value of the standard normal variate is $z = \pm 1.96$, or

$$\pm 1.96 = \frac{\bar{X}_s - \bar{X}}{\sigma/\sqrt{n}}$$

Fig. 22.13 Figure for Problem 22.6 [95% confidence level; range of $\bar{X}_s$ between $z = -1.96$ and $z = +1.96$]

which gives

$$(\bar{X}_s)_{\max} = \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \qquad \text{and} \qquad (\bar{X}_s)_{\min} = \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}$$

Hence, the length of abscissa shown in Fig. 22.13 between $(\bar{X}_s)_{\max}$ and $(\bar{X}_s)_{\min}$ gives the required range within which the sample mean $\bar{X}_s$ would lie with 95% confidence level.
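The calculations in Problems 22.5 and 22.6 can be sketched in a few lines of code. This is an illustrative sketch only: the function names are hypothetical, the standard deviation of the means is taken as σ/√n per Eq. (22.23), and the sample-size helper assumes the internal standard error has the form U = σ/√(n − 1).

```python
import math


def std_dev_of_means(sigma, n):
    """Standard deviation of the sample means, sigma / sqrt(n) (Eq. 22.23)."""
    return sigma / math.sqrt(n)


def sample_mean_range(pop_mean, sigma, n, z=1.96):
    """Two-sided range for the sample mean; z = 1.96 corresponds to 95% confidence."""
    half_width = z * std_dev_of_means(sigma, n)
    return pop_mean - half_width, pop_mean + half_width


def sample_size_for_error(sigma, max_error):
    """Smallest n with sigma / sqrt(n - 1) <= max_error.

    Assumption: internal standard error U = sigma / sqrt(n - 1).
    """
    return math.ceil((sigma / max_error) ** 2 + 1)


# Data of Problem 22.5: sigma = 20 min, n = 25 components
print(std_dev_of_means(20, 25))       # 4.0
print(sample_size_for_error(20, 1))   # 401

# Range of the sample mean in the style of Problem 22.6, using the same data
low, high = sample_mean_range(75, 20, 25)
print(f"{low:.2f} to {high:.2f} min")
```

Rounding the sample size up with `math.ceil` is a design choice: it guarantees the error bound is met when the exact solution is not a whole number.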
22.7 SIGNIFICANCE TEST

If the normal distribution of a particular quantity is known and the quantity is measured again under somewhat changed conditions, the new mean value is unlikely to coincide with the mean of the original distribution. If the difference in the means is small, it is reasonable to assume that both sets of data come from the same population. On the other hand, if the difference is considerable, it is reasonable to assume that the changed circumstances have altered the values and that the result is significant. In other words, the original data and the subsequent data taken under somewhat changed conditions are not from the same population.

To test whether there is a significant change between the original and subsequent data, we use a significance test based on the difference of means. The statistician's criterion is that if the difference in the mean values of the samples exceeds 1.96 times the internal standard error of the difference of means, then the change is significant.

Let us say that a particular sample has $n_1$ quantities which give the mean value $\bar{X}_1$ with a standard deviation $\sigma_1$. Similarly, another sample taken under somewhat changed conditions has $n_2$ quantities which give the mean value $\bar{X}_2$ with a standard deviation $\sigma_2$. It should be recalled here that the internal standard error (or the internal estimate of uncertainty) of a sample of $n$ quantities with standard deviation $\sigma$ is given by

$$U = \frac{\sigma}{\sqrt{n-1}} \tag{22.24}$$

Keeping this in mind, we proceed as follows to determine whether the change is significant or not.

1. Find the difference in mean values, which gives the range of variation of the mean values, i.e.,

$$R(\bar{X}) = \bar{X}_1 - \bar{X}_2 \tag{22.25}$$

2. Determine the internal standard error of $R(\bar{X})$, i.e.,

$$U_{R(\bar{X})} = \sqrt{\left(\frac{\partial R(\bar{X})}{\partial \bar{X}_1} U_1\right)^2 + \left(\frac{\partial R(\bar{X})}{\partial \bar{X}_2} U_2\right)^2} = \sqrt{U_1^2 + U_2^2} \tag{22.26}$$

Substituting the values of $U_1$ and $U_2$ in terms of the corresponding standard deviations of the samples, we get

$$U_{R(\bar{X})} = \sqrt{\frac{\sigma_1^2}{n_1 - 1} + \frac{\sigma_2^2}{n_2 - 1}} \tag{22.27}$$

3. Knowing the values of $R(\bar{X})$ and $U_{R(\bar{X})}$ from Eqs. (22.25) and (22.27), respectively, we now employ the significance test criterion, which is as follows: if

$$|R(\bar{X})| > 1.96\, U_{R(\bar{X})} \tag{22.28}$$

then the difference of means is significant; otherwise, we can assume that the original and subsequent distributions are from the same population.

In other words, at the 95% confidence level, the two samples are not from the same population if the significance test given by Eq. (22.28) is positive. Conversely, if the difference in means $R(\bar{X})$ lies within 1.96 times the internal standard error in $R(\bar{X})$, then the result is not significant at the 5% level and there is no reason to reject the assumption that both samples come from the same population. Further, if the difference of means exceeds 2.58 times the combined internal standard error in $R(\bar{X})$, then the result is highly significant; in this case, the confidence level considered is 99%.

Problem 22.7
Ten specimens of mild steel are chemically analysed for carbon content in two different laboratories. The percentages of carbon found are as follows:

Laboratory A   0.33  0.33  0.22  0.35  0.28  0.24  0.20  [illegible]  0.33  0.24
Laboratory B   0.22  0.9   0.24  0.20  0.22  0.28  0.20  0.22  0.25  [illegible]

Test the hypothesis that there is no significant difference between the two laboratories in their determination of the percentage of carbon.

Solution
From the given data we find that the mean value of the percentage of carbon found in Laboratory A is $\bar{X}_A = 0.229$ and the standard deviation of the values in Laboratory A is $\sigma_A = 0.02625$. Similarly, in Laboratory B, $\bar{X}_B = 0.208$ and the standard deviation of the percentage of carbon is $\sigma_B = 0.024$.

The internal estimates of uncertainty are

$$U_A = \frac{0.02625}{\sqrt{10-1}} = 0.00875, \qquad U_B = \frac{0.024}{\sqrt{10-1}} = 0.008$$

The combined internal estimate of uncertainty of the difference in means is

$$U_{R(\bar{X})} = \sqrt{0.00875^2 + 0.008^2} = 0.011857$$

The range of mean values is

$$R(\bar{X}) = \bar{X}_A - \bar{X}_B = 0.229 - 0.208 = 0.021$$

Assuming a 95% confidence level for the significance test, the allowable range of the combined internal standard error in $R(\bar{X})$ is

$$1.96\, U_{R(\bar{X})} = 1.96 \times 0.011857 = 0.0232$$

The difference in means $R(\bar{X}) = 0.021$ lies within 1.96 times the combined internal estimate of uncertainty $U_{R(\bar{X})}$. Hence, we can say that there is no significant difference in the determination of the percentage of carbon in the specimens of mild steel carried out in the two laboratories.

22.8 CHI-SQUARE TEST FOR GOODNESS OF FIT

When a set of measurements is obtained, it is usually believed that the measurements are a sample of some known theoretical distribution, say the normal frequency distribution, which is generally hypothesised in cases involving experimental statistics. To compare the different parts of the observed distribution, we subdivide the data into a number of classes, say $n$, and determine the observed frequency in each class. Then we estimate the expected frequency of each class by assuming that the distribution conforms to a particular theoretical distribution. For example, if the assumed distribution is Gaussian, the following procedure may be adopted to calculate the expected frequencies for a given set of data:

1. Calculate the mean value and standard deviation of the data.
2. For each class interval, calculate the standard normal variates $z_1$ and $z_2$ for the two boundary values.
3. From the integral Gaussian table, determine the integral Gaussian probability between 0 and $z_1$ and between 0 and $z_2$.
4. The difference of the above values gives the integral Gaussian probability in the given interval if both the upper and lower boundaries lie either between 0 and $\infty$ or between 0 and $-\infty$. The sum of these values gives the integral Gaussian probability if the boundaries lie on opposite sides of the mean, i.e., the upper boundary lies between 0 and $\infty$ and the lower boundary lies between 0 and $-\infty$.
5. Multiplying the integral Gaussian probability in a given class interval by the total number of observations gives the expected frequency of occurrence of the variable in that interval.
6. The sum of the expected frequencies in all classes sometimes does not equal the total number of observations. The slight difference is caused by small rounding-off errors due to interpolation in the integral Gaussian table. Therefore, the expected frequencies in step (5) are multiplied by a suitable correction factor so as to make the sum of expected frequencies equal to the number of observations.

After determining the expected frequencies in the various classes, we determine the $\chi^2$ (pronounced chi-square) parameter as follows. Let us say that there are $n$ classes ($n > 1$) and that the expected and observed frequencies in the various classes are denoted by $f_{e1}, f_{e2}, \ldots, f_{en}$ and $f_{o1}, f_{o2}, \ldots, f_{on}$, respectively. Our aim is to determine whether the observed frequencies and the expected frequencies are close enough for us to conclude that they come from the same probability distribution. To do so, we define the $\chi^2$ parameter as:

$$\chi^2_{d.f.} = \sum_{i=1}^{n} \frac{(f_{oi} - f_{ei})^2}{f_{ei}} \tag{22.29}$$

where
$n$ is the number of values that are summed up to produce the value of $\chi^2$,
$m$ is the number of constants (constraints) used in the calculation of expected frequencies,
$n - m$ is the number of degrees of freedom, and
the subscript d.f. stands for the degrees of freedom.

The numerator terms in the $\chi^2$ expression are the squares of the deviations between the expected and observed frequencies in the various classes, and are therefore never negative. These values are normalised in each class by dividing them by the respective expected frequency of that class.
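The procedure of steps 1 to 6 and the $\chi^2$ parameter of Eq. (22.29) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the exact normal cumulative distribution (computed with `math.erf`) replaces the integral Gaussian table, so the results may differ slightly from table-interpolated values, and the function names `normal_cdf`, `expected_frequencies` and `chi_square` are hypothetical. The class boundaries, observed frequencies, mean and standard deviation are those quoted for the coefficient-of-friction data of Problem 22.8 later in this chapter.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_frequencies(edges, mean, sigma, total):
    """Expected class frequencies for an assumed Gaussian model.

    edges : class boundaries in increasing order (number of classes + 1)
    total : total number of observations
    """
    probs = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Step 2: standard normal variates of the two class boundaries.
        z_lower = (lower - mean) / sigma
        z_upper = (upper - mean) / sigma
        # Steps 3 and 4: Gaussian probability of falling inside the class.
        probs.append(normal_cdf(z_upper) - normal_cdf(z_lower))
    # Steps 5 and 6: scale by the number of observations, with a correction
    # factor so that the expected frequencies add up to the total.
    scale = total / sum(probs)
    return [p * scale for p in probs]


def chi_square(observed, expected):
    """Chi-square parameter as defined in Eq. (22.29)."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Data quoted in Problem 22.8 (seven classes of width 0.02).
edges = [0.44, 0.46, 0.48, 0.50, 0.52, 0.54, 0.56, 0.58]
observed = [3, 10, 12, 16, 10, 6, 3]
expected = expected_frequencies(edges, mean=0.5067, sigma=0.03062,
                                total=sum(observed))

print("expected frequencies:", [round(fe, 2) for fe in expected])
print("chi-square over all seven classes:", round(chi_square(observed, expected), 3))
```

The sketch sums over all seven classes as they stand; any regrouping of classes with small expected frequencies would have to be done before calling `chi_square`.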
It may be noted that the same order of deviation between the expected and observed frequencies causes a relatively larger contribution to the $\chi^2$ parameter at the tail portions of normally distributed data than at values close to the mean. This is because the expected frequencies, which appear in the denominator of the $\chi^2$ parameter, are relatively large near the mean value of the data. In order to restrict unusually large contributions to the $\chi^2$ parameter when the expected frequencies are small, the empirical criterion commonly used in practice is to regroup the classes in such a way that the expected frequency in each class is not less than 5.0.

Further, a correction is sometimes applied to the chi-square values when the number of degrees of freedom $F$, i.e. $(n - m)$, is of the order of 1. This is termed Yates' correction and accounts for the inaccuracies involved when the results of continuous distributions are applied to discrete data. The correction consists of writing Eq. (22.29) in the following form:

$$\chi^2_{d.f.} = \sum_{i=1}^{n} \frac{\left(|f_{oi} - f_{ei}| - 0.5\right)^2}{f_{ei}} \tag{22.30}$$

If the sample distribution agrees exactly with the assumed theoretical distribution, then $\chi^2 = 0$. This is of course very unlikely because, even if the sample is taken from the parent distribution, one would not expect exact agreement in every interval. However, the larger the value of $\chi^2$, the greater is the disagreement between the assumed distribution and the observed values; in other words, the smaller is the probability that the observed distribution matches the expected distribution. Thus, the chi-square parameter is quite useful in the statistical analysis of data, as it helps to test a particular hypothesis about the given data.

In applying the chi-square test, we first determine the value of $\chi^2$ for the given data. Then, we determine the number of degrees of freedom $F$, which is equal to $(n - m)$. Knowing the values of $\chi^2$ and $F$, we determine the probability that the actual measurements match the expected distribution either from the chi-square tables (Table 22.3) or from the $\chi^2$-$F$ diagram (Fig. 22.14), which gives cross-plots of the chi-square probability $P(\chi^2)$ for various values of $\chi^2$ and $F$.

22.9 CRITERIA FOR GOODNESS OF FIT

The statistical criteria for the goodness of fit, i.e., how well a set of observed data fits the assumed theoretical distribution, are as follows:

1. If the value of the probability in the $\chi^2$-test lies between 0.1 and 0.9, then the observed distribution is considered to follow the assumed distribution. In other words, there is no reason to suspect the hypothesis. In certain cases, the lower limit of the chi-square probability (also termed the significance level or simply the level) may be reduced to 0.05.
2. If the value of the probability in the $\chi^2$-test is below the lower prescribed limit, then the result is significant and the sample data is considered to be entirely different from the assumed distribution. In such cases, the value of the $\chi^2$ parameter is usually quite large.
3. If the value of the $\chi^2$ parameter is nearly zero or very small, then the probability may exceed the upper limit of 0.9. Such cases are hardly ever encountered in practice. If this does happen, we normally consider the data to be suspiciously good.

Table 22.3 $\chi^2$-$F$ table indicating the probability $P(\chi^2)$

This table gives the values of $\chi^2$ which have various probabilities of being exceeded by a sample taken from the given parent distribution. The number of degrees of freedom is $F$. To illustrate: for a sample with 6 degrees of freedom, the probability $P(\chi^2)$ is 0.95 if $\chi^2 = 1.635$ and 0.1 if $\chi^2 = 10.645$.

                                            P(χ²)
 F      0.99     0.95    0.90    0.80    0.70    0.50    0.30    0.20    0.10    0.05    0.01   0.001
 1  0.000157  0.00393  0.0158  0.0642   0.148   0.455   1.074   1.642   2.706   3.841   6.635  10.827
 2    0.0201    0.103   0.211   0.446   0.713   1.386   2.408   3.219   4.605   5.991   9.210  13.815
 3     0.115    0.352   0.584   1.005   1.424   2.366   3.665   4.642   6.251   7.815  11.341  16.268
 4     0.297    0.711   1.064   1.649   2.195   3.357   4.878   5.989   7.779   9.488  13.277  18.465
 5     0.554    1.145   1.610   2.343   3.000   4.351   6.064   7.289   9.236  11.070  15.086  20.517
 6     0.872    1.635   2.204   3.070   3.828   5.348   7.231   8.558  10.645  12.592  16.812  22.457
 7     1.239    2.167   2.833   3.822   4.671   6.346   8.383   9.803  12.017  14.067  18.475  24.322
 8     1.646    2.733   3.490   4.594   5.527   7.344   9.524  11.030  13.362  15.507  20.090  26.125
 9     2.088    3.325   4.168   5.380   6.393   8.343  10.656  12.242  14.684  16.919  21.666  27.877
10     2.558    3.940   4.865   6.179   7.267   9.342  11.781  13.442  15.987  18.307  23.209  29.588
11     3.053    4.575   5.578   6.989   8.148  10.341  12.899  14.631  17.275  19.675  24.725  31.264
12     3.571    5.226   6.304   7.807   9.034  11.340  14.011  15.812  18.549  21.026  26.217  32.909
13     4.107    5.892   7.042   8.634   9.926  12.340  15.119  16.985  19.812  22.362  27.688  34.528
14     4.660    6.571   7.790   9.467  10.821  13.339  16.222  18.151  21.064  23.685  29.141  36.123
15     5.229    7.261   8.547  10.307  11.721  14.339  17.322  19.311  22.307  24.996  30.578  37.697
16     5.812    7.962   9.312  11.152  12.624  15.338  18.418  20.465  23.542  26.296  32.000  39.252
17     6.408    8.672  10.085  12.002  13.531  16.338  19.511  21.615  24.769  27.587  33.409  40.790
18     7.015    9.390  10.865  12.857  14.440  17.338  20.601  22.760  25.989  28.869  34.805  42.312
19     7.633   10.117  11.651  13.716  15.352  18.338  21.689  23.900  27.204  30.144  36.191  43.820
20     8.260   10.851  12.443  14.578  16.266  19.337  22.775  25.038  28.412  31.410  37.566  45.315
21     8.897   11.591  13.240  15.445  17.182  20.337  23.858  26.171  29.615  32.671  38.932  46.797
22     9.542   12.338  14.041  16.314  18.101  21.337  24.939  27.301  30.813  33.924  40.289  48.268
23    10.196   13.091  14.848  17.187  19.021  22.337  26.018  28.429  32.007  35.172  41.638  49.728
24    10.856   13.848  15.659  18.062  19.943  23.337  27.096  29.553  33.196  36.415  42.980  51.179
25    11.524   14.611  16.473  18.940  20.867  24.337  28.172  30.675  34.382  37.652  44.314  52.620
26    12.198   15.379  17.292  19.820  21.792  25.336  29.246  31.795  35.563  38.885  45.642  54.052
27    12.879   16.151  18.114  20.703  22.719  26.336  30.319  32.912  36.741  40.113  46.963  55.476
28    13.565   16.928  18.939  21.588  23.647  27.336  31.391  34.027  37.916  41.337  48.278  56.893
29    14.256   17.708  19.768  22.475  24.577  28.336  32.461  35.139  39.087  42.557  49.588  58.302
30    14.953   18.493  20.599  23.364  25.508  29.336  33.530  36.250  40.256  43.773  50.892  59.703

[Fig. 22.14 $\chi^2$-$F$ diagram indicating various values of $\chi^2$ probability; vertical axis: value of $\chi^2$; horizontal axis: number of degrees of freedom $F$]

Problem 22.8
The coefficient of friction between a glass sheet and wood was measured in the laboratory by a technique that is free from systematic errors. The data obtained was as follows:

Coefficient of friction values   Observed frequency f_o
0.44-0.46                         3
0.46-0.48                        10
0.48-0.50                        12
0.50-0.52                        16
0.52-0.54                        10
0.54-0.56                         6
0.56-0.58                         3

Determine whether the values of the coefficient of friction follow the Gaussian distribution or not. Test $\chi^2$ values up to the 20% level.

Solution
We first determine the mean value and the standard deviation of the given data using the method of assumed mean or otherwise. The mean value $\bar{X}$ of the coefficient of friction $\mu$ for the given data is 0.5067 and the standard deviation $\sigma$ of the given data is 0.03062. Using the integral Gaussian tables, we determine the integral Gaussian probabilities between 0 and $z_1$ and between 0 and $z_2$ for the different classes.
From these values the integral Gaussian probability of each class, $P(\Delta z)$, is determined. It may be noted that the sum of the integral Gaussian probabilities of all the classes is generally found to be very slightly less than unity. Therefore, the expected frequencies in the various classes are calculated by multiplying $P(\Delta z)$ by the total number of observations and by a correction factor which is the reciprocal of $\sum P(\Delta z)$.

Sl. No.  Classes     f_o    z_1      z_2     P(z_1)   P(z_2)   P(Δz)    f_e
1        0.44-0.46    3    -2.178   -1.525   0.4853   0.4364   0.0489    2.99
2        0.46-0.48   10    -1.525   -0.872   0.4364   0.3084   0.1280    7.83
3        0.48-0.50   12    -0.872   -0.219   0.3084   0.0867   0.2217   13.57
4        0.50-0.52   16    -0.219    0.434   0.0867   0.1678   0.2545   15.57
5        0.52-0.54   10     0.434    1.088   0.1678   0.3617   0.1939   11.87
6        0.54-0.56    6     1.088    1.741   0.3617   0.4592   0.0975    5.97
7        0.56-0.58    3     1.741    2.394   0.4592   0.4952   0.0360    2.20
Total                60                                        0.9805

In the above table, the expected frequencies of the first and last classes are less than 5. Therefore, these are combined with the adjacent classes to make them more than 5, and then the various ratios $(f_{oi} - f_{ei})^2 / f_{ei}$ are calculated as follows:

Sl. No.   f_oi    f_ei     f_oi - f_ei   (f_oi - f_ei)²/f_ei
1         13      10.82     2.18         0.439
2         12      13.57    -1.57         0.182
3         16      15.57     0.43         0.012
4         10      11.87    -1.87         0.295
5          9       8.17     0.83         0.084
Total                                    1.012

$$\chi^2 = \sum_{i=1}^{5} \frac{(f_{oi} - f_{ei})^2}{f_{ei}} = 1.012$$

The number of degrees of freedom $F$ is given by $F = n - m$. In this problem, the number of terms which are summed to give $\chi^2$ is $n = 5$. Further, the number of constraints $m$ is equal to the number of quantities obtained from the observations which are used in the calculation of expected frequencies. In the present problem $m = 3$, because three quantities, namely the total number of observations, the mean value and the standard deviation of the data, have been used in the calculation of expected frequencies. Therefore,

$$F = 5 - 3 = 2$$

For 2 degrees of freedom, the value of $\chi^2$ at the 10% level of chi-square probability is 4.605 (from Table 22.3). The value of $\chi^2$ obtained from the given data, i.e. 1.012, is far from being large enough to justify the rejection of the normal distribution model. Alternatively, we can state that there is no reason to doubt that the coefficient of friction values obtained in the experiment follow the assumed normal distribution. This is because the value of $P(\chi^2)$ corresponding to $\chi^2 = 1.012$ and $F = 2$ is 0.62 (from Table 22.3), which lies within the required range of 0.1 to 0.9.

22.10 CONTINGENCY TABLES

There are certain experiments in which the observed frequencies occupy a single row or column. Such data is obtained due to the variation of a single variable and is therefore arranged in the form of a one-way classification table. If the data has $n$ columns, then it has the form of a $1 \times n$ (pronounced 1 by $n$) table.

Sometimes each data point of two or more similar experiments is sampled and classified with respect to certain stipulated variations of the conditions. Alternatively, the data points of a single experiment may be classified with respect to multiple variations of conditions. This results in a multi-way classification table or $m \times n$ table in which the observed frequencies occupy $m$ rows and $n$ columns. Such tables are generally called contingency tables.

In the solution of contingency table problems, we generally put forward a hypothesis (called the null hypothesis) that helps us to determine the expected frequencies corresponding to the various observed frequencies according to the rules of probability. After this, we investigate the goodness of fit between the expected and observed frequencies as per the null hypothesis using the $\chi^2$-test. We compute the $\chi^2$ parameter and determine the number of degrees of freedom in the given data. If the parameter is greater than the critical value of $\chi^2$ at the prescribed significance level, then the result is considered significant, i.e. the null hypothesis (N.H.) is rejected in favour of an alternative hypothesis (A.H.).
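The contingency-table procedure described above can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function name `contingency_chi_square` is hypothetical, the null hypothesis is assumed to be independence of the row and column classifications (so that each expected frequency is row total times column total divided by the grand total), and the degrees of freedom are taken as $(m-1)(n-1)$ as in Eq. (22.31), i.e. without estimated population parameters. The counts used are those of the vaccine data in Problem 22.9.

```python
def contingency_chi_square(table, yates=False):
    """Chi-square parameter and degrees of freedom for an m x n table.

    table : list of m rows, each a list of n observed frequencies
    yates : if True, apply Yates' correction in the form of Eq. (22.30)
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            # Expected frequency under the null hypothesis of independence.
            f_e = row_totals[i] * col_totals[j] / grand_total
            deviation = abs(f_o - f_e)
            if yates:
                deviation -= 0.5
            chi2 += deviation ** 2 / f_e

    # Degrees of freedom as in Eq. (22.31).
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Observed counts of Problem 22.9:
# rows = vaccinated / not vaccinated, columns = got disease / did not.
observed = [[8, 40], [15, 27]]
critical_value = 3.841  # Table 22.3, F = 1, 0.05 level

for use_yates in (False, True):
    chi2, dof = contingency_chi_square(observed, yates=use_yates)
    verdict = "significant" if chi2 > critical_value else "not significant"
    print(f"Yates={use_yates}: chi-square={chi2:.3f}, F={dof}, {verdict}")
```

Because the sketch does not round the expected frequencies, its results can differ in the third decimal place from values worked by hand with rounded expected frequencies.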
It may be noted that the number of degrees of freedom $F$ in an $m \times n$ contingency table, when $m > 1$ and $n > 1$, is given by:

1. $$F = (m-1)(n-1) \tag{22.31}$$
if the expected frequencies can be computed without determining population parameters such as the mean value, standard deviation, etc.

2. $$F = (m-1)(n-1) - k \tag{22.32}$$
if the calculation of the expected frequencies involves the use of $k$ population parameters.

Problem 22.9
A test was conducted to investigate the effect of a new vaccine against a particular infectious disease on laboratory animals and the following data was obtained:

                  Got disease   Did not get disease   Total
Vaccinated             8                40              48
Not vaccinated        15                27              42
Total                 23                67              90

(a) Using a significance level of 0.05, test whether the new vaccine is effective in combating the particular infectious disease under investigation.
(b) Repeat the problem using Yates' correction.

Solution
It is possible to determine the expected frequencies if we assume a null hypothesis that there is no difference between the vaccinated and unvaccinated groups, i.e. that vaccination and this disease are independent. In a sample of 90, the proportion which got the disease is 23/90. Taking this as an estimate of the overall proportion which got the disease, $23/90 \times 48$, i.e. 12.27, is the expected number of laboratory animals, out of the total of 48 that were vaccinated, which would get the disease. By similar reasoning, the expected number of laboratory animals that were vaccinated and did not get the disease would be $67/90 \times 48$, i.e. 35.73.
Continuing this way, the other expected frequencies have been calculated; they are shown in the following table in parentheses below the observed frequencies.

                  Got disease             Did not get disease       Total
Vaccinated             8                         40                   48
                  (23/90 × 48 = 12.27)      (67/90 × 48 = 35.73)
Not vaccinated        15                         27                   42
                  (23/90 × 42 = 10.73)      (67/90 × 42 = 31.27)
Total                 23                         67                   90

Knowing the observed and expected frequencies, the $\chi^2$ parameter without and with Yates' correction can be evaluated as follows:

f_o     f_e      (f_o - f_e)²/f_e    (|f_o - f_e| - 0.5)²/f_e
 8     12.27         1.486                  1.158
40     35.73         0.510                  0.398
15     10.73         1.699                  1.325
27     31.27         0.583                  0.455
Total                4.278                  3.336

$$\chi^2 = 4.278, \qquad (\chi^2)_{\text{with Yates' correction}} = 3.336$$

We observe that there are three restrictions in the data: (i) the number vaccinated, (ii) the number not vaccinated, and (iii) the additional restriction involved in the calculation of expected frequencies, namely the numbers that contracted the disease and those that did not. The number of degrees of freedom is thus:

$$F = 4 - 3 = 1$$

Alternatively, the number of degrees of freedom could also be obtained by using Eq. (22.31):

$$F = (m-1)(n-1) = (2-1)(2-1) = 1$$

(a) From the chi-square tables (Table 22.3), the critical value of $\chi^2$ at the 0.05 significance level corresponding to one degree of freedom is 3.841. This is less than the $\chi^2$ value of 4.278 obtained in the experiment. Therefore, the result is significant and we reject the null hypothesis (N.H.) in favour of the alternative hypothesis (A.H.). Hence we can say that, at the 0.05 level, there is a significant difference between the vaccinated and not vaccinated groups, i.e., the new vaccine is effective in combating the particular disease under investigation.

(b) Since the number of degrees of freedom in the given problem is 1, it is desirable to incorporate Yates' correction in the $\chi^2$ value. The $\chi^2$ value with Yates' correction is 3.336. It is less than $(\chi^2)_{0.05\ \text{level}}$, i.e. 3.841. Therefore, the result is insignificant. Hence, the test favours the null hypothesis (N.H.) and we can say, at the 5% significance level, that the vaccination and this particular infectious disease are independent.

Review Questions

22.1 Fill in the blanks.
(i) The probability that a measured value would lie between $\bar{X} + 1.645\sigma$ and $\bar{X} - 1.645\sigma$ is ______.
(ii) The tails of the normal distribution curves are ______ to the abscissa (base line).
(iii) A confidence interval is a range within which the population ______ would most likely fall.
(iv) (1 - confidence level) represents the ______ in any typical experiment.
(v) The shape of the normal distribution is ______ (choose any one of the shapes mentioned, i.e. parabolic, rectangular hyperbolic, half sine curve, bell-shaped curve).
(vi) The area under the normal distribution curve is ______.
(vii) "No matter what the shape of the original distribution, a notable event occurs with the distribution of means of sample values. They approach a normal distribution." The previous sentences are the statement of the ______ theorem.
(viii) The points of inflection of the normal curve are at $x = \bar{X} \pm b$ where $b$ = ______.
(ix) If a student received a score of 90 in a test with a standard error of 3, he could feel 90% confident that his true score lay between ______ and ______.
(x) The null hypothesis states that no population ______ exists.
(xi) The data is suspiciously good if the $\chi^2$ probability ______.
(xii) In normally distributed data, the constraints which affect the degrees of freedom are: (a) ______, (b) ______, and (c) ______.
(xiii) The value of $\chi^2$ for the 9 : 3 : 3 : 1 hypothesis for 66, 22, 30, 10 is ______.
(xiv) In a contingency table with $m$ rows and $n$ columns, the number of degrees of freedom, when population parameters are not used, is given by ______.
(xv) A test is highly significant if the range of means of the sample is ______.

22.2 Indicate whether the following statements are true or false. If false, rewrite the correct statement.
(i) The maximum height of the standard form of the normal distribution is always $1/(\sigma\sqrt{2\pi})$.
(ii) If the results from experiments repeated many times follow the normal frequency distribution, it does not confirm that the error is only due to randomness.
(iii) The sum of the relative frequencies of all classes is always 1.0.
(iv) In a mass-produced sample of balls, usually less than 50% of the balls have a diameter within 1 standard deviation on either side of the mean diameter.
(v) The area of the normal distribution curve between two points represents the confidence level/relative frequency of occurrence/probability of occurrence of any event in that region.
(vi) The normal distribution of sample means is flatter as compared to the normally distributed sample curves.
(vii) The distribution of sample means of skewed (non-normal) distributions is always skewed.
(viii) A normalised histogram represents the normal distribution of experimental data.
(ix) The results of two experiments will be significantly different if the combined internal standard error of the difference of means at the 95% confidence level is more than the range of means.
(x) Values of $\chi^2$ of the data lower than the critical value at a given significance level ensure rejection of the null hypothesis in favour of the alternative hypothesis.

22.3 In a certain manufacturing process, the shafts produced have a mean diameter of 20 cm and a standard deviation of 0.5 mm. If shaft diameters ranging from 19.9 to 20.1 cm are acceptable, how many rejects would you expect in a random lot of 100 shafts?

22.4 A mass-produced electronic gadget had $\bar{X}$ = 163 h.
If the probability of survival of the gadget is at least 90% for at least 130 h, determine the standard deviation of the test sample.

22.5 An accelerated test programme for determining the time of failure of an electronic component obeys a normal probability distribution with $\bar{X}$ = 2 h and $\sigma$ = 2 h. Out of a sample of 600 test components, how many are likely to survive for at least 3 h?

22.6 A manufacturer agrees to supply precision machine screws with the length specification $l = (10.05 \pm 0.12)$ mm. His first lot of screws consisted of $l_1 = (10.00 \pm 0.1)$ mm, and the second lot had $l_2 = (10.1 \pm 0.1)$ mm. What is the probability that a screw selected randomly in each lot would be defective?

22.7 A random sample was taken of the internal mean diameters of nuts produced by a manufacturer. The mean was 7.5 mm with a standard deviation of 0.1 mm. If nuts having a diameter between 7.375 mm and 7.583 mm are acceptable, what percentage is rejected?

22.8 A producer has to control the process very carefully when making packets of dry peas or any other commodity sold by weight. He does not want to risk having under-weight packets, leading to fewer customers. However, he does not want to give too much away in over-weight packets. A machine used for filling 0.5 kg packets of dry peas produces packets with weights which are normally distributed with a mean of 0.52 kg and a standard deviation of 0.01 kg. Find what proportion of packets contain: (a) less than 0.5 kg of peas, and (b) more than 0.53 kg of peas.

22.9 A wire type strain gauge produced by a certain firm has a mean resistance of 75 Ω with a standard deviation of 0.3 Ω. The gauges are used in a certain application where the requirement is 75 ± 0.42 Ω. (a) What proportion of gauges will be defective? (b) What should the precision be if the manufacturer wants to have 90% within the required range?
Assume the resistance of the gauges manufactured to be normally distributed.

22.10 A firm manufactures machines that are used to test the gripping power of men in the age group 18-28. It is found that the grip strength is normally distributed with a mean value of 120 kg and a standard deviation of 20 kg. A prize equal to the fee is guaranteed if the needle on the dial shows 130 kg, and double the fee if the reading is 150 kg. If a sample of 500 men test their gripping power, what would be the earning of a person owning such a machine if the fee charged per man is Re. 1?

22.11 Specifications for concrete used in civil engineering jobs may require specimens to be made and tested on site. These specimens are of stated size and shape (usually cubes). The normal model is judged suitable for analysing the results of strength tests on the cubes. Further, for a set of conditions, the coefficient of variation of cube strengths is used to describe the variability. If a research laboratory report gave a figure of 0.8 for the ratio of minimum strength to mean strength, taking the definition of minimum strength as the strength below which a certain percentage of specimens may be expected to fail, determine the coefficient of variation if the percentage given is: (a) 5% and (b) 1%.

22.12 A cosmetic firm uses a machine to pack cream in bottles. The statistical analysis carried out gives the mean weights of the packed bottles and empty bottles as 200 g and 50 g, with standard deviations of 10 g and 8 g, respectively. Can a sample of 10 bottles have a mean cream content of 140 g? State with 95% confidence level.

22.13 A firm manufactures resistors for a certain application that requires the resistors to be of 100 Ω with a maximum allowable deviation of 1.2 Ω. The manufacturer takes several samples of 50 resistors and finds the mean of means to be 101 Ω with a standard deviation of means of 0.5 Ω.
(a) With how much confidence can we say that the product should be acceptable? (b) To have 95% confidence, what should be the standard deviation of a sample of 50 randomly selected products?

22.14 Five automobile tyres, each of two different brands, were tested for wear (in grams) after driving for 2000 km under the same conditions. The data obtained is as follows:

Brand A   13.5   [illegible]   12.4   12.6   [illegible]
Brand B   12.5   12.7          12.0   13.7   12.6

Can we conclude that brand A is significantly superior to brand B?

22.15 50 bulbs of brand A were tested and it was found that 68.3% had a life of 1160-1240 h with a normal distribution. The manufacturer of brand B claimed that his product had a longer life. Subsequently, 100 bulbs of brand B were tested and found to follow a normal distribution with 50% having a life of 1190-1230 h. Are the bulbs of brand B significantly better than those of brand A?

22.16 Of two similar groups of patients A and B, consisting of 50 and 101 individuals, respectively, the first was given a new type of sleeping pill and the second a conventional type. For patients in group A, the mean number of hours of sleep was 7.88 h with a standard deviation of 0.24 h. For patients in group B, the mean number of hours of sleep was 7.75 h with a standard deviation of 0.32 h. Test whether the new type of sleeping pill is significantly better than the conventional pill. Use 95% confidence level.

22.17 100 bearings of brand X were tested and found to have a mean operating life of 30,000 h with a standard deviation (adjusted) of 2000 h, while 64 bearings of brand Y showed a mean operating life of 32,000 h with a standard deviation (adjusted) of 4000 h. Is brand Y better? You may assume that both brands follow a Gaussian distribution.

22.18 A speed-breaker in the form of a mild obstruction on the road was put up in front of a roadside school gate.
The means and standard deviations of the vehicle speeds before and after the putting up of the speed-breaker were, respectively:

$\bar{X}_1$ = 40.5 km/h   $\sigma_1$ = 5.82 km/h   $n_1$ = 100
$\bar{X}_2$ = 31.8 km/h   $\sigma_2$ = 8.51 km/h   $n_2$ = 80

Indicate whether the installation of the speed-breaker has influenced the psychology of the automobile drivers or not.

22.19 Two engineering classes independently measured the static coefficient of friction of wood on glass by a technique known to be free from systematic errors. Their results were as follows:

            Class I        Class II
$\bar{X}$   [illegible]    0.423
$\sigma$    0.0254         0.0241
$n$         50             64

On the basis of the above data alone, can it be stated that there is a significant difference in the results obtained by the two classes? Use a confidence level of 95%.

22.20 The average shear strengths of 100 steel specimens, grouped in the form of a frequency table, are:

Shear strength of specimen (in N/m²)   Observed frequency f_o
1500.5                                  1
1550.5                                  [illegible]
1600.5                                  3
1650.5                                  16
1700.5                                  8
1750.5                                  4
1800.5                                  21
1850.5                                  4
1900.5                                  2
1950.5                                  [illegible]
2000.5                                  1

Test whether the data obeys the normal distribution. If so, compute the chi-square probability.

22.21 The volume of an aluminium block was determined by immersing it 100 times in water and noting the volume displaced. The measurements yielded the following distribution:

Volume (in cm³)   120.2   120.4   120.6         120.8   121.0         121.2         121.4         121.6   121.8
Frequency         3       6       [illegible]   8       [illegible]   [illegible]   [illegible]   6       3

Test if the data supports a Gaussian distribution of errors. Use 10% significance level.

22.22 A die was rolled 60 times and the distribution of numbers on the uppermost die face was found to be as follows:

Number on the uppermost die face   1    2    3    4    5    6
Number of occurrences              16   7    11   6    14   6

Test the hypothesis that all faces of the die have equal probability of landing uppermost, i.e. the die is not loaded. Use 10% significance level with the $\chi^2$ test.

22.23 While reading a scale where the last figure is estimated, some observers show a marked preference for particular digits.
The following table shows the distribution of the last figure in 200 randomly chosen routine readings made by one observer:

Last figure          0    1    2   3   4   5   6   7   8   9
Observed frequency   40   15   [remaining entries illegible in source: 171619308]

Test whether there is evidence of such a preference, if the chance/risk of wrongly accusing the observer regarding his bias is 5%.

22.24 A study was conducted on a set of patients who did not sleep well. Some were given a newly developed sleeping pill, while the others were given an identical looking sugar pill, and all patients thought that they were being given the sleeping pill. They were later asked whether the pill helped them in sleeping well or not. The results of their responses are given in the following 2 x 2 contingency table. Assuming that all the patients told the truth, test the null hypothesis that there is no difference between the newly developed sleeping pill and the sugar pill at a significance level of 0.05. Further, also examine whether Yates' correction makes any difference in the acceptance or rejection of the null hypothesis.

                                    Patients who slept well   Patients who did not sleep well
Patients who took sleeping pills            33                          19
Patients who took sugar pills               17                          31

22.25 A study was conducted to determine the effectiveness of a vaccine in combating an infectious disease. In an area where the disease was prevalent, a total of 200 people were tested, 80 of whom were vaccinated in the last 12 months and 120 were not. In each case, it was noted whether they contracted the disease and, if so, whether it was in severe or mild form. Test if the vaccine is effective. Use 5% significance level.
                  No disease   Mild   Severe
Vaccinated            45        25      10
Not vaccinated        30        50      40

22.26 Two treatments were tried out to control a certain type of plant infestation, with the following results:

Treatment type   Number of plants examined   Number of plants found infected
A                       150                           15
B                       150                            6

Can we conclude that treatment B is superior to treatment A in controlling this type of infestation? Use 5% significance level.

22.27 The following table shows the number of defective and satisfactory articles in two samples, one taken before and the other taken after the introduction of a modification in the process of manufacture.

          Defective articles   Satisfactory articles   Total
Before          20                    120               140
After            5                     75                80
Total           25                    195               220

(a) Determine whether the data provides strong evidence that the modification affects the quality of the articles produced. Use 10% significance level.
(b) Repeat this problem using Yates' correction.

22.28 The observed frequencies in a 2 x 2 contingency table are given below:

          1       2       Total
A         a_1     a_2     N_A
B         b_1     b_2     N_B
Total     N_1     N_2     N

Assuming the null hypothesis, determine the expected frequencies in the various elements of the contingency table. Further, show that the $\chi^2$ parameter without Yates' correction is given by:

$$\chi^2 = \frac{N (a_1 b_2 - a_2 b_1)^2}{N_1 N_2 N_A N_B}$$

22.29 For the following 3 x 2 contingency table, show that: [equation illegible in source]

          1       2       3       Total
A         a_1     a_2     a_3     N_A
B         b_1     b_2     b_3     N_B
Total     N_1     N_2     N_3     N

22.30 A study was conducted on a particular dimension of the articles produced by a set of four machines by the use of 'go' and 'no-go' gauges. The articles were categorised as oversize, within tolerances and undersize. The table below shows the number of articles in each of these three categories produced by machines A, B, C and D during a certain period of time.
             Oversize   Within tolerances   Undersize   Total
Machine A        6             120               4        130
Machine B        7             102              11        120
Machine C        4             168               8        180
Machine D        5             151              14        170
Total           22             541              37        600

Using a 10% level, check if there is a significant difference in the performance of the given set of machines.

Answers

22.1 (i) 0.90 (ii) asymptotic (iii) data (iv) error (v) bell-shaped curve (vi) unity (vii) central limit (viii) (ix) 94.935 and 85.065 (x) difference (xi) is greater than 90% (xii) mean value, standard deviation and total number of data (xiii) 2.667 (xiv) $(m - 1)(n - 1)$ (xv) greater than 2.58 times the combined internal standard error of the range of means
22.2 (i) true (ii) false (iii) true (iv) false (v) true (vi) false (vii) false (viii) false (ix) false (x) false
22.3 number of rejects = 5
22.4 $\sigma$ = 25.78 h
22.5 number of samples surviving at least 3 h = 185
22.6 0.2860 for both
22.7 30.89%
22.8 (a) 2.28% (b) 15.87%
22.9 (a) 16.16% (b) standard deviation in resistance = 0.255 Ω
22.10 312 rupees (round figure)
22.11 (a) 0.122 (b) 0.085
22.12 No, the mean cream content of 140 g in 10 bottles is less than the required cream content of 142.98 g at 95% confidence level.
22.13 (a) confidence level = 65.54% (b) = 0.854 2
22.14 No, tyres of brand A and brand B are from the same population.
22.15 No, there is no significant difference between bulbs of brand A and brand B.
22.16 The new type of sleeping pill is significantly better than the conventional pill.
22.17 Yes, the bearing of brand Y is significantly superior to brand X.
22.18 At 95% confidence level, the result is significant and therefore the installation of the speed-breaker has brought down the speeds of vehicles.
22.19 Difference of means = 0.011 and 1.96 times the internal estimate of uncertainty of the difference of means = 0.00927. Since the former is greater than the latter, the result is significant.
22.20 The data follows Gaussian distribution and P(x) = 0.35.
22.21 The data follows Gaussian distribution at 10% significance level.
22.22 For 5 d.f. at 10% level, the critical value of $\chi^2$ from the $\chi^2$–F table is 9.24. The value of $\chi^2$ obtained from the data using the null hypothesis is 9.4, which is significant, being greater than $\chi^2_{\text{tab}}$. Hence, the die is loaded.
22.23 For 9 d.f. at 5% level, the critical value of $\chi^2$ from the $\chi^2$–F table is 16.919. The value of $\chi^2$ obtained from the data using the null hypothesis is 32.1. Hence the result is significant and we can say that the observer has a preference for 0 as the last figure.
22.24 For a 1 d.f. system at 5% significance level, $\chi^2_{\text{tabular}} = 3.841$ and $\chi^2_{\text{observed}} = 7.85$. Since $\chi^2_{0.05}$ is less than $\chi^2_{\text{observed}}$, the result is significant and hence the sleeping pill is effective. With Yates' correction, $\chi^2_{\text{observed}} = 6.84$, which is still greater than 3.841. The result therefore remains significant with Yates' correction, and the sleeping pill is effective.
22.25 $\chi^2_{\text{observed}} = 22.2$ and $\chi^2_{0.05}$ for a 2 d.f. system = 5.99. Since $\chi^2_{0.05} < \chi^2_{\text{observed}}$, the result is significant and the vaccine is effective.
22.26 For a 1 d.f. system at 5% significance level, $\chi^2_{\text{tabular}} = 3.841$ and $\chi^2_{\text{observed}} = 4.174$. Since $\chi^2_{0.05}$ is less than $\chi^2_{\text{observed}}$, the result is significant and treatment B is superior to treatment A. However, $\chi^2_{\text{observed}}$ with Yates' correction becomes 3.28. Now $\chi^2_{0.05}$ is greater than $\chi^2_{\text{observed}}$. Therefore, the result with Yates' correction becomes insignificant. Hence, there is no significant difference between treatments A and B.
22.27 (a) For 1 d.f. at 10% level, the critical value of $\chi^2$ from the $\chi^2$–F table is 2.71. The value of $\chi^2$ obtained from the data using the null hypothesis (N.H.) is 3.264, which is greater than $\chi^2_{\text{tab}}$. Hence the result is significant and we can say that the introduction of the modification in the process of manufacture improves the quality of the articles produced. (b) The modified value of $\chi^2$ using Yates' correction, obtained from the data with N.H., is 1.60. This is less than $\chi^2_{\text{tab}} = 2.71$.
Hence the result is insignificant and we can say that the introduction of the modification in the process of manufacture does not affect the quality of the articles produced.
22.30 $\chi^2$ obtained using N.H. = 9.64. $\chi^2$ at 10% level = 10.645 (from the $\chi^2$–F table). Hence there is no significant difference in the performance of the given set of machines.

Chapter 23

Graphical Representation and Curve Fitting of Data

INTRODUCTION

Very often, we observe that a relationship exists between two or more variables. For example, the weight of a person depends on his age, height, etc. Similarly, the wear of a car tyre depends to a large extent on the mileage it has run. It is frequently desirable to express the relationship governing the various variables in a physical phenomenon in a mathematical form.

In order to determine the governing relations of the variables, the first step is the collection of data showing the corresponding values of the variables under consideration. For example, if $x$ and $y$ denote respectively the load and extension in a typical tension test of a specimen on a universal testing machine, then a simple set of $n$ observations would consist of extensions $y_1, y_2, y_3, \ldots, y_n$ corresponding to the loads $x_1, x_2, x_3, \ldots, x_n$. The next step would be to plot the various data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ on a rectangular coordinate system so as to obtain the scatter diagram of the tabulated data. Drawing the scatter diagram helps the experimenter to visualise the smooth curve approximating the data. It also helps him to note peculiarities such as maxima, minima, inflections, hysteresis, etc., which become more apparent than when the information is in tabular form.

After drawing the scatter diagram, we try to pass, mostly by free hand, the best possible smooth curve through the various data points. Such a curve is much more likely to represent the truth than the scatter of the individual points and is called the approximating curve of the given data.
For example, in Fig. 23.1(a) the data appear to be approximated well by a straight line and we say that a linear relationship exists between the variables. Further, in Fig. 23.1(b), the best approximating curve is not a straight line and so we can say that a non-linear relationship exists between the variables.

It may be noted that it is often wasteful to choose the ranges of the coordinate axes such that the zero origin is included in the graphical representation of the data. The origin should be shown only if a data point happens to lie there, if data occur on both sides of it, or if there is a theoretical reason for expecting the curve to pass through it.

Fig. 23.1 Typical scatter diagrams and the approximating curves through the data points: (a) and (b)

23.1 EQUATIONS OF APPROXIMATING CURVES

For the purpose of reference, the governing equations/functional relationships of the various common types of approximating curves are listed in Table 23.1. In the various equations, as a matter of standard practice, the abscissa gives the values of $x$, which represents the 'cause', i.e., the independent variable, and the ordinate gives the values of the variable $y$, which represents the corresponding 'effect', i.e., the dependent variable. However, in certain exceptional cases, the roles of $x$ and $y$ can be interchanged.

Table 23.1 Equations of Common Types of Approximating Curves
($x$ and $y$ are the independent and dependent parameters, respectively. All other letters in the equations represent constants.)

Sl. No.   Type of curve                                                   Governing equation/functional relationship
1         Straight line or linear relationship                            $y = a_0 + a_1 x$
2         Parabolic or quadratic curve or polynomial of second degree     $y = a_0 + a_1 x + a_2 x^2$
3         Cubic curve or polynomial of third degree                       $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$
4         Quartic curve or polynomial of fourth degree                    $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$
5         Polynomial of $n$th degree                                      $y = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n$
6         Hyperbolic curve                                                $y = \dfrac{1}{a_0 + a_1 x}$ or $\dfrac{1}{y} = a_0 + a_1 x$
7         Modified hyperbolic curve
8         Exponential curve                                               $y = ab^x$ or $\log y = \log a + x(\log b) = a_0 + a_1 x$; $y = ae^{bx}$ or $\ln y = \ln a + bx = a_0 + a_1 x$
9         Power law or geometric curve                                    $y = ax^b$ or $\log y = \log a + b \log x$
10        Modified exponential curve                                      $y = ab^x + c$
11        Modified power law curve
12        Gompertz curve                                                  $y = pq^{b^x}$ or $\log y = \log p + (\log q) b^x$
13        Modified Gompertz curve
14        Logistic curve
15        Logarithmic function curve
16        Trigonometric curve

Fig. 23.2 Graphical representation of the curves on linear versus linear graph paper corresponding to the various functional relationships (curve labels include: modified hyperbolic curve, logarithmic curve, hyperbolic curve)

23.3.1 Graphical Method

Generally, it is possible to draw a fairly good straight line through a given set of data points using a transparent ruler, in such a way that the experimental data points lie uniformly about the line. Although this method is quite convenient, for accurate work it has a number of shortcomings, viz.

1. Different observers draw the line differently.
2. An observer would tend to draw the line appreciably differently if the data is plotted to a different scale, say if the scale of the ordinate is elongated.
3. The graphical procedure does not provide any idea of how good the straight line is, i.e., no estimate of uncertainty in the slope is available.

23.3.2 Method of Sequential Differences

Suppose there are $n$ pairs of observations, i.e., $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, whose scatter diagram is of linear type [Fig. 23.1(a)].
To determine the slope of the line that best represents these $n$ points, we determine the slope between each pair of adjacent points, which gives $(n - 1)$ values of the slope. The best estimate of the slope is then given by the average of these values. Further, it is possible to calculate the best estimate of uncertainty in the mean value from the deviations in the individual slopes. Now, the slopes for the various pairs of points are

$$a_1 = \frac{y_2 - y_1}{x_2 - x_1}, \quad a_2 = \frac{y_3 - y_2}{x_3 - x_2}, \quad \ldots, \quad a_{n-1} = \frac{y_n - y_{n-1}}{x_n - x_{n-1}} \tag{23.2}$$

The mean of all slopes is

$$\bar{a} = \frac{\sum_{i=1}^{n-1} a_i}{n - 1} \tag{23.3}$$

After determining the mean slope $\bar{a}$, we now determine the intercept of the linear relationship. This is determined by assuming that the centroid of the data lies on the straight line. The centroid of the given data points is given by

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \bar{Y} = \frac{\sum_{i=1}^{n} y_i}{n} \tag{23.4}$$

Substituting the values of the mean slope, $\bar{X}$ and $\bar{Y}$ in the straight line relation of Eq. (23.1), we get the intercept $a_0$ as

$$\bar{Y} = a_0 + \bar{a}\bar{X} \quad \text{or} \quad a_0 = \bar{Y} - \bar{a}\bar{X} \tag{23.5}$$

However, this method suffers from certain limitations, which are as follows:

1. The straight line fit determined by this method is generally not very accurate if the range of $a_i$ is large.
2. If the $x_i$'s are at equal intervals, say $\Delta x$, then

$$\bar{a} = \frac{y_n - y_1}{(\Delta x)(n - 1)} \tag{23.6}$$

i.e., $\bar{a}$ is a function of the first and the last points of the data only, and no weightage at all is given to the intermediate points in the determination of the mean slope. This is a serious flaw in the method because, unfortunately, the first and the last points of a straight line plot are usually the most suspect.

23.3.3 Method of Extended Differences

In this method, the given data points are divided into two equal groups, namely high values and low values of $x$, and the corresponding points in the two groups are differenced.
Let us say that we have an even number of data points, $2m$ (in the case of an odd number of data points, the most suspect point or alternatively any one point is discarded in the calculations). Then the two groups of data are

$$\text{low values (group 1)} = (x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m) \tag{23.7}$$

and

$$\text{high values (group 2)} = (x_{m+1}, y_{m+1}), (x_{m+2}, y_{m+2}), \ldots, (x_{2m}, y_{2m}) \tag{23.8}$$

Now, the values of the slopes between the corresponding high and low values of groups 1 and 2 from Eqs. (23.7) and (23.8) are

$$a_1 = \frac{y_{m+1} - y_1}{x_{m+1} - x_1}, \quad a_2 = \frac{y_{m+2} - y_2}{x_{m+2} - x_2}, \quad \ldots, \quad a_m = \frac{y_{2m} - y_m}{x_{2m} - x_m} \tag{23.9}$$

Thus, the mean slope is given by

$$\bar{a} = \frac{\sum_{i=1}^{m} a_i}{m} \tag{23.10}$$

Knowing the mean slope, the value of the mean intercept is calculated using the procedure explained in Eqs. (23.4) and (23.5). Further, the magnitude of the uncertainty or the internal standard error may be calculated from the deviations of the individual slopes from the mean value. It may be noted that, when the $x$ values are equally spaced, the procedure of extended differences gives a mean slope which is the same as the slope of the line joining the centroids of the high values (group 2) and the low values (group 1).

To sum up, the method of extended differences is considered better than the method of sequential differences because it takes into account all the data points even if the intervals between different data points are the same. Further, it has the advantages of ease and simplicity and gives good results whether the errors arise in the $x$ or $y$ coordinates. Thus, it is usually recommended for ordinary calculations. However, in situations where we know which coordinate is giving rise to experimental errors, the well-known method of least squares is usually preferred over this method.

23.3.4 Method of Least Squares

This is a general method for determining the best fitting curve, say a best fitting line, a best fitting parabola or any other best fitting approximating curve, for a given set of data.
The main advantage of this procedure is that it does not depend at all on the judgement of an individual in determining the best fitting curve.

Let us first consider a general case of a set of data points in which the scatter diagram of the experimental points represents the curve C (Fig. 23.3). In this method, we assume that the scatter in the points is due to errors in measuring $y$, and the best fitting curve is that which minimises the sum of the squares of the errors in the $y$-direction. In other words, of all the approximating curves for a given set of data points, the curve having the property that the sum of error squares $S_e$, given by $(e_{y1}^2 + e_{y2}^2 + \ldots + e_{yn}^2)$, is a minimum is called the best fitting curve. Further, a curve having this property is said to fit the data in the least square sense and is called the least square curve. Thus, a line satisfying this property is a least square line, a parabola the least square parabola, etc.

Fig. 23.3 A typical scatter diagram of data showing errors in measuring $y$ with respect to curve C

It may be noted that if the values of $x$, the independent variable, are considered more accurate than $y$, then the sum of error squares in the $y$-direction is minimised with respect to the approximating curve and the resulting curve is called the least square regression curve of $y$ on $x$. Conversely, if the values of $y$ are more accurate than $x$, then the sum of squares in the $x$-direction is minimised and the resulting curve would be the least square regression curve of $x$ on $y$.

23.3.5 Linear Least Square Curve Fitting

Figure 23.4 shows the scatter diagram of data to which a straight line curve, often called a linear regression equation, could possibly be fitted. The general form of the linear equation is

$$y = a_0 + a_1 x \tag{23.11}$$

where $a_0$ and $a_1$ represent the intercept and the slope of the line, respectively.
If the values of $x$ are more accurate than $y$ (i.e., regression of $y$ on $x$), then we estimate the values of the ordinate $y^*$ from the more accurately observed values of $x$; that is,

$$y_i^* = a_0 + a_1 x_i \tag{23.12}$$

where $y_i^*$ is the estimated value of the dependent variable $y_i$. Thus, the error $e_{yi}$ is given by

$$e_{yi} = y_i - y_i^* \tag{23.13}$$

The least square principle states that for the best fitting straight line, the sum of the squares of the errors should be a minimum. That is,

$$S_e = \sum_{i=1}^{n} e_{yi}^2 \tag{23.14}$$

is minimised.

Fig. 23.4 Relationship of terms in a linear regression equation (legend: estimated point, observed point, centroid $(\bar{X}, \bar{Y})$)

Substituting the value of $e_{yi}$ from Eq. (23.13), we get the sum of error squares as

$$S_e = \sum_{i=1}^{n} (y_i - y_i^*)^2 \tag{23.15}$$

Substituting the value of $y_i^*$ from Eq. (23.12) in Eq. (23.15), we get

$$S_e = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2 \tag{23.16}$$

In order to minimise $S_e$, we determine the partial derivatives of $S_e$ with respect to $a_0$ and $a_1$ in Eq. (23.16). Each derivative must be zero at the minimum. Thus, this process gives two equations with two unknowns:

$$\frac{\partial S_e}{\partial a_0} = 0 \tag{23.17}$$

and

$$\frac{\partial S_e}{\partial a_1} = 0 \tag{23.18}$$

Equations (23.17) and (23.18) can be simplified to

$$\sum_{i=1}^{n} y_i - a_0 n - a_1 \sum_{i=1}^{n} x_i = 0 \tag{23.19}$$

and

$$\sum_{i=1}^{n} x_i y_i - a_0 \sum_{i=1}^{n} x_i - a_1 \sum_{i=1}^{n} x_i^2 = 0 \tag{23.20}$$

Solving Eqs. (23.19) and (23.20) simultaneously, we get

$$a_0 = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \tag{23.21}$$

and

$$a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \tag{23.22}$$

The denominator in Eqs. (23.21) and (23.22) is the same and can be replaced by a quantity, say $\Delta$. Further, the short notation $\sum x_i$, $\sum x_i y_i$, etc. can be employed in place of $\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} x_i y_i$, etc. to present the above equations as

$$a_0 = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{\Delta} \tag{23.23}$$

and

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\Delta} \tag{23.24}$$

where

$$\Delta = n \sum x_i^2 - \left(\sum x_i\right)^2$$

It may be noted that by dividing Eq. (23.19) by $n$ we get

$$\frac{\sum y_i}{n} = a_0 + a_1 \frac{\sum x_i}{n} \tag{23.25}$$

or

$$\bar{Y} = a_0 + a_1 \bar{X} \tag{23.26}$$

This shows that the least square line passes through the centroid $(\bar{X}, \bar{Y})$. In practice, in numerical problem solutions, we determine the value of the slope $a_1$ by employing Eq.
(23.24), in which we have to determine the values of $\sum x_i$ and $\sum y_i$. Then it becomes easier to determine the value of the intercept $a_0$ by employing Eq. (23.25) in place of Eq. (23.23).

In case the values of $y$ happen to be more accurate than $x$, we regress $x$ on $y$ (i.e., $x = b_0 + b_1 y$) and the expressions for the slope $b_1$ and the intercept $b_0$ of the line become

$$b_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\Delta'} \tag{23.27}$$

where

$$\Delta' = n \sum y_i^2 - \left(\sum y_i\right)^2$$

and

$$b_0 = \frac{\sum x_i \sum y_i^2 - \sum y_i \sum x_i y_i}{\Delta'} \tag{23.28}$$

23.3.6 Determination of Uncertainties (Internal Standard Errors) in the Slope and Intercept Values for Linear Regression

In the least square regression of $y$ on $x$, we have assumed that the values of $x$ are accurate and have minimised the deviations in $y$ using the least squares criterion. Therefore, we initially determine the standard deviation and then the uncertainty (internal standard error) in the $y$ values. Subsequently, the expressions for the uncertainties in $a_0$ and $a_1$, which are functions of the uncertainty in $y$, are determined.

$$n\sigma^2(y) = \sum (y_{\text{observed}} - y_{\text{estimated}})^2 = \sum (y_i - a_0 - a_1 x_i)^2$$

$$= \sum \left\{ y_i (y_i - a_0 - a_1 x_i) - a_0 (y_i - a_0 - a_1 x_i) - a_1 x_i (y_i - a_0 - a_1 x_i) \right\} \tag{23.29}$$

It may be noted that the second and third terms of Eq. (23.29) are the same as Eqs. (23.19) and (23.20), i.e., the normal equations of the linear least square regression. Since both these equations are equal to zero, Eq. (23.29) becomes

$$n\sigma^2(y) = \sum y_i^2 - a_0 \sum y_i - a_1 \sum x_i y_i \tag{23.30}$$

From Eq. (23.30), the internal estimate of uncertainty $U_i(y)$ can be calculated as

$$U_i(y) = \frac{\sigma(y)}{(n - 2)^{1/2}} \tag{23.31}$$

It may be noted that in drawing a straight line, two points are required and therefore one extra degree of freedom is lost. Thus, the denominator in expression (23.31) is $(n - 2)^{1/2}$ in place of $(n - 1)^{1/2}$.

Now, we determine the internal estimate of the uncertainties in the slope and the intercept of the line. From Eq.
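(23.24) we get,

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\Delta} = \frac{n}{\Delta}\left[(x_1 y_1 + x_2 y_2 + \ldots + x_n y_n) - \bar{X}(y_1 + y_2 + \ldots + y_n)\right]$$

$$= \frac{n}{\Delta}\left[(x_1 - \bar{X}) y_1 + (x_2 - \bar{X}) y_2 + \ldots + (x_n - \bar{X}) y_n\right] \tag{23.32}$$

Employing the formula of propagation of error for a linear combination, we get

$$\sigma^2(a_1) = \frac{n^2}{\Delta^2}\left[(x_1 - \bar{X})^2 \sigma^2(y_1) + (x_2 - \bar{X})^2 \sigma^2(y_2) + \ldots + (x_n - \bar{X})^2 \sigma^2(y_n)\right] \tag{23.33}$$

Substituting $\sigma(y_1) = \sigma(y_2) = \ldots = \sigma(y_n) = \sigma(y)$ in Eq. (23.33), we get

$$\sigma^2(a_1) = \frac{n^2}{\Delta^2}\left\{\sum (x_i - \bar{X})^2\right\}\sigma^2(y) = \frac{n^2}{\Delta^2}\left\{\sum x_i^2 - 2\bar{X}\sum x_i + n\bar{X}^2\right\}\sigma^2(y)$$

$$= \frac{n^2}{\Delta^2}\left\{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right\}\sigma^2(y) = \frac{n^2}{\Delta^2} \cdot \frac{\Delta}{n}\,\sigma^2(y)$$

$$\sigma^2(a_1) = \frac{n}{\Delta}\,\sigma^2(y) \tag{23.34}$$

Similarly, it can be shown that

$$U_i(a_1) = \sqrt{\frac{n}{\Delta}}\; U_i(y) \tag{23.35}$$

Thus, Eq. (23.35) gives the expression for determining the internal estimate of uncertainty in the slope of the least square line in terms of the uncertainty in the values of $y$.

Now, the expression for the intercept of the least square line is given by Eq. (23.23) as

$$a_0 = \frac{1}{\Delta}\left(\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i\right) = \frac{1}{\Delta}\left[\sum x_i^2\,(y_1 + y_2 + \ldots + y_n) - n\bar{X}(x_1 y_1 + x_2 y_2 + \ldots + x_n y_n)\right]$$

$$= \frac{1}{\Delta}\sum_{j=1}^{n}\left\{\sum x_i^2 - n\bar{X}x_j\right\} y_j \tag{23.36}$$

Again, using the linear combination formula for the propagation of errors, we get

$$\sigma^2(a_0) = \frac{1}{\Delta^2}\sum_{j=1}^{n}\left\{\sum x_i^2 - n\bar{X}x_j\right\}^2 \sigma^2(y_j) \tag{23.37}$$

Substituting $\sigma(y_1) = \sigma(y_2) = \ldots = \sigma(y_n) = \sigma(y)$ in Eq. (23.37), we get

$$\sigma^2(a_0) = \frac{1}{\Delta^2}\left\{n\left(\sum x_i^2\right)^2 - 2n\bar{X}\sum x_i \sum x_i^2 + n^2\bar{X}^2 \sum x_i^2\right\}\sigma^2(y)$$

$$= \frac{n \sum x_i^2}{\Delta^2}\left\{\sum x_i^2 - 2\bar{X}\sum x_i + n\bar{X}^2\right\}\sigma^2(y) = \frac{n \sum x_i^2}{\Delta^2}\left\{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right\}\sigma^2(y)$$

$$\sigma^2(a_0) = \frac{\sum x_i^2}{\Delta}\,\sigma^2(y) \tag{23.38}$$

$$\sigma(a_0) = \sqrt{\frac{\sum x_i^2}{\Delta}}\;\sigma(y) \tag{23.39}$$

Now, it can be shown that

$$U_i(a_0) = \sqrt{\frac{\sum x_i^2}{\Delta}}\; U_i(y) \tag{23.40}$$

Thus, Eq. (23.40) gives the expression for determining the internal estimate of uncertainty in the intercept of the least square line in terms of the uncertainty in the values of $y$.

Problem 23.1

Successive masses of 1 kg each (of high accuracy) were added at the hook at the lower end of a vertically hanging wire. The position of a mark at the lower end was measured using an ordinary scale. The following results were obtained:

Load x (kg)                1     2     3     4     5     6     7     8     9     10
Position of mark y (cm)    6.05  6.20  6.25  6.35  6.40  6.50  6.55  6.60  6.70  6.75

(a) Determine the equation of the best fitting straight line using
(i) Graphical method
(ii) Method of sequential differences
(iii)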
Method of extended differences
(iv) Method of least squares
(b) Also, determine the internal estimate of uncertainty in the value of the slope in each of the above-mentioned procedures for fitting the straight line relationship.

Solution

(i) The scatter diagram shown in Fig. 23.5 is obtained by plotting the points (1, 6.05), (2, 6.20), ..., (10, 6.75). A straight line which approximates the data (using personal judgement) is shown by the dashed line in Fig. 23.5. This is, in fact, only one of many possible lines which could have been constructed. Choosing any two points $(x_1, y_1)$ and $(x_2, y_2)$ on the approximating straight line, the straight line relation can be determined from the equation

$$y - y_1 = \left(\frac{y_2 - y_1}{x_2 - x_1}\right)(x - x_1)$$

Two such points chosen in Fig. 23.5 are A (2, 6.15) and B (7, 6.55). Therefore, the required equation becomes

$$y - 6.15 = \frac{6.55 - 6.15}{7 - 2}\,(x - 2)$$

or

$$y = 0.08x + 5.99$$
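The three numerical methods of Sections 23.3.2 to 23.3.5 can be compared with the graphical estimate by a short calculation. The following is only a minimal sketch in Python on the data of Problem 23.1. The function names (`centroid_intercept`, `sequential_differences`, `extended_differences`, `least_squares`) are illustrative choices and do not come from the text. The extended-differences routine assumes an even number of points, and the estimation of uncertainties asked for in part (b) is not attempted here.

```python
# Data of Problem 23.1: load x (kg) and position of mark y (cm).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [6.05, 6.20, 6.25, 6.35, 6.40, 6.50, 6.55, 6.60, 6.70, 6.75]


def centroid_intercept(slope, xs, ys):
    """Intercept of the line of given slope through the centroid (Eq. 23.5)."""
    n = len(xs)
    return sum(ys) / n - slope * sum(xs) / n


def sequential_differences(xs, ys):
    """Mean of the (n - 1) slopes between adjacent points (Eqs. 23.2, 23.3)."""
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(len(xs) - 1)]
    slope = sum(slopes) / len(slopes)
    return slope, centroid_intercept(slope, xs, ys)


def extended_differences(xs, ys):
    """Mean of the m slopes between paired low and high points (Eqs. 23.9, 23.10)."""
    m = len(xs) // 2  # assumption: even number of points, already sorted by x
    slopes = [(ys[m + i] - ys[i]) / (xs[m + i] - xs[i]) for i in range(m)]
    slope = sum(slopes) / m
    return slope, centroid_intercept(slope, xs, ys)


def least_squares(xs, ys):
    """Regression of y on x: slope from Eq. (23.24), intercept from Eq. (23.25)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(v * v for v in xs)
    sum_xy = sum(u * v for u, v in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / delta
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


if __name__ == "__main__":
    methods = [
        ("sequential differences", sequential_differences),
        ("extended differences", extended_differences),
        ("least squares", least_squares),
    ]
    for name, method in methods:
        slope, intercept = method(x, y)
        print(f"{name}: y = {slope:.4f} x + {intercept:.4f}")
```

For these data the sequential-differences slope reduces to $(y_{10} - y_1)/9 \approx 0.078$, as Eq. (23.6) predicts for equally spaced loads, while the extended-differences and least-squares slopes both come out close to 0.074, with intercepts near 6.03. All three are reasonably close to the graphical estimate.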
