
6.1 Basic Statistic

The document provides an overview of basic statistical concepts relevant to the SQC Departments, including sampling methods, frequency distribution, measures of central tendency, and measures of dispersion. It explains how to interpret data through various statistical measures such as mean, mode, median, standard deviation, and coefficient of variation. Additionally, it discusses sampling distribution, standard error, and confidence limits, emphasizing the importance of understanding these concepts for accurate data analysis and interpretation.

Basic Statistic

While the study of statistics is a field in itself, there are certain simple and fundamental
statistical terms that are commonly used in the SQC Departments. An understanding of these
terms and their proper interpretation will aid in understanding the fuller implications of the
test results. The person who wishes to pursue this study further should read some standard
textbooks on statistics.

1.0 SAMPLING
Sampling is used to obtain information about a large group, called the population, from a
smaller representative group, called the sample. The method of drawing samples from the
population or 'sampling' should be such that the sample becomes a true representative of the
whole population.

1.1 Random Sampling


Random Sampling is the most common and most powerful method of sampling. Here every
member of the population has an equal chance of being selected in the sample. Such
sampling can be done by drawing samples according to random numbers taken from random
number tables or by the lottery method.

1.2 Systematic Sampling


This is a method of sampling following a specific fixed pattern or time interval. For example,
a sample is taken from the production line every 15 minutes.
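
The two sampling methods above can be sketched in a few lines of Python. This is only an illustration: the population of 200 numbered bobbins, the function names and the step of 20 are assumptions made for the example, not part of the original procedure.

```python
import random

# Hypothetical population: 200 bobbins identified by serial number.
population = list(range(1, 201))

def random_sample(items, k, seed=None):
    """Random sampling: every member has an equal chance of being selected."""
    rng = random.Random(seed)  # a fixed seed only makes the example repeatable
    return rng.sample(items, k)

def systematic_sample(items, step, start=0):
    """Systematic sampling: take every `step`-th member, beginning at `start`."""
    return items[start::step]

print(random_sample(population, 10, seed=1))
print(systematic_sample(population, 20))
```

Taking every 20th bobbin plays the same role here as taking a sample from the line every 15 minutes.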

2.0 FREQUENCY DISTRIBUTION


The frequency distribution can be well understood through the example described below.
The breaking strength in lbf of 6 lb/spy x 2 Ply yarn is tested and the results for the 50 yarns
tested are shown in Table-1.

Table – 1: Yarn Breaking Strength Values (The results are arranged in ascending order)
11.5 15.3 16.3 16.8 18.3
12.4 15.4 16.4 16.8 18.4
13.2 15.5 16.4 16.8 18.5
13.4 15.6 16.4 16.9 18.7
13.6 15.7 16.5 17.2 19.1
14.2 15.8 16.5 17.3 19.3
14.3 15.9 16.6 17.4 19.6
14.6 16.2 16.7 17.5 20.1
14.7 16.3 16.7 17.6 20.7
14.8 16.3 16.7 17.9 21.8
Now, the frequency of a value of a variable (here yarn strength) is the number of times it occurs
in a given series of observations. The range of the strength values obtained is divided into a
number of groups or classes like 11.0 – 11.9, 12.0 – 12.9 and so on up to 21.0 – 21.9. Such
groups are called Class Intervals. Between 13.0 and 13.9 there are 3 values (13.2, 13.4 and 13.6),
so here the frequency (called the Class Frequency) is 3. Thus a table may be constructed as
follows.
Group          Tally Mark               Frequency
11.0 – 11.9    I                        1
12.0 – 12.9    I                        1
13.0 – 13.9    III                      3
14.0 – 14.9    IIIII                    5
15.0 – 15.9    IIIII II                 7
16.0 – 16.9    IIIII IIIII IIIII II     17
17.0 – 17.9    IIIII I                  6
18.0 – 18.9    IIII                     4
19.0 – 19.9    III                      3
20.0 – 20.9    II                       2
21.0 – 21.9    I                        1

Such a table is called a Frequency Table or, more usually, the Frequency Distribution of the
variable. A Frequency Distribution is a statistical table which shows the values of the variable
arranged in order of magnitude, either individually or in groups, with the corresponding
frequencies side by side.
Looking at the frequency distribution, one gets a quick idea of the central value or average of
the data series and of how the observations vary (dispersion) around the average. The histogram
(Fig. A.1) is the most common form of diagrammatic representation of a grouped frequency
distribution. It consists of a set of adjoining rectangles drawn on a horizontal base line. The
width of a rectangle is the class interval and its height represents the class frequency.
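
As a rough sketch, the class frequencies for the Table-1 data can be counted programmatically. The variable names and the text histogram drawn with `#` characters are illustrative choices; the data are the 50 values of Table-1.

```python
import math
from collections import Counter

# The 50 yarn breaking strength values (lbf) from Table-1.
strengths = [
    11.5, 12.4, 13.2, 13.4, 13.6, 14.2, 14.3, 14.6, 14.7, 14.8,
    15.3, 15.4, 15.5, 15.6, 15.7, 15.8, 15.9, 16.2, 16.3, 16.3,
    16.3, 16.4, 16.4, 16.4, 16.5, 16.5, 16.6, 16.7, 16.7, 16.7,
    16.8, 16.8, 16.8, 16.9, 17.2, 17.3, 17.4, 17.5, 17.6, 17.9,
    18.3, 18.4, 18.5, 18.7, 19.1, 19.3, 19.6, 20.1, 20.7, 21.8,
]

# floor(x) identifies the class interval: 13.2, 13.4 and 13.6 all map to 13,
# i.e. the class 13.0 - 13.9.
class_freq = Counter(math.floor(x) for x in strengths)

# Print the frequency table with a simple text histogram.
for lower in sorted(class_freq):
    f = class_freq[lower]
    print(f"{lower}.0 - {lower}.9  {'#' * f:<17}  {f}")
```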

3.0 CENTRAL TENDENCY


There are three common measures of central tendency: Mean, Mode and Median. Of
these three, the one most frequently used is the Mean, also known as the Average.

3.1 Mean (or Average)


The mean is often the best single statistic to characterize a set of test results. It is the
sum of the individual observations divided by the number of observations. This may be
expressed as follows:
$$\text{Mean} = \frac{\sum X}{N} \qquad (1)$$
where X : observations; N : number of observations; ∑ : the sum of

3.2 Mode
The Mode is the value that occurs most frequently, and it is possible for a group of data
to have more than one mode. In this respect, the mode differs from the mean and the median,
as there is always exactly one mean and one median.
3.3 Median
The Median is the middle one of a set of values (observations) arranged in order of magnitude.
When the number (N) of values is even, the average of the two central items is
taken as the median. If the number of values is odd, then the median is the value of the item
which is item number (N + 1)/2 counted from either end.
In most cases, the measure of central tendency used in a textile testing laboratory will be the
arithmetic mean or average. The mode and median are defined merely to show how they
differ from the mean.
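
The three measures can be compared on the Table-1 strength data with Python's standard `statistics` module. This is a minimal sketch; `multimode` (available from Python 3.8) is used because, as noted above, a data set may have more than one mode.

```python
import statistics

# The 50 yarn breaking strength values (lbf) from Table-1.
strengths = [
    11.5, 12.4, 13.2, 13.4, 13.6, 14.2, 14.3, 14.6, 14.7, 14.8,
    15.3, 15.4, 15.5, 15.6, 15.7, 15.8, 15.9, 16.2, 16.3, 16.3,
    16.3, 16.4, 16.4, 16.4, 16.5, 16.5, 16.6, 16.7, 16.7, 16.7,
    16.8, 16.8, 16.8, 16.9, 17.2, 17.3, 17.4, 17.5, 17.6, 17.9,
    18.3, 18.4, 18.5, 18.7, 19.1, 19.3, 19.6, 20.1, 20.7, 21.8,
]

mean_value = statistics.mean(strengths)      # sum of values / number of values
median_value = statistics.median(strengths)  # N is even: average of 25th and 26th values
modes = statistics.multimode(strengths)      # every value sharing the highest frequency

print(f"mean   = {mean_value:.2f} lbf")
print(f"median = {median_value} lbf")
print(f"modes  = {modes}")
```

For this data set four values (16.3, 16.4, 16.7 and 16.8) each occur three times, so there are four modes but only one mean and one median.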

4.0 MEASURES OF DISPERSION


The values of a variable are generally not equal. They may be very close to one another or may
be markedly different from one another. Both cases may give the same mean value. So, to
get an idea of the uniformity of results, it is very important to know how the data are
scattered about the mean value. The feature that reflects the scatter of the values is called
dispersion. The common measures of dispersion are range, standard deviation and
coefficient of variation.
If there were no variation, then there would be no need for statistical computations. Variation
is the very essence of statistics. Because of the variation present in all data, it is necessary to
know how to judge this variation and how to determine whether one sample representing one
population is better than another sample representing another population. The measures
given below are among the most fundamental methods of accomplishing this task. Indeed, no
presentation of data is complete without a measure of the variation of the data.

4.1 Range
The range of a set of observations is the difference between the maximum and the minimum
values and thus it represents the maximum possible difference between any two observations
in a set. Thus,
Range = Maximum value - Minimum Value

4.2 Standard Deviation


It is a well known fact that two samples may have the same average and yet one is acceptable
while the other not, depending upon the degree of scatter of individual results around the
average. The standard deviation is an agreed statistic for expressing the degree of scatter.
This value is very useful to the person analysing data since with it one can predict, within
certain limits, where future samples from the same population will fall. One can also
determine whether one group of data differs significantly from another.
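Standard deviation (basically, Root Mean Squared, or RMS, deviation) of a set of observations
is the square root of the arithmetic mean of squares of deviations of the data from the mean. It
is an index of variability and is usually denoted by the Greek small letter σ (called 'sigma').
When it is estimated from a sample, the sum of squared deviations is divided by N − 1 rather
than N, and the standard deviation is calculated using the equation
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{\sum X^2 - (\sum X)^2 / N}{N - 1}} \qquad (3)$$
where X : individual observations; X̄ : mean; N : number of observations;
∑ : the sum of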

Example:
 X     X²      X     X²
 3      9      7     49
 6     36      4     16
 8     64      3      9
 2      4      5     25
 5     25      6     36
 4     16      6     36
 3      9      7     49
 5     25      8     64
 6     36      2      4
Totals: ∑X = 90, ∑X² = 512
N = 18    ∑X = 90    ∑X² = 512

Hence, from the equation (A.3)


$$\sigma = \sqrt{\frac{512 - (90)^2/18}{17}} = \sqrt{\frac{62}{17}} = (3.6470)^{1/2} = 1.91$$
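
The worked example can be checked with a short Python sketch. It assumes the computational form with divisor N − 1 that the example uses (17 for N = 18) and compares the result with `statistics.stdev`, which uses the same divisor. The variable names are illustrative.

```python
import math
import statistics

# The 18 observations from the worked example.
x = [3, 6, 8, 2, 5, 4, 3, 5, 6, 7, 4, 3, 5, 6, 6, 7, 8, 2]

n = len(x)                          # N = 18
sum_x = sum(x)                      # sum of X = 90
sum_x2 = sum(v * v for v in x)      # sum of X squared = 512
value_range = max(x) - min(x)       # Range = maximum value - minimum value

# Computational form: sqrt[(sum X^2 - (sum X)^2 / N) / (N - 1)]
sigma = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(f"N = {n}, sum X = {sum_x}, sum X^2 = {sum_x2}, range = {value_range}")
print(f"sigma (formula)    = {sigma:.2f}")
print(f"sigma (statistics) = {statistics.stdev(x):.2f}")
```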
There is a relationship between the standard deviation and the range in an unbiased sample
which is as follows:
σ = Range /F
where F is a factor as given in Table -2. For small samples, time can be saved using this
procedure to determine the standard deviation.

Table-2: Value of Factor, F
No. of observations in a group    2        5        10       50       100
Value of factor F                 1.128    2.326    3.078    4.498    5.015
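
A minimal sketch of the range method follows. The factors are those of Table-2; the five readings and the function name `sigma_from_range` are hypothetical and serve only to illustrate the calculation.

```python
# Factor F from Table-2, keyed by the number of observations in the group.
FACTOR_F = {2: 1.128, 5: 2.326, 10: 3.078, 50: 4.498, 100: 5.015}

def sigma_from_range(values):
    """Estimate the standard deviation as Range / F for a tabulated group size."""
    n = len(values)
    if n not in FACTOR_F:
        raise ValueError(f"no factor F tabulated for a group of {n} observations")
    return (max(values) - min(values)) / FACTOR_F[n]

# Hypothetical group of five strength readings (lbf).
readings = [15.3, 16.4, 16.8, 17.2, 18.3]
estimate = sigma_from_range(readings)
print(f"range = {max(readings) - min(readings):.1f}, estimated sigma = {estimate:.2f}")
```

Only group sizes listed in Table-2 are accepted here; other sizes would need the corresponding factor from a fuller table.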

4.3 Coefficient of Variation


The term coefficient of variation (CV%) is defined as
$$CV\% = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100 = \frac{\sigma}{\bar{X}} \times 100 \qquad (4)$$
While the standard deviation is expressed in units (like lb, kg, etc.), the coefficient of variation
is expressed as a percentage. The standard deviation is an absolute measure of the variation but
the coefficient of variation is a relative measure of variation. This makes it a better means of
comparing data. Thus, whenever we wish to compare two different data sets with
regard to their variability, we consider CV%. For example, to compare the variation in a
stronger yarn (having a higher mean strength value) with the variation in a weaker yarn
(having a lower mean strength value), we compare the strength CV% values of these two yarns.
The rule of thumb is: the lower the CV%, the less variation in the sample and the better the sample.

Example: Using the data given as example for the standard deviation, the coefficient of
variation may be determined as
$$CV\% = \frac{1.91}{5.0} \times 100 = 38.2\%$$
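
The same calculation can be sketched in Python, using the 18 observations from the standard deviation example. The variable names are illustrative.

```python
import statistics

# The 18 observations from the standard deviation example.
x = [3, 6, 8, 2, 5, 4, 3, 5, 6, 7, 4, 3, 5, 6, 6, 7, 8, 2]

mean_value = statistics.mean(x)    # 90 / 18
sigma = statistics.stdev(x)        # sample standard deviation (divisor N - 1)
cv_percent = sigma / mean_value * 100

print(f"mean = {mean_value}, sigma = {sigma:.2f}, CV% = {cv_percent:.1f}")
```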

5.0 SAMPLING DISTRIBUTION


If the parent distribution of the individual values is 'normal' with a true mean of X̄ and a true
standard deviation of σt, the distribution of mean values of samples, each comprising n
individual specimens, will also be 'normal' with a mean of X̄ (i.e. the mean of a very large
number of sample means is the true mean) and a standard deviation of σt/√n.
Some distributions of mean strength values for yarn samples of n = 1 (corresponding to the
parent distribution), n = 6, n = 16 and n = 64 are shown in Fig. A.2. From this one can see that
as the sample size increases, the distribution becomes narrower, which implies that the sample
means become more closely centered around the true mean. For this reason, to obtain a better
estimate of the true mean of the population (whole yarn lot), one needs to test samples with a
large number of individual yarn specimens.
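
The narrowing of the distribution of sample means can be illustrated with a small simulation. This is a sketch under stated assumptions: the parent mean of 16.5 and standard deviation of 2.0 are hypothetical values, and the number of trials and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(42)             # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 16.5, 2.0      # hypothetical 'normal' parent distribution

def sd_of_sample_means(n, trials=4000):
    """Draw `trials` samples of n specimens each; return the SD of their means."""
    means = [
        statistics.fmean(rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

results = {n: sd_of_sample_means(n) for n in (1, 6, 16, 64)}
for n, observed in results.items():
    expected = TRUE_SD / n ** 0.5   # true SD divided by the square root of n
    print(f"n = {n:2d}: observed SD of means = {observed:.3f}, expected = {expected:.3f}")
```

The observed values should lie close to the expected ones, with small differences due to the finite number of simulated samples.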

6.0 STANDARD ERROR


Even when the samples are drawn randomly from a population, the values of any statistical
measure obtained from the respective samples are not the same as the value of that measure
for the population. This is the error inherent in dealing with any sample and is termed the
standard error (SE). Its practical significance can be understood from the following
statement:
The difference between the sample and true (population) values of a statistical measure is
unlikely to exceed twice the standard error of that measure.
It is worth noting here that there is a certain probability for a sample value to deviate from the
true value by more than ± 2.0 times the standard error, and that probability is 0.0455, i.e.
about 1 chance in 20. Since this is a low probability, the word 'unlikely' is used in the above
statement.

6.1 SE of Mean
The standard error of the mean is given by
SEmean = σt /√n .....(5)
where σt is the true standard deviation of the population and n is the number of specimens in
the sample. Since σt is not known, the sample value σs is substituted in place of σt as an
approximation, but only for samples of more than 30 specimens.

6.2 SE of Standard Deviation and Coefficient of Variation


When the standard deviation has been calculated directly using individual values of
specimens within a sample (and not indirectly from the mean range or with the help of an
instrument like an electronic yarn irregularity tester), the standard error of the
standard deviation and that of the coefficient of variation are
SE σ = σt /√2n …….(6)
SE cv = CVt /√2n ……..(7)
where σt and CVt represent their true values for the population. If these values are not
known, the corresponding sample values are substituted, provided the sample comprises a
few hundred individual specimens.
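
The three standard error formulas (5), (6) and (7) can be written as small helper functions. The function names and the input values (standard deviation 0.64 kg, CV 17.8%, 100 specimens) are assumptions for illustration; sample values are used in place of the true population values, which the text permits only for sufficiently large samples.

```python
import math

def se_mean(sd, n):
    """Standard error of the mean: sd / sqrt(n), equation (5)."""
    return sd / math.sqrt(n)

def se_sd(sd, n):
    """Standard error of the standard deviation: sd / sqrt(2n), equation (6)."""
    return sd / math.sqrt(2 * n)

def se_cv(cv_percent, n):
    """Standard error of the coefficient of variation: CV / sqrt(2n), equation (7)."""
    return cv_percent / math.sqrt(2 * n)

# Hypothetical sample: 100 specimens, sd = 0.64 kg, CV = 17.8 %.
print(f"SE of mean = {se_mean(0.64, 100):.3f} kg")
print(f"SE of sd   = {se_sd(0.64, 100):.3f} kg")
print(f"SE of CV   = {se_cv(17.8, 100):.2f} %")
```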

6.3 Comparison of Two Samples


There is often a need to evaluate whether the difference in the values of a statistical
measure (mean, standard deviation, etc.) obtained for two independent samples (say, two
yarns spun under two different processing conditions) is significant or not. For this purpose,
we first evaluate the standard error, SE1-2, of the difference between the measures of the two
samples by using the following formula,
$$SE_{1-2} = \sqrt{SE_1^2 + SE_2^2} \qquad (8)$$
where SE1 : SE of the given measure for the first sample; SE2 : SE of the same measure for the
second sample.
If the difference between the two samples in respect of the measure is found to be more than the
maximum likely error (i.e. twice the SE of the difference, SE1-2), we conclude that the
difference between the samples is statistically significant; otherwise it is not statistically
significant.
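
The comparison rule can be sketched as follows. The two yarns, their means and their standard errors are hypothetical numbers chosen for illustration, and the function names are not from the original text.

```python
import math

def se_difference(se1, se2):
    """Standard error of the difference between two independent samples, equation (8)."""
    return math.sqrt(se1 ** 2 + se2 ** 2)

def is_significant(value1, value2, se1, se2):
    """True when the difference exceeds twice the SE of the difference."""
    return abs(value1 - value2) > 2 * se_difference(se1, se2)

# Hypothetical mean strengths (kg) and standard errors of the means for two yarns.
mean_a, se_a = 3.60, 0.064
mean_b, se_b = 3.85, 0.070

limit = 2 * se_difference(se_a, se_b)
print(f"difference = {abs(mean_a - mean_b):.2f} kg, maximum likely error = {limit:.2f} kg")
print("significant" if is_significant(mean_a, mean_b, se_a, se_b) else "not significant")
```

With these numbers the difference of 0.25 kg exceeds the maximum likely error of about 0.19 kg, so it would be judged significant; a difference of 0.10 kg would not.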

7.0 CONFIDENCE LIMITS


When the average has been calculated from test results, it is only representative of a small
amount of the total material in the bulk (population). Confidence limits express the maximum
amount by which the sample average is likely to deviate from that of the population. They may
also be regarded as the maximum likely errors due to sampling.
The confidence limits of the bulk average of a particular property depend on:
i) the size of the sample, i.e. the number of test measurements,
ii) the variability (i.e. σ or CV%) of the property evaluated, and
iii) the confidence level chosen, usually 95% or 99%.

The information is required generally in two ways:


i) How many test measurements should be carried out in order to estimate the true
(population) average with given confidence limits?
ii) From a known sample size, what are the confidence limits of the estimate of the true
average?
7.1 Confidence Limits of an Average
(a) Number of Tests for given Confidence Limits
When the true CV% of the property to be estimated is known (or, as is usual in practice, a
good estimate based on past experience of the material under examination is available), it
may be used to determine, for the confidence level specified, the minimum number of tests to
be carried out. This number is calculated using the following formula:
$$n = \left(\frac{F_1 \times V}{E}\right)^2 \qquad (9)$$
where V : true CV%; E : % confidence limit;
F1 : 1.96 for 95% confidence level;
2.58 for 99% confidence level

When a good estimate of the true CV% is not available, the sample CV% value may be
substituted in place of the true CV%, as an approximation, provided the number of specimens
tested is more than 30.
Table - 3 gives the sample sizes directly for confidence levels of 95% and 99%, and for
different confidence limits as % of the average.
Example
Let us determine the minimum number of tests required to estimate the average strength of a
yarn with confidence limits of ± 5% of the average when the CV% of yarn strength is known to
be 10%. From Table - 3, we can easily find that
Minimum number of tests for 95% confidence = 16
Minimum number of tests for 99% confidence = 27

(b) Confidence Limits on estimates of True Average


The C% confidence limits are the limits (±) within which the result of a test, say an average
breaking load measurement, on the same population (say, a yarn lot) will fall in C out of
100 successive tests. The formula for these limits is given by:
C % confidence limits = Mean ± F2 x σ ... (10)
where σ : standard deviation of the test results;
F2 : multiplying factor at a certain confidence level
The multiplying factors (F2) to give confidence limits at 95% and 99% confidence levels are
given in Table-4.

Example
On carrying out breaking load measurements on 100 yarn specimens, a sample average
value of 3.60 kg and a standard deviation of 0.64 kg were obtained. Let us determine the
95% confidence limits. From Table - 4, with a sample size of 100 and for the 95% confidence
level, we find that the multiplying factor F2 is ± 0.199. Hence, the 95% confidence limits will be
± 0.199 x 0.64 = ± 0.13 kg, which means that the probability is 95% that the true average yarn
breaking load value lies between 3.47 and 3.73 kg (i.e. 3.60 ± 0.13).
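
The arithmetic of this example follows equation (10) directly, as the sketch below shows. The factor 0.199 is the value quoted from Table-4 for 100 specimens at the 95% level; the function name is illustrative.

```python
def confidence_limits(mean, sd, f2):
    """Confidence limits of the average: mean +/- F2 * sd, equation (10)."""
    half_width = f2 * sd
    return mean - half_width, mean + half_width

# Values from the example: mean 3.60 kg, sd 0.64 kg, F2 = 0.199.
lower, upper = confidence_limits(3.60, 0.64, 0.199)
print(f"95% confidence limits: {lower:.2f} kg to {upper:.2f} kg")
```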

7.2. Confidence Limits of Rates of Occurrence


During mechanical processing, one may encounter occurrences of events which happen
infrequently and at random and which can be represented by a Poisson distribution. The
examples of such randomly occurring infrequent events are: end breakages during spinning,
warp breakages during weaving, generation of slubs along the yarn length, mechanical
failures on a machine, etc.
Observations of the above events are normally made over a fixed period of time or
amount of material. In such cases, it is often required to know the accuracy with which the
average rate of occurrence has been estimated. Table - 5 gives the confidence limits of the
total number of occurrences, for approximate* conventional levels of 95% and 99%. The
choice of the 95% level, for example, will help to determine the interval within which the true
number of occurrences will lie with 95 chances in 100 or 19 chances in 20. The way to derive
confidence limits for rates of occurrence is shown in Example - 1, while Example - 2
shows a typical use of such limits.

Example - 1
65 end-breakages are observed on a spinning machine in 8 hours. Let us determine the 95%
confidence limits of the true end-breakage rate, i.e. end-breakages per hour.
From Table - 5 (by interpolating between the listed values), the 95% confidence limits of the
total occurrences can be found as 50 and 81 end-breakages in 8 hours. Hence, the confidence
limits of the true end-breakage rate are 50/8 and 81/8, or about 6 and 10 per hour. Thus, the true
end-breakage rate lies between 6 and 10 per hour with a probability of 95%.
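
The conversion from limits on the total count to limits on the hourly rate can be sketched as below, using the 8-hour observation period implied by the divisions 50/8 and 81/8. The second function is a normal approximation (count ± 1.96 × √count); it is an assumption added here as a rough cross-check and is not the method behind Table-5.

```python
import math

def rate_limits(total_lower, total_upper, hours):
    """Convert confidence limits on a total count into limits on the hourly rate."""
    return total_lower / hours, total_upper / hours

def approx_poisson_limits(count, z=1.96):
    """Rough 95% limits for a Poisson count by the normal approximation (assumption)."""
    half_width = z * math.sqrt(count)
    return count - half_width, count + half_width

low_rate, high_rate = rate_limits(50, 81, 8)   # limits read from Table-5
print(f"rate limits: {low_rate:.2f} to {high_rate:.2f} end-breakages per hour")

approx_low, approx_high = approx_poisson_limits(65)
print(f"approximate limits on the total: {approx_low:.1f} to {approx_high:.1f}")
```

The approximation gives limits close to the 50 and 81 read from the table, which suggests the tabulated values are of the expected size.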

Example - 2
A measurement of number of slubs per Km of yarn length showed a value of 8 where the
target average was 3. Let us examine whether the target was exceeded or not.

From Table-5, the lower and upper 95% confidence limits for 3 occurrences are 1 and 9
respectively. Thus, if the target of 3 is being maintained, test values between 1 and 9 would be
expected to occur in approximately* 19 cases out of 20. Since the observed value of 8 lies
within these limits, we may conclude that there is no evidence from this single test that the
target is not being maintained.
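
As a rough check on these limits, the probability that a Poisson count with a mean of 3 falls between 1 and 9 can be computed directly. This is a sketch using only the standard library; the function names are illustrative.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k occurrences for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_between(low, high, mean):
    """Probability that the count lies between low and high, inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))

target, observed = 3, 8
p = prob_between(1, 9, target)
print(f"P(1 <= count <= 9 | mean = {target}) = {p:.3f}")
print("within limits" if 1 <= observed <= 9 else "outside limits")
```

The result is close to, but not exactly, 0.95, which is consistent with the tabulated limits being approximate for small whole numbers.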

*Tabulating values to whole numbers results in the confidence levels being not exactly 95% and
99%, particularly for small numbers, i.e. below 10.
Table-3
Sample Sizes for 95% Confidence Level

Sample Sizes for 99% Confidence Level


Table-4
Sample Sizes
Multiplying Factor F2 for Confidence Limit

Table-5
Approximate Confidence Limit for Occurrence of an Infrequent Event
