Problem
CHAPTER V
Process Capability Index for Two Correlated Quality Characteristics Following a
Bivariate Exponential Distribution
The percentile method is used to evaluate process capability for non-normal bivariate characteristics. To use equation 4.20, one needs to calculate the probability that the quality characteristics fall between the specification limits. To calculate this probability, we first need to know the distribution of the data. The following steps are involved in the calculation:
1. Select the sample data (X1, Y1), (X2, Y2), ..., (Xn, Yn). A real data set is taken from Wang's paper [116]. Wang discussed a manufactured product (called a connector) from the computer industry with seven quality characteristics. The data set contains a sample of 100 parts that were tested on the seven quality characteristics of interest to the manufacturer. Based on the quality characteristics and the manufacturing processes, we found that variables 1 (contact gap) and 2 (contact loop) are correlated, so we selected these two variables (X1, X2) for the study. Their specification limits are 0.10 ± 0.04 and 0 + 0.50, respectively.
2. Data fitting.
Determining which distribution the data follow is an important step. The data can easily be put into a software package that tests many different distributions to find out which one fits best. However, there should be a reason for using a particular distribution: it must make sense in terms of the process. Here it makes sense that the data follow an exponential distribution.
*Please remove the title "Gamma distribution", because at this step we have not yet defined or selected a distribution. The figure given below is from the Minitab software tool; please check it. At this stage, if the data have not been transformed, the scatter plot and the histogram should agree, but here they are different.
Figure 5.1: Scatter plot of variables X1, X2 (Minitab).
For X1, X2, CoefVar = ?????? (This is from Minitab; please verify it.)
Once a distribution (with a particular set of parameters) has been fitted to the data, a number of additional important indices and measures can be estimated. One can compute the cumulative distribution function (commonly denoted F(t)) of the fitted distribution, along with the standard errors of this function. From it, one can determine the percentiles of the cumulative survival (and failure) distribution, for example to predict the time by which a predetermined percentage of components can be expected to have failed.
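As an illustration of reading percentiles off a fitted distribution function, the following Python sketch assumes the fitted model is a one-parameter exponential distribution. The function names and the rate value 0.5 are hypothetical and chosen only for the example; they are not taken from the data in this chapter.

```python
import math


def exponential_cdf(t, rate):
    """Fitted distribution function F(t) = 1 - exp(-rate * t) for t >= 0."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-rate * t)


def exponential_percentile(p, rate):
    """Time by which a fraction p of components is expected to have failed."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # Invert F(t) = p, which gives t = -ln(1 - p) / rate.
    return -math.log(1.0 - p) / rate


rate = 0.5  # hypothetical fitted rate, for illustration only
t90 = exponential_percentile(0.90, rate)
print("time by which 90% have failed:", round(t90, 4))
```

Evaluating the fitted distribution function at the returned time gives back the requested fraction, which is a quick consistency check on any fitted model.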
The residual plot confirms the normality assumption for the errors. Conclusions drawn when the errors are non-normal may be erroneous.
Figure 5.2: Error rate of Simple Exponential and proposed Bivariate Exponential -ML
This figure shows a weighted least-squares fit of a straight line to a set of points with errors in both coordinates. The method can handle bivariate regression where the errors in the two coordinates are correlated, and it is capable of performing force-fit regression.
Figure 5.3: Proposed Bivariate Exponential ML and simple Weibull distribution of y.
The Weibull distribution is one of the most widely used lifetime distributions in reliability
engineering. It is a versatile distribution that can take on the characteristics of other types of
distributions, based on the value of the shape parameter, y.
Figure 5.5: Best fitted data of Proposed Bivariate Distribution to Best curve
A good fit of the theoretical distribution to the observed values is indicated if the plotted values fall on a straight line. The adjustment factors radj and nadj ensure that the p-value for the inverse probability integral falls strictly between 0 and 1.
3. Parameter estimation and bivariate statistics
Given a distribution or model for the data, the next step is to fit the model to the data. Typical probability distributions have unknown parameters, numbers that change the shape of the distribution. The technical term for the procedure of finding the values of the unknown parameters of a probability distribution from data is estimation. During estimation one seeks the parameters that make the model fit the data best. If this all sounds a bit subjective, that is because it is. In order to proceed, we have to provide a mathematical definition of what it means to fit the data best. The measure of how well the model fits the data is called the objective function. Typically, statisticians try to find estimators for the parameters that maximize (or minimize) an objective function, and statisticians disagree about which estimators or objective functions are best.
In the case of the Gaussian distribution, these parameters are called the mean and standard deviation, often written as $\mu$ and $\sigma$. In using the Gaussian distribution as a model for some data, one seeks to find values of $\mu$ and $\sigma$ that fit the data.
**** Our proposed distribution is the bivariate exponential, whose parameters are two scale parameters, $\theta_1$ and $\theta_2$; the distribution has a third parameter, $\alpha$, indicating the correlation, which may be taken as constant.
$$L = P(X \mid \theta) = P(X_1 \mid \theta)\,P(X_2 \mid \theta)\cdots P(X_n \mid \theta) = \prod_{i=1}^{n} P(X_i \mid \theta)$$
Maximum likelihood estimation says: choose the parameters so that the data are most probable given the model, that is, find the $\theta$ that maximizes $L$. In practice, this equation can be very complicated, and there are many analytic and numerical techniques to solve it.
My Manual calculation
Then the bivariate distribution and density functions become [34], following Farlie [38],
$$F(x, y) = (1 - e^{-x})(1 - e^{-y})\,[1 + \alpha e^{-x-y}]; \quad x \ge 0,\ y \ge 0,\ -1 \le \alpha \le 1 \tag{4.22}$$
$$f(x, y) = e^{-(x+y)}\,[1 + \alpha\,(2e^{-x} - 1)(2e^{-y} - 1)] \tag{4.23}$$
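As a check on equations (4.22) and (4.23), the following Python sketch codes the distribution and density functions with unit scale parameters and compares the density with a finite-difference mixed partial derivative of the distribution function. The function names, the name alpha for the association parameter (restricted to the interval from -1 to 1), and the evaluation point are assumptions made for illustration only.

```python
import math


def fgm_exp_cdf(x, y, alpha):
    """Distribution function (4.22) with unit scale parameters."""
    if x < 0 or y < 0:
        return 0.0
    return ((1.0 - math.exp(-x)) * (1.0 - math.exp(-y))
            * (1.0 + alpha * math.exp(-x - y)))


def fgm_exp_pdf(x, y, alpha):
    """Density function (4.23) with unit scale parameters."""
    if x < 0 or y < 0:
        return 0.0
    return math.exp(-x - y) * (
        1.0 + alpha * (2.0 * math.exp(-x) - 1.0) * (2.0 * math.exp(-y) - 1.0)
    )


def numeric_mixed_partial(x, y, alpha, h=1e-4):
    """Central-difference estimate of d^2 F / (dx dy)."""
    return (fgm_exp_cdf(x + h, y + h, alpha) - fgm_exp_cdf(x + h, y - h, alpha)
            - fgm_exp_cdf(x - h, y + h, alpha) + fgm_exp_cdf(x - h, y - h, alpha)
            ) / (4.0 * h * h)


# Hypothetical evaluation point and association parameter.
x0, y0, alpha0 = 0.7, 1.3, 0.5
difference = abs(numeric_mixed_partial(x0, y0, alpha0) - fgm_exp_pdf(x0, y0, alpha0))
print("density vs. numeric derivative, absolute difference:", difference)
```

Letting one argument grow large recovers the exponential marginal 1 - exp(-x), which is the property that makes this family convenient for two correlated exponential characteristics.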
For manual calculation, the MLE based on this cdf is very complicated, so we use a trial-and-error method to calculate the value of C, which is very important for the process capability measure. Otherwise, it can be calculated by the MLE method.
In this case, there would be many choices of (C1, C2) corresponding to any specified P and $\alpha$. An optimum choice would be attempted by considering the variance expression for $I_b$ and minimizing it with respect to variation in C1 and C2, subject to the corresponding constraint P = C1 C2 {1 + $\alpha$ (1 - C1)(1 - C2)}.
This is a quadratic equation. For a fixed probability P, the process capability interval, or natural process interval, (C) for negative values of $\alpha$ is greater than that for the corresponding positive values. The choice of C for a given P and $\alpha$ is unique, though the value of C has to be obtained numerically by solving equation (4.29).
U1U 2 12
Ib U1U 2 R ( )12
log(1 C )
2
On numerical computation (by trial and error), we find that, at fixed $\alpha$, C increases as P increases. C therefore depends on both P and $\alpha$.
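The trial-and-error search for C can be sketched as a bisection. The sketch below assumes the symmetric choice C1 = C2 = C, so that the constraint reads P = C^2 {1 + alpha (1 - C)^2}, where alpha stands for the association parameter between -1 and 1. The function names and the probability 0.98 are hypothetical and used only for illustration.

```python
def coverage_probability(c, alpha):
    """P = C^2 * (1 + alpha * (1 - C)^2), assuming C1 = C2 = C."""
    return c * c * (1.0 + alpha * (1.0 - c) ** 2)


def solve_c(p, alpha, tol=1e-12):
    """Find C in (0, 1) with coverage_probability(C, alpha) = p by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    lo, hi = 0.0, 1.0  # the coverage is 0 at C = 0 and 1 at C = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The coverage is non-decreasing in C on [0, 1], so bisection applies.
        if coverage_probability(mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha in (-0.5, 0.0, 0.5):
    print("alpha =", alpha, "C =", round(solve_c(0.98, alpha), 6))
```

Under this assumption the loop reproduces the two qualitative findings stated above: C grows with P at fixed alpha, and a negative alpha requires a larger C than the corresponding positive value.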
P C
4. Calculation of Cp. Please give your suitable software code or method, with values.
U1 = 0.14, U2 = 0.5.
We assume that L = 0 for both data sets.
Then, from the table given by Mukherjee and Singh [80] for the least values of Var(I):
Table 5.2: Least values of Var ( I )
n
K p1 p2 f Least Var ( I ) X
(U L)2
0.14 0.5
1.06
Ib 3.912 3.912 0.009374 0.033408
Many statistical packages are popular among quality practitioners, such as MATLAB, Minitab, Statistica and R. Minitab provides multivariate normal capability analysis, with which one can compare the results for multiple variables. However, that analysis does not give the combined multivariate PCI for the case where the variables are dependent. In MATLAB, multivariate analysis can be conducted by writing specific object-oriented code. We use MATLAB for our data analysis. It is used as an experimental and simulation software for configuring the system set-up and for setting up the data transmission among the various nodes in the set-up; it is essentially a programming environment whose commands are used as a simulation tool. For our second case study the Minitab tool is used, because it is very user-friendly and gives a good comparative study.
For this simulation study of bivariate non-normal processes, bivariate non-normal distributions such as Gamma, Beta, Weibull and Weibull-Gamma are used.
P-values
The p-value of the test statistic is the area of the sampling distribution from the sample result in the direction of the alternative hypothesis. If the null hypothesis is correct, then the p-value is the probability of obtaining a sample that yields your statistic, or a statistic that provides even stronger evidence against the null hypothesis. The p-value is therefore a measure of statistical significance. If the p-value is very small, there is strong statistical evidence in favor of the alternative hypothesis. If the p-value is large, the statistical evidence is insufficient, and you fail to reject the null hypothesis.
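To make the tail-area definition concrete, the following Python sketch computes a p-value for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis. The z-test setting, the function name and the value 1.96 are assumptions chosen for illustration; the goodness-of-fit tests reported by Minitab use their own test statistics.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """Tail area of the standard normal distribution beyond the observed z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


p_two_sided = z_test_p_value(1.96)  # hypothetical observed statistic
print("two-sided p-value:", round(p_two_sided, 4))
```

A small returned value means the observed statistic lies far out in the tail that the alternative hypothesis points to.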
Figure 5.7 presents a flowchart for estimating PNC and PCIs using different methods and different non-normal distributions. The exact PNC value (p) in this flowchart is obtained using the following equations.
(5.1)
(5.2)
(5.3)
(5.4)
where f(x) represents the probability density function of the process and T represents the
process mean for non-normal data and process median for non-
Three non-normal distributions (Gamma, Weibull and Beta) have been used to generate random data in this simulation. These distributions are used to investigate the effects of non-normal data on the process capability index. They are known to have parameter values that can represent mild to severe departures from normality. The parameters are selected so that our simulation results can be compared with existing results obtained using the same parameters in [12,19]. The probability density function of the two-parameter Gamma distribution is given by
(5.5)
The probability density function of the Weibull distribution with shape and scale parameters is given by
(5.6)
The probability density function of the Beta distribution with two shape parameters is given by
(5.7) (5.8)
where f(x) represents the corresponding distribution function of the Gamma, Weibull and Beta distributions.
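As an illustration of how an exact proportion of non-conforming product follows from a known distribution function, the sketch below assumes that PNC denotes 1 - P(L <= X <= U) and uses a Weibull distribution in the shape/scale form F(x) = 1 - exp(-(x/scale)^shape). That parameterization, the function names, the parameter values and the limits L = 0 and U = 3 are assumptions made for the example only.

```python
import math


def weibull_cdf(x, shape, scale):
    """F(x) = 1 - exp(-(x / scale) ** shape) for x > 0 (assumed form)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def proportion_nonconforming(cdf, lower, upper):
    """PNC = 1 - P(lower <= X <= upper) for a continuous distribution."""
    return 1.0 - (cdf(upper) - cdf(lower))


# Hypothetical parameters; shape = 1 reduces the Weibull to an exponential.
shape, scale = 1.0, 1.0
pnc = proportion_nonconforming(lambda x: weibull_cdf(x, shape, scale), 0.0, 3.0)
print("PNC:", round(pnc, 6))
```

The same helper accepts any distribution function of one variable, so a Gamma or Beta distribution function could be passed in its place.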
The table below shows that the quantiles obtained from the proposed bivariate distribution are similar to those of the simulated gamma, normal and Weibull distributions:
Methods P*(Gamma) PNC(Gamma) Specification Limits
Cp CPk
[0,0] [0.0025,0.005]
In all the above results, the quantiles obtained from this distribution are similar to those of the simulated gamma distribution. Figure 5.6 shows the scatter plot of data obtained by generating observations from the fitted distribution. In the above case the two parameter values are 1.6637 and 0.57831, p-gamma = 0.9771 and Cpk = 0.9771. The results of the simulation study for the various distributions are listed in Table 5.1. They indicate that the values of Cp and Cpk obtained are close to the values obtained from the exact distribution.
A real data set is taken from Wang's paper [58]. Wang discussed a manufactured product (called a connector) from the computer industry with seven quality characteristics. These seven characteristics are 1 (contact gap), 2 (contact loop Tp), 3 (LLCR), 4 (contact Tp), 5 (contact loop diameter), 6 (LTGAPY) and 7 (RTGAPY). The specification limits for these characteristics can be two-sided or one-sided, and they are 0.10 ± 0.04, 0 + 0.50, 11 ± 5, 0 + 0.2, 0.55 ± 0.06, 0.07 ± 0.05 and 0.07 ± 0.05, respectively. We selected variables 1 and 2 for the study.
Ahmad [12] used his approach to determine the PNC from the above data and obtained a PNC of 0.001. Using the -and- distribution, we obtained a PNC of 0.0001, which is close to that obtained by his method.
**** This PNC of 0.0001 is not for our thesis. Please give me a suitable comparison.
A random sample of size n = 20 from exponential data, given by Owen and Li []:
0.029 0.483
0.046 0.528
0.133 0.606
0.194 0.789
0.265 0.940
0.287 1.681
0.322 1.766
0.433 2.014
0.441 3.088
0.464 3.279
2. Data fitting.
Determining which distribution the data follow is an important step. The data can easily be put into a software package that tests many different distributions to find out which one fits best. However, there should be a reason for using a particular distribution: it must make sense in terms of the process. Here it makes sense that the data follow an exponential distribution. As shown in the figure obtained from the Minitab software, the best fit for our data is the exponential distribution.
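3. Parameter estimation
$$F(x;\lambda) = 1 - e^{-\lambda x}$$
$$f(x;\lambda) = \lambda e^{-\lambda x}$$
Parameter estimation by the MLE method.
The derivative of the log-likelihood is
$$\frac{d \ln L}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i$$
The first-order condition for a maximum is that this derivative equals zero, which gives
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} X_i}$$
Mean = 0.8894 = $1/\hat{\lambda}$
If L = 0 and U = 3, then
$$I_e = \frac{U - L}{\dfrac{1}{\hat{\lambda}}\,\{[-\ln(1 - p_1 - c)] - [-\ln(1 - p_1)]\}} = \frac{3}{(3.912)(0.8894)} = 0.86$$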
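The arithmetic of this step can be reproduced with a short Python sketch using the 20 observations listed above. The variable names are hypothetical. The factor 3.912 is taken from the text as given; it agrees with -ln(0.02) to three decimal places, but which values of p1 and c produce it is not stated here, so the code does not assume them.

```python
# The 20 sample values listed above (first column, then second column).
data = [
    0.029, 0.046, 0.133, 0.194, 0.265, 0.287, 0.322, 0.433, 0.441, 0.464,
    0.483, 0.528, 0.606, 0.789, 0.940, 1.681, 1.766, 2.014, 3.088, 3.279,
]

n = len(data)
rate_hat = n / sum(data)   # MLE of the exponential rate, n / sum of X_i
mean_hat = 1.0 / rate_hat  # equals the sample mean

lower, upper = 0.0, 3.0    # specification limits L and U used in the text
log_factor = 3.912         # factor used in the text's calculation

index = (upper - lower) / (log_factor * mean_hat)
print("mean:", round(mean_hat, 4), "index:", round(index, 2))
```

Because the estimate depends on the data only through the sample mean, any change to the listed observations feeds directly into the index.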
2. Comparative study:
3. Process capability index based on the Pearson system:
$$C_p = \frac{U - L}{P_{0.99865} - P_{0.00135}} = 0.82$$
4. From the Normal distribution:
$$C_p = \frac{U - L}{6\sigma} = 0.73$$